Package overview tour
Shell requirement: * The cat command. If you use the Windows command prompt, replace cat with type on each IPython line starting with !.
Python package requirement: * pandas (install it with pip install pandas)
This tutorial shows what the package can do and how to use it. It does not cover every function or use case, only the most interesting features.
Let’s load the core I/O interface class, eXtended-I/O, abbreviated as X-I/O or XIO:
[1]:
from brane import ExtendedIO as xio
Read / Write operations via eXtended-I/O
text
First of all, let’s save the following text as a normal text file with XIO:
[2]:
text = """
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002
""".strip()
It’s very simple. Specify the text and the path to write.
[3]:
xio.write(text, "text.txt")
You can see that it is written as ordinary text:
[4]:
!cat ./text.txt # this line works for Linux
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002
Reading the saved file is just as simple:
[5]:
text_reload = xio.read("text.txt")
assert type(text_reload) == str
print(text_reload)
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002
The reloaded object is a string and is identical to the original one.
[6]:
text == text_reload
[6]:
True
Next, save this text as a csv file.
[7]:
xio.write(text, "actor.csv")
Note that the extension only signals the intended format; it means nothing to the filesystem. The content of this file is still the plain text itself.
[8]:
!cat ./actor.csv # this line works for Linux
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002
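To make this point concrete without X-I/O, the sketch below writes the same string to two paths that differ only in their extension and compares the bytes on disk. It uses only the standard library; the temporary directory and the shortened sample text are illustrative choices, not part of the package.

```python
import tempfile
from pathlib import Path

text = "name,role,birthyear\nAlice,sender,1978\nBob,receiver,1978"

with tempfile.TemporaryDirectory() as tmp:
    txt_path = Path(tmp) / "text.txt"
    csv_path = Path(tmp) / "actor.csv"
    # Same content, different extension
    txt_path.write_text(text, encoding="utf-8")
    csv_path.write_text(text, encoding="utf-8")
    # The filesystem stores identical bytes for both files
    same_bytes = txt_path.read_bytes() == csv_path.read_bytes()

print(same_bytes)
```

The extension only matters to whatever program later decides how to interpret the file.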
csv
Now try reading the csv file. Remember, it is just text with a csv extension in the filename.
[9]:
df = xio.read("actor.csv")
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
[9]:
| | name | role | birthyear |
|---|---|---|---|
| 0 | Alice | sender | 1978.0 |
| 1 | Bob | receiver | 1978.0 |
| 2 | Carol | NaN | 1984.0 |
| 3 | Eve | eavesdropper | 1988.0 |
| 4 | Mallory | attacker | 2003.0 |
| 5 | Walter | warden | NaN |
| 6 | Ivan | issuer | 2002.0 |
This time, the loaded object is not text but a pandas DataFrame. If you want to use the built-in csv package for some reason, you can do so by passing ‘csv’ as module_name.
[10]:
df = xio.read("actor.csv", module_name="csv")
print(type(df))
df
<class 'list'>
[10]:
[['name', 'role', 'birthyear'],
['Alice', 'sender', '1978'],
['Bob', 'receiver', '1978'],
['Carol', '', '1984'],
['Eve', 'eavesdropper', '1988'],
['Mallory', 'attacker', '2003'],
['Walter', 'warden', ''],
['Ivan', 'issuer', '2002']]
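The list-of-lists result has the same shape as what the standard library's csv.reader produces. The sketch below shows that with a shortened copy of the data; whether X-I/O calls csv.reader in exactly this way is an assumption, not something this tutorial confirms.

```python
import csv
import io

text = """name,role,birthyear
Alice,sender,1978
Carol,,1984
Walter,warden,"""

# csv.reader yields one list of strings per line; empty fields become ''
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```

Note that every value stays a string and missing fields become empty strings, whereas pandas infers numeric types and marks missing fields as NaN.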
Likewise, you can pass path as a keyword argument and set module_name to the pandas module to make the choice explicit. It is good practice to add a type annotation when you know the type of the loaded object.
[11]:
import pandas as pd
df: pd.DataFrame = xio.read(path="actor.csv", module_name="pandas")
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
[11]:
| | name | role | birthyear |
|---|---|---|---|
| 0 | Alice | sender | 1978.0 |
| 1 | Bob | receiver | 1978.0 |
| 2 | Carol | NaN | 1984.0 |
| 3 | Eve | eavesdropper | 1988.0 |
| 4 | Mallory | attacker | 2003.0 |
| 5 | Walter | warden | NaN |
| 6 | Ivan | issuer | 2002.0 |
Let’s save the pandas DataFrame. As you may know, pandas normally saves the index too:
[12]:
xio.write(df, "actor_w_index.csv")
!cat ./actor_w_index.csv # this line works for Linux
name,role,birthyear
Alice,sender,1978.0
Bob,receiver,1978.0
Carol,,1984.0
Eve,eavesdropper,1988.0
Mallory,attacker,2003.0
Walter,warden,
Ivan,issuer,2002.0
Wow, no index appears. This is because some of the most common parameters are preset. For pandas, index=None is one of them.
Of course, you can also set the index parameter explicitly.
[13]:
xio.write(obj=df, path="actor_w_name_index.csv", index=True)
!cat ./actor_w_name_index.csv # this line works for Linux
name,role,birthyear
Alice,sender,1978.0
Bob,receiver,1978.0
Carol,,1984.0
Eve,eavesdropper,1988.0
Mallory,attacker,2003.0
Walter,warden,
Ivan,issuer,2002.0
Here, for clarity, we pass obj and path as keyword arguments.
Customization
In this section, we
* handle a Python dictionary through a wrapper class that we define
* consider our own new format with the extension .hello
Define own module
For that purpose, let’s define our own format ‘hello’, together with a class implementing it, as follows:
* header (first line): a string that acts as the separator in the following lines
* body (all subsequent lines): each line consists of a key and a value joined by the separator given in the header.
[14]:
from __future__ import annotations
from typing import Union


class HelloClass():
    # hello object which wraps a Python dictionary
    def __init__(self, mapper: dict[str, Union[str, int]]):
        self.mapper = mapper

    def __repr__(self) -> str:
        return self.mapper.__repr__()


class HelloIO():
    # hello format IO module
    @staticmethod
    def load(path: str) -> HelloClass:
        with open(path, "r") as f:
            file: str = f.read()
        # the first line is the separator, the rest are key/value lines
        sep, *lines = file.split("\n")
        data: dict = {}
        for line in lines:
            k, v = line.split(sep)
            data[k] = v
        return HelloClass(mapper=data)

    @staticmethod
    def dump(mapper: HelloClass, path: str, sep: str = ": "):
        with open(path, "w") as f:
            # header: the separator itself
            f.write(f"{sep}")
            # body: iterate over the wrapped dictionary
            for key, value in mapper.mapper.items():
                f.write(f"\n{key}{sep}{value}")
Now, test it:
[15]:
obj = HelloClass(mapper={
"Jan": 1,
"Feb": 2,
"Mar": 3,
})
Save it.
[16]:
HelloIO.dump(mapper=obj, path="test.hello")
[17]:
!cat ./test.hello
:
Jan: 1
Feb: 2
Mar: 3
Load it.
[18]:
HelloIO.load(path="test.hello")
[18]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}
Good.
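Because the header stores the separator, a file written with a non-default separator can still be read back without extra arguments. The sketch below checks this in memory with two hypothetical helpers, dumps_hello and loads_hello, which mirror the logic of HelloIO but work on strings instead of files.

```python
def dumps_hello(mapping: dict, sep: str = ": ") -> str:
    """Serialize a dict: the first line is the separator, then one line per item."""
    lines = [sep] + [f"{key}{sep}{value}" for key, value in mapping.items()]
    return "\n".join(lines)


def loads_hello(content: str) -> dict:
    """Parse the format back; the separator is read from the first line."""
    sep, *lines = content.split("\n")
    # Split only once so that a value may itself contain the separator
    return dict(line.split(sep, 1) for line in lines if line)


months = {"Jan": 1, "Feb": 2, "Mar": 3}
encoded = dumps_hello(months, sep=" = ")
decoded = loads_hello(encoded)
print(decoded)
```

As in the output above, the values come back as strings rather than integers, since the format stores no type information.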
Define new Module subclass
The next task is to register our hello format with X-I/O. For that purpose, let’s define a new class inheriting from the Module class. This class defines the common interface, i.e. it has read and write methods.
[19]:
from brane.core import Module
class HelloModule(Module):
    name = "hello"  # ID of this Module class

    @classmethod
    def read(cls, path: str, *args, **kwargs):
        return HelloIO.load(path=path)

    @classmethod
    def write(cls, obj, path, *args, **kwargs):
        return HelloIO.dump(mapper=obj, path=path)
You must define three attributes on this class:
* name (property): the ID of the new module.
* read (classmethod): defines the reading/loading process; it takes at least the keyword argument path.
* write (classmethod): defines the writing/saving process; it takes at least the keyword arguments obj and path.
Test it again.
[20]:
HelloModule.read(path="test.hello")
[20]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}
[21]:
HelloModule.write(obj=obj, path="test2.hello")
[22]:
!cat ./test2.hello
:
Jan: 1
Feb: 2
Mar: 3
No problem at all.
Define new Format subclass
At this stage, there is no connection between HelloModule and the hello extension that appears in paths. So we must implement another class, a subclass of Format, which connects the reading/writing module above with the extension.
[23]:
from brane.core import Format
class HelloFormat(Format):
    name = "hello"  # ID of this Format class
    module = HelloModule
    default_extension = "hello"  # the extension in the path
* name (property): the ID of the new format.
* module (property): the corresponding Module subclass.
* default_extension (property): the extension name.
[24]:
xio.read("test.hello")
[24]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}
OK, great work! Now brane’s I/O chooses the correct module based on the extension.
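The idea of choosing a reader from the extension can be pictured with a small toy registry. This is only a sketch of the concept under stated assumptions: PARSERS, register and parse_by_extension are hypothetical names, and brane's real internals may look quite different.

```python
import os
from typing import Callable, Dict

# Toy registry mapping an extension to a parser function
PARSERS: Dict[str, Callable[[str], object]] = {}


def register(extension: str, parser: Callable[[str], object]) -> None:
    PARSERS[extension] = parser


def parse_by_extension(path: str, content: str) -> object:
    # os.path.splitext("test.hello") gives ("test", ".hello")
    extension = os.path.splitext(path)[1].lstrip(".")
    if extension not in PARSERS:
        raise ValueError(f"no parser registered for extension {extension!r}")
    return PARSERS[extension](content)


def parse_hello(content: str) -> dict:
    sep, *lines = content.split("\n")
    return dict(line.split(sep, 1) for line in lines if line)


register("hello", parse_hello)
register("txt", lambda content: content)

result = parse_by_extension("test.hello", ": \nJan: 1\nFeb: 2")
print(result)
```

In this picture, a Format subclass plays the role of the register call: it ties an extension to the module that knows how to read it.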
Define new Object subclass
Finally, let’s save our Hello object in our Hello format i.e. with the .hello extension.
[25]:
from brane.core import Object
class HelloObject(Object):
    format = HelloFormat
    module = HelloModule
    object_type = HelloClass
In our case, it’s still simple:
* module (property): the corresponding Module subclass.
* format (property): the corresponding Format subclass.
* object_type (property): the type of the target objects, here HelloClass.
[26]:
xio.write(obj=obj, path="auto.hello")
[27]:
!cat ./auto.hello # this line works for Linux
:
Jan: 1
Feb: 2
Mar: 3
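On the writing side, the object's type is what selects the serializer. The toy sketch below illustrates that idea only; SERIALIZERS, serialize and hello_to_text are hypothetical names, the HelloClass here is a stand-in for the one defined earlier, and none of this is taken from brane's source.

```python
from typing import Callable, Dict


class HelloClass:
    # Stand-in for the wrapper class defined earlier in this tutorial
    def __init__(self, mapper: dict):
        self.mapper = mapper


# Toy registry mapping an object type to a serializer function
SERIALIZERS: Dict[type, Callable[[object], str]] = {}


def serialize(obj: object) -> str:
    # Pick the first registered type that the object is an instance of
    for object_type, serializer in SERIALIZERS.items():
        if isinstance(obj, object_type):
            return serializer(obj)
    raise TypeError(f"no serializer registered for {type(obj).__name__}")


def hello_to_text(obj: HelloClass, sep: str = ": ") -> str:
    return "\n".join([sep] + [f"{k}{sep}{v}" for k, v in obj.mapper.items()])


SERIALIZERS[str] = lambda text: text
SERIALIZERS[HelloClass] = hello_to_text

print(serialize(HelloClass({"Jan": 1, "Feb": 2})))
```

An Object subclass corresponds to one entry of such a registry: it declares which format and module handle a given Python type.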
Now we have learned the basics of defining our own I/O and registering it with eXtended-I/O.
Hook system
Check the pre-existing hooks
Let’s save the following text to a new directory named quotes:
[28]:
text = """
Nobody ever figures out what life is all about, and it doesn't matter.
Explore the world.
Nearly everything is really interesting if you go into it deeply enough.
""".strip()
First, try it without making the directory.
[29]:
try:
    with open("quotes/most_like.txt", "w") as f:
        f.write(text)
except FileNotFoundError as e:
    print(e)
[Errno 2] No such file or directory: 'quotes/most_like.txt'
Of course, it fails as expected. However, writing through X-I/O behaves differently:
[30]:
from brane.core import Object
[31]:
xio.write(obj=text, path="quotes/most_like.txt")
[32]:
!cat quotes/most_like.txt # this line works for Linux
Nobody ever figures out what life is all about, and it doesn't matter.
Explore the world.
Nearly everything is really interesting if you go into it deeply enough.
No such error occurs when we save through X-I/O! The reason is that X-I/O calls a hook function that creates the missing directory before saving. You can check this through the pre_write attribute:
[33]:
xio.pre_write
[33]:
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
Besides the leading number, each entry shows two items: * on the left, a hex sequence, which is the automatically assigned hook ID * <function create_parent_directory at ...>, which refers to the Python function create_parent_directory, the one called in the execution above.
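What such a directory-creating step can look like in plain Python is sketched below. The function name ensure_parent_directory is hypothetical and the body is an assumption about the general technique, not a copy of brane's create_parent_directory.

```python
import tempfile
from pathlib import Path


def ensure_parent_directory(path: str) -> None:
    # Create every missing directory above `path`; do nothing if they exist
    Path(path).parent.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "quotes" / "most_like.txt"
    # Without this call, writing would raise FileNotFoundError
    ensure_parent_directory(str(target))
    target.write_text("Explore the world.", encoding="utf-8")
    written = target.read_text(encoding="utf-8")

print(written)
```

Running this step before every write is what removes the FileNotFoundError seen with the bare open call.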
Add new hook
Let’s consider two new hooks: * remove line breaks (\n) when loading any text * append a copyright notice at the end when saving any text
First of all, check the default behaviour:
[34]:
xio.read("quotes/most_like.txt")
[34]:
"Nobody ever figures out what life is all about, and it doesn't matter.\nExplore the world.\nNearly everything is really interesting if you go into it deeply enough."
OK. Now, define a function that replaces every line break with a space.
[35]:
def remove_linebreaks(context):
    # Currently, every hook must start with this line.
    # On loading, `obj` is the loaded object.
    obj = context["object"]
    if isinstance(obj, str):
        return obj.replace("\n", " ")
And add it.
[36]:
xio.register_post_read_hook(hook=remove_linebreaks)
Now, read it again:
[37]:
loaded_text: str = xio.read("quotes/most_like.txt")
loaded_text
[37]:
"Nobody ever figures out what life is all about, and it doesn't matter. Explore the world. Nearly everything is really interesting if you go into it deeply enough."
You see no line breaks now.
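One way to picture how a chain of hooks acts on an object is the toy runner below. It assumes that hooks run in registration order and that a return value of None means "keep the object unchanged"; both are assumptions for illustration, and run_hooks is a hypothetical helper, not part of brane.

```python
from typing import Callable, List, Optional

Hook = Callable[[dict], Optional[object]]


def run_hooks(hooks: List[Hook], obj: object) -> object:
    # Call hooks in order; a non-None return value replaces the object
    for hook in hooks:
        result = hook({"object": obj})
        if result is not None:
            obj = result
    return obj


def remove_linebreaks(context: dict) -> Optional[str]:
    obj = context["object"]
    if isinstance(obj, str):
        return obj.replace("\n", " ")
    return None  # leave non-string objects untouched


post_read_hooks: List[Hook] = [remove_linebreaks]

print(run_hooks(post_read_hooks, "Explore\nthe world."))
print(run_hooks(post_read_hooks, [1, 2, 3]))
```

The isinstance check matters because the same hook is called for every object that passes through the event, not only for text.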
Next, let’s add a copyright notice as the last line of every text.
[38]:
import datetime
author_name: str = "Richard P.Feynman"
copyright: str = f"© {datetime.datetime.today().year} {author_name}"
copyright
[38]:
'© 2022 Richard P.Feynman'
In the same way as above, define the function and register it:
[39]:
def append_copyright(context):
    obj = context["object"]
    if isinstance(obj, str):
        return obj + "\n" + copyright
[40]:
xio.register_pre_write_hook(hook=append_copyright)
You can see that two hooks are now registered as pre-write hooks.
[41]:
xio.pre_write
[41]:
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>
Write the text again,
[42]:
xio.write(obj=loaded_text, path="quotes/most_like_with_copyright.txt")
then check that the new text includes the copyright notice at the bottom!
[43]:
!cat ./quotes/most_like_with_copyright.txt
Nobody ever figures out what life is all about, and it doesn't matter. Explore the world. Nearly everything is really interesting if you go into it deeply enough.
© 2022 Richard P.Feynman
Finally, let’s look at how to check all registered hooks and how to remove unnecessary ones. Checking is easy: just call the show_events method.
[44]:
xio.show_events()
Event: post_read
1. 244d96431cd5ea83: <function remove_linebreaks at 0x7f343833c670>
Event: post_readall
No hooks are registered
Event: post_write
No hooks are registered
Event: post_writeall
No hooks are registered
Event: pre_read
1. 65e9f9d57134d5d7: <function check_path_existence at 0x7f346c3f3820>
Event: pre_readall
No hooks are registered
Event: pre_write
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>
Event: pre_writeall
No hooks are registered
Now, consider removing the hook we added at the post-read event: remove_linebreaks. The hook name/ID can be obtained as follows:
[45]:
# This is just a temporary function to get the first hook ID.
# Of course, you can also copy it from the hook list above.
def get_hook_names(event) -> list[str]:
    return [hook.hook_name for hook in event.hooks]

hook_id = next(iter(get_hook_names(xio.post_read)))
hook_id
[45]:
'244d96431cd5ea83'
[46]:
xio.remove_hooks(hook_id)
Now it is gone, as you can see below.
[47]:
xio.show_events()
Event: post_read
No hooks are registered
Event: post_readall
No hooks are registered
Event: post_write
No hooks are registered
Event: post_writeall
No hooks are registered
Event: pre_read
1. 65e9f9d57134d5d7: <function check_path_existence at 0x7f346c3f3820>
Event: pre_readall
No hooks are registered
Event: pre_write
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>
Event: pre_writeall
No hooks are registered
If you want to remove all hooks, you can call the clear_all_hooks method, but note that it erases every registered hook, including the built-in ones. Note: Of course, if you restart the session, the initially registered hooks are back once X-I/O is loaded.
[48]:
xio.clear_all_hooks()
[49]:
xio.show_events()
Event: post_read
No hooks are registered
Event: post_readall
No hooks are registered
Event: post_write
No hooks are registered
Event: post_writeall
No hooks are registered
Event: pre_read
No hooks are registered
Event: pre_readall
No hooks are registered
Event: pre_write
No hooks are registered
Event: pre_writeall
No hooks are registered
Great job! You now know our package well.