Package overview tour

Requirement on the shell:

  • the cat command. If you use the Windows command prompt, replace cat with type in each IPython line starting with !.

Requirement on Python packages:

  • pandas: pip install pandas

This tutorial describes what our package can do and how to use it. This does not cover all functions or use cases but shows the most interesting features.

Let’s load the core I/O interface class, eXtended-I/O (abbreviated as X-I/O or XIO):

[1]:
from brane import ExtendedIO as xio

Read / Write operations via eXtended-I/O

text

First of all, let’s save the following text as a normal text file with XIO:

[2]:
text = """
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002
""".strip()

It’s very simple: specify the text and the path to write to.

[3]:
xio.write(text, "text.txt")

As you can see, it is written as ordinary text.

[4]:
!cat ./text.txt  # this line works for Linux
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002

Reading the saved file is just as simple:

[5]:
text_reload = xio.read("text.txt")
assert type(text_reload) == str
print(text_reload)
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002

The reloaded object is a string identical to the original.

[6]:
text == text_reload
[6]:
True

Next, save this text as a csv file.

[7]:
xio.write(text, "actor.csv")

Note that the extension only indicates the format; it means nothing to the filesystem. Here, the file content is still the plain text itself.

[8]:
!cat ./actor.csv  # this line works for Linux
name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002

csv

Now try reading the csv file, which is just text with a .csv extension in its filename.

[9]:
df = xio.read("actor.csv")
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
[9]:
      name          role  birthyear
0    Alice        sender     1978.0
1      Bob      receiver     1978.0
2    Carol           NaN     1984.0
3      Eve  eavesdropper     1988.0
4  Mallory      attacker     2003.0
5   Walter        warden        NaN
6     Ivan        issuer     2002.0

This time, the loaded object is not a string but a pandas DataFrame. If for some reason you would rather use the built-in csv module, pass module_name="csv".

[10]:
df = xio.read("actor.csv", module_name="csv")
print(type(df))
df
<class 'list'>
[10]:
[['name', 'role', 'birthyear'],
 ['Alice', 'sender', '1978'],
 ['Bob', 'receiver', '1978'],
 ['Carol', '', '1984'],
 ['Eve', 'eavesdropper', '1988'],
 ['Mallory', 'attacker', '2003'],
 ['Walter', 'warden', ''],
 ['Ivan', 'issuer', '2002']]
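For comparison, the list of lists above has the same shape as what the standard library’s csv.reader produces: one list of strings per line, with empty fields as "". The sketch below uses only the csv module, not X-I/O; whether X-I/O calls csv.reader internally for module_name="csv" is an assumption.

```python
import csv
import io

# Same content as the `text` variable defined earlier in this tutorial
csv_text = """name,role,birthyear
Alice,sender,1978
Bob,receiver,1978
Carol,,1984
Eve,eavesdropper,1988
Mallory,attacker,2003
Walter,warden,
Ivan,issuer,2002"""

# csv.reader yields each line as a list of strings; missing values become ""
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[0])  # ['name', 'role', 'birthyear']
print(rows[6])  # ['Walter', 'warden', '']
```

Unlike pandas, the csv module performs no type conversion, so the birth years stay strings such as '1978'.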

Conversely, to make sure pandas is used, pass module_name="pandas" explicitly (here we also pass path as a keyword argument). If you know the type of the loaded object, it is good practice to add a type annotation.

[11]:
import pandas as pd
df: pd.DataFrame = xio.read(path="actor.csv", module_name="pandas")
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
[11]:
      name          role  birthyear
0    Alice        sender     1978.0
1      Bob      receiver     1978.0
2    Carol           NaN     1984.0
3      Eve  eavesdropper     1988.0
4  Mallory      attacker     2003.0
5   Walter        warden        NaN
6     Ivan        issuer     2002.0

Let’s save the pandas DataFrame. As you may know, pandas normally writes the index too:

[12]:
xio.write(df, "actor_w_index.csv")
!cat ./actor_w_index.csv   # this line works for Linux
name,role,birthyear
Alice,sender,1978.0
Bob,receiver,1978.0
Carol,,1984.0
Eve,eavesdropper,1988.0
Mallory,attacker,2003.0
Walter,warden,
Ivan,issuer,2002.0

Wow, no index appears. This is because some of the most commonly used parameters are preset. For pandas, index=None is one of them.
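The tutorial does not show how these presets are stored. One plausible mental model is a per-module dictionary of default keyword arguments that explicit arguments override. In the sketch below, PRESET_KWARGS and merge_kwargs are hypothetical names, not part of brane.

```python
from __future__ import annotations

from typing import Any

# Hypothetical presets: default keyword arguments per module name.
# The real presets used by X-I/O may differ; "index": None mirrors the text above.
PRESET_KWARGS: dict[str, dict[str, Any]] = {
    "pandas": {"index": None},
}

def merge_kwargs(module_name: str, **user_kwargs: Any) -> dict[str, Any]:
    """Return the module's preset kwargs, overridden by any user-supplied kwargs."""
    # In a dict merge, later keys win, so explicit user arguments take priority
    return {**PRESET_KWARGS.get(module_name, {}), **user_kwargs}

print(merge_kwargs("pandas"))              # {'index': None}
print(merge_kwargs("pandas", index=True))  # {'index': True}
```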

Of course, you can specify the index too.

[13]:
xio.write(obj=df, path="actor_w_name_index.csv", index=True)
!cat ./actor_w_name_index.csv  # this line works for Linux
name,role,birthyear
Alice,sender,1978.0
Bob,receiver,1978.0
Carol,,1984.0
Eve,eavesdropper,1988.0
Mallory,attacker,2003.0
Walter,warden,
Ivan,issuer,2002.0

Here, for clarity, we pass obj and path as keyword arguments.

Customization

In this section, we

  • wrap a Python dictionary in our own class

  • define a new format with its own extension, .hello

Define our own module

For this purpose, let’s define our own format, ‘hello’, and implement it as follows:

  • header (first line): a symbol that serves as the separator in the following lines

  • body (all subsequent lines): each line consists of a key and a value joined by the separator given in the header.

[14]:
from __future__ import annotations
from typing import Union

class HelloClass:
    # hello object which wraps a Python dictionary
    def __init__(self, mapper: dict[str, Union[str, int]]):
        self.mapper = mapper

    def __repr__(self) -> str:
        return repr(self.mapper)

class HelloIO:
    # hello format IO module
    @staticmethod
    def load(path: str) -> HelloClass:
        with open(path, "r") as f:
            content: str = f.read()
        # 1st line is the separator; every other line is "key<sep>value"
        sep, *lines = content.split("\n")
        data: dict[str, str] = {}
        for line in lines:
            if not line:  # skip blank lines, e.g. a trailing newline
                continue
            key, value = line.split(sep, 1)  # split at the first separator only
            data[key] = value
        return HelloClass(mapper=data)

    @staticmethod
    def dump(mapper: HelloClass, path: str, sep: str = ": ") -> None:
        with open(path, "w") as f:
            f.write(sep)  # header: the separator
            for key, value in mapper.mapper.items():  # use the argument, not a global
                f.write(f"\n{key}{sep}{value}")

Now, test it:

[15]:
obj = HelloClass(mapper={
    "Jan": 1,
    "Feb": 2,
    "Mar": 3,
})

Save it.

[16]:
HelloIO.dump(mapper=obj, path="test.hello")
[17]:
!cat ./test.hello
:
Jan: 1
Feb: 2
Mar: 3

Load it.

[18]:
HelloIO.load(path="test.hello")
[18]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}

Good.

Define new Module subclass

The next task is to register our hello format with X-I/O. To do so, let’s define a new class that inherits from the Module class. Module defines the common interface, i.e. the read and write methods.

[19]:
from brane.core import Module

class HelloModule(Module):
    name = "hello"  # ID of this Module class

    @classmethod
    def read(cls, path: str, *args, **kwargs):
        return HelloIO.load(path=path)

    @classmethod
    def write(cls, obj, path, *args, **kwargs):
        return HelloIO.dump(mapper=obj, path=path)

You must define three attributes in this class:

  • name (property): the ID of the new module.

  • read (classmethod): defines the reading/loading process; it must accept at least the keyword argument path.

  • write (classmethod): defines the writing/saving process; it must accept at least the keyword arguments obj and path.

Test it again.

[20]:
HelloModule.read(path="test.hello")
[20]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}
[21]:
HelloModule.write(obj=obj, path="test2.hello")
[22]:
!cat ./test2.hello
:
Jan: 1
Feb: 2
Mar: 3

No problem at all.
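To make the contract concrete, here is a sketch of the same interface expressed as an abstract base class. BaseModule is a hypothetical stand-in, not the actual definition of brane.core.Module, and MemoryModule is a toy subclass that keeps objects in a dictionary instead of on disk.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

class BaseModule(ABC):
    """Hypothetical stand-in for brane.core.Module (not its real definition)."""
    name: str  # ID of the module, set by each subclass

    @classmethod
    @abstractmethod
    def read(cls, path: str, *args: Any, **kwargs: Any) -> Any:
        """Load and return the object stored at `path`."""

    @classmethod
    @abstractmethod
    def write(cls, obj: Any, path: str, *args: Any, **kwargs: Any) -> None:
        """Save `obj` to `path`."""

class MemoryModule(BaseModule):
    """Toy module that 'stores' objects in a class-level dict instead of files."""
    name = "memory"
    _store: dict[str, Any] = {}

    @classmethod
    def read(cls, path: str, *args: Any, **kwargs: Any) -> Any:
        return cls._store[path]

    @classmethod
    def write(cls, obj: Any, path: str, *args: Any, **kwargs: Any) -> None:
        cls._store[path] = obj

MemoryModule.write(obj={"Jan": 1}, path="a.mem")
print(MemoryModule.read(path="a.mem"))  # {'Jan': 1}
```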

Define new Format subclass

At this stage, nothing connects HelloModule to the .hello extension in file paths. So we must implement another class, a Format subclass, which connects the reading/writing module above to the extension.

[23]:
from brane.core import Format

class HelloFormat(Format):
    name = "hello"  # ID of this Format class
    module = HelloModule
    default_extension = "hello"  # the extension in the path

Its attributes are:

  • name (property): the ID of the new format.

  • module (property): the corresponding Module subclass.

  • default_extension (property): the extension name used in paths.

[24]:
xio.read("test.hello")
[24]:
{'Jan': '1', 'Feb': '2', 'Mar': '3'}

OK, great work! Now X-I/O chooses the correct module based on the extension.
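The lookup can be pictured as a small registry keyed by extension. The sketch below is an assumption about the mechanism, not brane’s actual code; FormatSpec, FORMATS, register_format and find_format are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class FormatSpec:
    """Hypothetical record linking a format ID, its module ID and its extension."""
    name: str
    module_name: str
    default_extension: str

# Hypothetical registry keyed by extension (without the leading dot)
FORMATS: dict[str, FormatSpec] = {}

def register_format(spec: FormatSpec) -> None:
    FORMATS[spec.default_extension] = spec

def find_format(path: str) -> FormatSpec | None:
    """Look up the format from the file extension, e.g. 'test.hello' -> 'hello'."""
    extension = Path(path).suffix.lstrip(".").lower()
    return FORMATS.get(extension)

register_format(FormatSpec(name="hello", module_name="hello", default_extension="hello"))
register_format(FormatSpec(name="csv", module_name="pandas", default_extension="csv"))

print(find_format("test.hello").module_name)  # hello
print(find_format("actor.csv").module_name)   # pandas
print(find_format("notes.md"))                # None
```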

Define new Object subclass

Finally, let’s save our Hello object in our Hello format i.e. with the .hello extension.

[25]:
from brane.core import Object

class HelloObject(Object):
    format = HelloFormat
    module = HelloModule
    object_type = HelloClass

In our case, it’s still simple:

  • module (property): The corresponding Module subclass.

  • format (property): The corresponding Format subclass.

  • object_type (property): The type of the target objects, here, HelloClass.

[26]:
xio.write(obj=obj, path="auto.hello")
[27]:
!cat ./auto.hello  # this line works for Linux
:
Jan: 1
Feb: 2
Mar: 3
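The object_type attribute relates a Python type to a format. As one plausible picture of type-based lookup, the sketch below checks an object against an ordered list of registered types. OBJECT_FORMATS, find_format_for and HelloClassSketch (a minimal copy of HelloClass, renamed to avoid shadowing it) are hypothetical.

```python
from __future__ import annotations

from typing import Any

class HelloClassSketch:
    """Minimal copy of the tutorial's HelloClass so this sketch runs standalone."""
    def __init__(self, mapper: dict[str, Any]):
        self.mapper = mapper

# Hypothetical registry: (object type, format ID), checked in order
OBJECT_FORMATS: list[tuple[type, str]] = [
    (HelloClassSketch, "hello"),
    (str, "text"),
]

def find_format_for(obj: Any) -> str | None:
    """Return the format ID of the first registered type that `obj` is an instance of."""
    for object_type, format_name in OBJECT_FORMATS:
        if isinstance(obj, object_type):
            return format_name
    return None

print(find_format_for(HelloClassSketch({"Jan": 1})))  # hello
print(find_format_for("plain text"))                 # text
print(find_format_for(42))                           # None
```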

Now we have learned the basics of defining our own I/O and registering it with eXtended-I/O.

Hook system

Check the pre-existing hooks

Let’s save the following text to a new directory named quotes:

[28]:
text = """
Nobody ever figures out what life is all about, and it doesn't matter.
Explore the world.
Nearly everything is really interesting if you go into it deeply enough.
""".strip()

First, try writing it with the built-in open() without creating the directory.

[29]:
try:
    with open("quotes/most_like.txt", "w") as f:
        f.write(text)
except FileNotFoundError as e:
    print(e)
[Errno 2] No such file or directory: 'quotes/most_like.txt'

Of course, it fails as we expected. However,

[30]:
from brane.core import Object
[31]:
xio.write(obj=text, path="quotes/most_like.txt")
[32]:
!cat quotes/most_like.txt  # this line works for Linux
Nobody ever figures out what life is all about, and it doesn't matter.
Explore the world.
Nearly everything is really interesting if you go into it deeply enough.

no error occurs when we save it through X-I/O! This is because X-I/O calls a hook function that creates the missing parent directory before writing. You can confirm this by inspecting the pre_write attribute:

[33]:
xio.pre_write
[33]:
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>

Besides the number at the start, each entry shows two items:

  • the hex string on the left is the hook ID, assigned automatically

  • <function create_parent_directory at ...> is the Python function create_parent_directory, which was called during the write above.
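The behaviour of such a hook can be reproduced with pathlib. The sketch below is not brane’s create_parent_directory; it is a hypothetical re-implementation that assumes the write path is passed to the hook under the "path" key of the context.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any

def create_parent_directory_sketch(context: dict[str, Any]) -> None:
    """Create the parent directory of context["path"] if it is missing (hypothetical)."""
    # Assumption: the write path is available under the "path" key
    parent = Path(context["path"]).parent
    # parents=True also creates intermediate directories; exist_ok=True tolerates existing ones
    parent.mkdir(parents=True, exist_ok=True)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "quotes" / "most_like.txt"
    create_parent_directory_sketch({"path": str(target), "object": "Explore the world."})
    target.write_text("Explore the world.")  # succeeds now that quotes/ exists
    print(target.read_text())  # Explore the world.
```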

Add new hook

Let’s consider two new hooks:

  • remove the line breaks (\n) when loading any text

  • append a copyright line at the end when saving any text

First of all, check the default behaviour:

[34]:
xio.read("quotes/most_like.txt")
[34]:
"Nobody ever figures out what life is all about, and it doesn't matter.\nExplore the world.\nNearly everything is really interesting if you go into it deeply enough."

OK. Now, define a function that replaces every line break with a space.

[35]:
def remove_linebreaks(context):
    # Currently, every hook must read the object from the context like this.
    # On reading, `obj` is the loaded object.
    obj = context["object"]
    if isinstance(obj, str):
        return obj.replace("\n", " ")
    return obj  # leave non-text objects untouched

Then register it as a post-read hook.

[36]:
xio.register_post_read_hook(hook=remove_linebreaks)

Now, read the file again.

[37]:
loaded_text: str = xio.read("quotes/most_like.txt")
loaded_text
[37]:
"Nobody ever figures out what life is all about, and it doesn't matter. Explore the world. Nearly everything is really interesting if you go into it deeply enough."

You see no line breaks now.
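The tutorial does not show how X-I/O applies registered hooks internally. As a rough mental model only, here is a sketch in which an event passes the object through its hooks in registration order. run_hooks, the "path" context key, and the rule that returning None means "unchanged" are all assumptions.

```python
from __future__ import annotations

from typing import Any, Callable, Dict

# A hook receives a context dict and may return a replacement object
Hook = Callable[[Dict[str, Any]], Any]

def run_hooks(hooks: list[Hook], obj: Any, path: str) -> Any:
    """Pass `obj` through `hooks` in registration order (hypothetical semantics)."""
    for hook in hooks:
        # Assumption: the context carries at least the object and the path
        result = hook({"object": obj, "path": path})
        # Assumption: a hook returning None leaves the object unchanged
        if result is not None:
            obj = result
    return obj

def collapse_linebreaks(context: dict[str, Any]) -> Any:
    obj = context["object"]
    return obj.replace("\n", " ") if isinstance(obj, str) else obj

def shout(context: dict[str, Any]) -> Any:
    obj = context["object"]
    return obj.upper() if isinstance(obj, str) else obj

# Line breaks are collapsed first, then the result is upper-cased
print(run_hooks([collapse_linebreaks, shout], "Explore\nthe world.", "quotes/most_like.txt"))
# EXPLORE THE WORLD.
```

Because each hook sees the output of the previous one, registration order matters whenever hooks transform the object.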

Next, let’s append a copyright line to every text we save.

[38]:
import datetime

author_name: str = "Richard P.Feynman"
copyright: str = f"© {datetime.datetime.today().year} {author_name}"
copyright
[38]:
'© 2022 Richard P.Feynman'

In the same way as above, define the function and register it:

[39]:
def append_copyright(context):
    obj = context["object"]  # the object about to be written
    if isinstance(obj, str):
        return obj + "\n" + copyright
    return obj  # leave non-text objects untouched
[40]:
xio.register_pre_write_hook(hook=append_copyright)

You can see that the two hooks are now registered as pre-write hooks.

[41]:
xio.pre_write
[41]:
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>

Write the text again,

[42]:
xio.write(obj=loaded_text, path="quotes/most_like_with_copyright.txt")

and check that the new file ends with the copyright line!

[43]:
!cat ./quotes/most_like_with_copyright.txt
Nobody ever figures out what life is all about, and it doesn't matter. Explore the world. Nearly everything is really interesting if you go into it deeply enough.
© 2022 Richard P.Feynman

Finally, let’s see how to list all registered hooks and remove the ones we no longer need. Listing them is easy: just call the show_events method.

[44]:
xio.show_events()
Event: post_read
1. 244d96431cd5ea83: <function remove_linebreaks at 0x7f343833c670>
Event: post_readall
 No hooks are registered
Event: post_write
 No hooks are registered
Event: post_writeall
 No hooks are registered
Event: pre_read
1. 65e9f9d57134d5d7: <function check_path_existence at 0x7f346c3f3820>
Event: pre_readall
 No hooks are registered
Event: pre_write
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>
Event: pre_writeall
 No hooks are registered

Now, let’s remove the hook we added to the post_read event, remove_linebreaks. Its hook ID can be obtained as follows:

[45]:
# A temporary helper that collects the hook IDs of an event.
# You can of course copy the ID from the hook list above instead.
def get_hook_names(event) -> list[str]:
    return [hook.hook_name for hook in event.hooks]

hook_id = get_hook_names(xio.post_read)[0]  # the first (and only) post-read hook
hook_id
[45]:
'244d96431cd5ea83'
[46]:
xio.remove_hooks(hook_id)

As you can see below, it is now gone.

[47]:
xio.show_events()
Event: post_read
 No hooks are registered
Event: post_readall
 No hooks are registered
Event: post_write
 No hooks are registered
Event: post_writeall
 No hooks are registered
Event: pre_read
1. 65e9f9d57134d5d7: <function check_path_existence at 0x7f346c3f3820>
Event: pre_readall
 No hooks are registered
Event: pre_write
1. 1269bcdeaab3db08: <function create_parent_directory at 0x7f346c3f3670>
2. 32e966b6991c51b9: <function append_copyright at 0x7f3437a6aaf0>
Event: pre_writeall
 No hooks are registered

To remove every hook at once, call clear_all_hooks. Note that this also erases the hooks registered by default, such as create_parent_directory and check_path_existence. Of course, if you restart the session, the initially registered hooks are available again once X-I/O is loaded.

[48]:
xio.clear_all_hooks()
[49]:
xio.show_events()
Event: post_read
 No hooks are registered
Event: post_readall
 No hooks are registered
Event: post_write
 No hooks are registered
Event: post_writeall
 No hooks are registered
Event: pre_read
 No hooks are registered
Event: pre_readall
 No hooks are registered
Event: pre_write
 No hooks are registered
Event: pre_writeall
 No hooks are registered
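To summarize how registering, listing, removing and clearing hooks fit together, here is a minimal sketch of a registry for a single event. HookRegistrySketch is hypothetical; in particular, generating IDs with secrets.token_hex is only an assumption that happens to give the same 16-hex-character shape seen in the listings above.

```python
from __future__ import annotations

import secrets
from typing import Any, Callable

class HookRegistrySketch:
    """Hypothetical registry for one event, keyed by random 16-hex-digit IDs."""

    def __init__(self) -> None:
        # dicts keep insertion order, so hooks are listed in registration order
        self.hooks: dict[str, Callable[..., Any]] = {}

    def register(self, hook: Callable[..., Any]) -> str:
        hook_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
        self.hooks[hook_id] = hook
        return hook_id

    def remove(self, hook_id: str) -> None:
        self.hooks.pop(hook_id, None)  # silently ignore unknown IDs

    def clear(self) -> None:
        self.hooks.clear()

    def show(self) -> None:
        if not self.hooks:
            print(" No hooks are registered")
        for number, (hook_id, hook) in enumerate(self.hooks.items(), start=1):
            print(f"{number}. {hook_id}: {hook.__name__}")

def identity_hook(context: dict[str, Any]) -> Any:
    return context["object"]

registry = HookRegistrySketch()
new_id = registry.register(identity_hook)
registry.show()  # one line: "1. <random 16-hex ID>: identity_hook"
registry.remove(new_id)
registry.show()  # " No hooks are registered"
```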

Great job! You now know the main features of our package.