Python's "with"? keyword - in context

Python's "with" keyword - in context

In Python, everything is an object. To some this may be a tad cliche to repeat, but underneath this rather innocuous statement lies a paradigm that is indicative of every object that developers interact with in the Python programming language. This paradigm is the Python data model.

The Python data model[1] provides developers an interface with which to hook into some of the core functionality of the Python programming language. The model provides an explicit API through which all Python objects can be interacted with. Knowing this model, and how to leverage it so to create our own hooks for customized object abstractions, is one of the core tenants of advanced Python programming.

One of my favorite abstractions that demonstrates this data model paradigm is the context manager. Context managers are designed to enforce the execution of a set of protocols around the scope of a given block of code. Most commonly this is observed in the context of resource management, but in theory, the context manager can be applied to any block of code which requires an 'enter and exit' strategy to interact 'with'.

I'm choosing these words rather playfully, as the three elements which context managers require are the 'with' keyword, which invokes the context manager (CM for short), and then the __enter__() and __exit__() methods, which Python executes as the CM is entered and exited respectively. The result here is wrapping a very technical protocol into beautiful, idiomatic Python code.

class ContextManager:
    def __enter__(self):
        print('entering...')

    def __exit__(self, *args, **kwargs):
        print('exiting...')

with ContextManager():
    print('in context')

In []: python -u d:\programming\sandbox\contextmanager.py
Out[]:

entering...
in context
exiting...

As a quick point, CM methods emulate the functionality of a try/finally block, where the __exit__() is called regardless of the state of the block inside the context manager. The encompassed error will still ultimately be thrown, but the benefit here is that we get the exit protocol to execute no matter the success or failure of the encompassed code block.

class ContextManager:
    def __enter__(self):
        print('entering...')

    def __exit__(self, *args, **kwargs):
        print('exiting...')

with ContextManager():
    raise ValueError

In []: python -u d:\programming\sandbox\contextmanager.py
Out[]:

entering...
exiting...
Traceback (most recent call last):
  File "d:\programming\sandbox\contextmanager.py", line 9, in <module>
    raise ValueError
ValueError

As I mentioned before, the most common examples of CMs come in the form of resource managers. Many resource-management objects in Python are by design context managers, built with the intent to abstract away from developers the need to incessantly call open, close, read, write, save, etc., over I/O resources that can be exclusively tethered to your Python instance during runtime. For example, "open(path)" can be used as a CM, where in this case the CM handles the I/O setup and teardown protocols under the hood so that the developer doesn't have to worry about resource management themselves.

with open(r'D:\Programming\csv_reader\this.txt', 'w') as file:
    file.writelines(['this\n', 'that\n', 'this and that\n'])


with open(r'D:\Programming\csv_reader\this.txt', 'r') as file:
    print(file.readlines())

---vs---


file = open(r'D:\Programming\csv_reader\this.txt', 'w')
file.writelines(['this\n', 'that\n', 'this and that\n'])
file.close()
file = open(r'D:\Programming\csv_reader\this.txt', 'r')
print(file.readlines())
file.close()

Example: CSVdb

CSV files are ubiquitous in the realm of data science and analytics. They're easy to generate and simple to send and receive. Oftentimes data in a CSV file will be arranged as a table, where a line of headers describe the subsequent columns of data in a file.

The one major drawback to these types of CSV files however is that they're row-oriented. We can easily read a row of text as a string from a CSV file via file.readline(). However, as mentioned above, the data arrangement of CSV's is commonly oriented by column. Yet, there's no 'file.readcolumn()' method to be found! That's annoying. Now, this may be a non-issue if we can simply read the entire thing into memory using something like pandas.read_csv(), but this is a non-starter if our hardware doesn't have the necessary RAM to keep everything off-disk.

For Python, a great yet simple solution to this problem is to read our CSV file into a relational database. Though unavailable as a CSV, in a relational db we have the option to read out our data by column. Python ships natively with a relational database module called sqlite3, which, combined with Python's CSV module, provides us with the necessary functionality for reading into python our data by column.

import csv
import os.path
import sqlite3
import tempfile


import pandas as pd


file = open(r'D:\Programming\csv_reader\this.txt')
reader = csv.reader(file, delimiter=',')
headers = next(reader)
temp_dir = tempfile.TemporaryDirectory()
conn = sqlite3.connect(os.path.join(temp_dir.name, 'temp.db'))
sql = conn.cursor()
sql.execute(
    """
    CREATE TABLE IF NOT EXISTS csv (
    table_id INTEGER PRIMARY KEY,
    {}
    )
    """.format(
        ', '.join([h + " TEXT" for h in headers])
    )
)
conn.commit()
sql.executemany(
    "INSERT INTO csv({}) VALUES({})".format(
        ', '.join(headers), 
        ', '.join(['?' for _ in range(len(headers))])
    ), reader
)
conn.commit()
file.close()


target_header = headers[0]


df = pd.read_sql_query(
    "SELECT {} from csv".format(target_header),
     conn
)


series = df[target_header]
print(series)
sql.close()
conn.close()
temp_dir.cleanup()


As we can see here, without reading the entire file into memory, we can push the contents of our CSV file into a relational database, and subsequently use SQL to read out a column of data from the *.db file into a pandas series.

This is great start to our problem; but we can clean this up tremendously if we consider that, other than the target_header, df, and series objects, everything else codified is simply setup and teardown protocol. I.E. we can generalize this protocol with a context manager.

import csv
import os.path
import random
import sqlite3
import tempfile


import pandas as pd


# If CSV has a blank header (like for an index column) 
# we'll just give it a random one here since 
# headers in SQLite can't be blank
def string_generator(string_length: int, char=False) -> str:
    if not char:
        return ''.join(
            [chr(random.randint(65,90)) for _ in range(string_length)]
        )
    return char * string_length



class CSVdb:
    def __init__(self, input_file: "file"):
        self.file = input_file


    def __enter__(self) -> 'CSVdb':
        reader = csv.reader(self.file, delimiter=',')
        self.headers = [
            x if x != '' else string_generator(5) for x in next(reader)
        ]
        self.temp_dir = tempfile.TemporaryDirectory()
        self.conn = sqlite3.connect(
            os.path.join(self.temp_dir.name, 'temp.db')
        )
        self.sql = self.conn.cursor()
        self._execute_and_commit(self._create_table())
        self._executemany_and_commit(self._insert_values(), reader)
        self.file.close()
        return self


    def __exit__(self, *args, **kwargs):
        self.sql.close()
        self.conn.close()
        self.temp_dir.cleanup()


    @staticmethod
    def from_filepath(path: str) -> 'CSVdb':
        return CSVdb(open(path))


    def pull_column(self, header: str) -> pd.Series:
        if header not in self.headers:
            raise ValueError("header requested not in database")
        df = pd.read_sql_query(
            "SELECT {} from csv".format(header),
            self.conn
        )
        return df[header]


    def _create_table(self) -> str:
        return """
        CREATE TABLE IF NOT EXISTS csv (
            table_id INTEGER PRIMARY KEY,
            {}
        )
        """.format(
            ', '.join([h + " TEXT" for h in self.headers])
        )


    def _insert_values(self) -> str:
        return """
            INSERT INTO csv({}) VALUES({})
        """.format(
            ', '.join(self.headers), 
            ', '.join(['?' for _ in range(len(self.headers))])
        )


    def _execute_and_commit(self, *args, **kwargs) -> None:
        self.sql.execute(*args, **kwargs)
        self.conn.commit()


    def _executemany_and_commit(self, *args, **kwargs) -> None:
        self.sql.executemany(*args, **kwargs)
        self.conn.commit()





With this CSVdb class, we've completely abstracted away any interaction with the setup and teardown process of these file and directory objects. Instead of having to keep track of every I/O instance, we can simply call the context manager's pull_column() method so to extract the necessary column(s) of data within a 'with' block.

with CSVdb.from_filepath(r'D:\Database\csv\NOAA\california.csv') as c:
    series = c.pull_column(<TARGET_HEADER>)
    df = pd.DataFrame(
        {h: c.pull_column(h) for h in <ITERABLE_TARGET_HEADERS>}
    )

Summary

The Python data model provides us developers with a powerful way to hook into some of the core functionality of the Python programming language. With these protocols, we have the capacity to create our own objects which can interact with the language at a more fundamental level. Here, we saw how the __enter__() and __exit__() methods allow us to hook into the context manager protocol, allowing us to abstract away the setup and teardown of I/O instances using the 'with' keyword. The result of this is beautiful, idiomatic Python code.

要查看或添加评论,请登录

Michael Green的更多文章

  • In Python, decorators are where it's @

    In Python, decorators are where it's @

    The title of this post is a bit over hyped, predominately due to the fact that decorators fundamentally don't add any…

社区洞察

其他会员也浏览了