登录查看更多内容

Isolated unit testing using differents schemas with Python, unittest and SQLAlchemy

Nahuel Segovia

Software Developer in FluxIT. Making great digital products using Typescript with Node.js. Sometimes open source contributor. Cloud enthusiast. Typescript | Python | Docker | Kubernetes | CI / CD | GCP | AWS

发布日期: 2024年4月23日

I have worked with unit tests and I had to find a way to test some functionalities using the same data that is used by application in real context.

Why I decided to make tests using real data?

In the application, we have integrations with other external services, so making unit tests replicating data ensures that our system functions correctly interacting with dependencies
We can test security validations and transformation data operations that requires realistic use cases
Performance measuring: this approach allow us to understand and measure performance in realistic workloads

Proposed solution roadmap

The advantage that I have in this project is that we are using PostgreSQL as the main database engine, which is preferred to us because it has a powerful set of tools. One of them it's that it has schemas supported by default, so replication process could be relatively easy to do.

Wrong approach

I wanted to connect to other schemas specifiying with MetadaData option that is described in SQLAlchemy documentation, so I had code like this:

from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey

metadata = MetaData(schema="unit_tests")

class Users(Base):
    __tablename__ = 'users'

    id = Column(BigInteger, primary_key=True)
    name = Column(String(100), nullable=False)
    email = Column(String(100), nullable=False)

But using this way we can't switch between different schemas at runtime, so for each switch, we needed to change the schema parameter value changing the code.

Searching correct approach

At that moment I could not find any way to apply the switch from one schema to another at runtime, so I had decided to use plain SQL queries to keep with proposed solution, here is the code that I wrote:

Migration

We only going focus on upgrade and downgrade functions because it's about implementation

def upgrade() -> None:
    op.execute("CREATE SCHEMA unit_tests")
    op.execute("SET search_path TO unit_tests")

    op.execute(""" 
                CREATE OR REPLACE FUNCTION copy_tables_to_new_schema(original_schema TEXT, new_schema TEXT) RETURNS VOID AS $$
                DECLARE
                    table_record RECORD;
                BEGIN
                    FOR table_record IN 
                        SELECT table_name 
                        FROM information_schema.tables 
                        WHERE table_schema = original_schema AND table_type = 'BASE TABLE'
                    LOOP
                        EXECUTE 'CREATE TABLE ' || new_schema || '.' || table_record.table_name || ' AS SELECT * FROM ' || original_schema || '.' || table_record.table_name;
                    END LOOP;
                END $$ LANGUAGE plpgsql;
            """)

    op.execute("SELECT copy_tables_to_new_schema('public', 'unit_tests')")
    op.execute("SET search_path TO public")

def downgrade() -> None:
     op.execute("drop schema unit_tests")

This migration will create the new schema, then it will create and execute the function that copies all the tables with its content from the public schema. Finally, it will come back to the default schema(public)

领英推荐

7 Pieces of Advice After 5+ Years of Using SQL &…

Muhammad Ishtiaq Khan 7 个月前

PyData London

Michelle Conway 1 年前

Automating Weather Data Processing with Airflow…

Fidel .V 10 个月前

The initial issue that I faced

When I tried to execute a basic unit test I found that it had executed on public schema, which is the default schema in PostgreSQL, so that user was created on the public schema.

The problem

After I tried to create that user into the database, I realized that it was being created on "public" schema, I lost 2 hours until I thought about writing in the official GitHub account of the library. That was the answer from Michael Bayer(SQLAlchemy and Alembic creator)

PD: Thanks a lot for the answer, zzzeek.

This was the code that I was using until Michael told me that it was wrong

For each create_engine() I was creating a new connection, so basically even if I executed session.execute(text("SET search_path TO unit_tests")) then executing session.commit() I lost the schema

Solution

Function that will switch between schemas

Fix to execute all the sessions in the same schema

With this solution, all tests can be run in the correct schema, so finally, I could do it

要查看或添加评论，请登录

Nahuel Segovia的更多文章

How do we migrate thousands of data from Excel tables for the Cultural Institute of the Province of Buenos Aires?

2024年4月3日

How do we migrate thousands of data from Excel tables for the Cultural Institute of the Province of Buenos Aires?

One year and three months ago I had the responsibility of designing the entire architecture and features to create a…

1 条评论

Isolated unit testing using differents schemas with Python, unittest and SQLAlchemy

Nahuel Segovia

Software Developer in FluxIT. Making great digital products using Typescript with Node.js. Sometimes open source contributor. Cloud enthusiast. Typescript | Python | Docker | Kubernetes | CI / CD | GCP | AWS

领英推荐

Nahuel Segovia的更多文章

社区洞察

其他会员也浏览了

Introduction to Polar: A Modern DataFrame Library for Python

esProc SPL: Combining the Strengths of DuckDB and Python

Delta Lake with Python: How to Use Delta Lake Without Spark

F1 project - loading to the database

Python or Filtering in Queries: sorting JSON responses from an Oracle Database

Building Python SDK for Databricks REST API

Spark + Airflow - managing the time taken for schedule execution

4 Ways to Speed up Snowflake Development with Ice Pick

How to automate your BI Reports?

Speed up Python data access by 30x & more

领英推荐

Nahuel Segovia的更多文章

How do we migrate thousands of data from Excel tables for the Cultural Institute of the Province of Buenos Aires?

社区洞察

其他会员也浏览了

Introduction to Polar: A Modern DataFrame Library for Python

esProc SPL: Combining the Strengths of DuckDB and Python

Delta Lake with Python: How to Use Delta Lake Without Spark

F1 project - loading to the database

Python or Filtering in Queries: sorting JSON responses from an Oracle Database

Building Python SDK for Databricks REST API

Spark + Airflow - managing the time taken for schedule execution

4 Ways to Speed up Snowflake Development with Ice Pick

How to automate your BI Reports?

Speed up Python data access by 30x & more