Handling Data Format Changes in Databases: A Practical Example
Pradosh Jena
GenAI | LangChain | AWS | Python | Big Data | Snowflake | DBT Cloud Data Architect | Post Graduate in Data Science | 5x Certified | Writer @Medium
In the world of software development, data formats evolve as applications grow and requirements change. One of the challenges developers face is ensuring that their applications can handle different data formats, especially when switching from one schema design to another. This situation is common in document databases, which are often praised for their schema flexibility. However, this flexibility can sometimes lead to complications when reading and writing data. In this blog, we will specifically focus on a practical example of handling a data format change in a document database.
The Scenario: Changing User Data Format
Imagine you have a document database storing user information. Initially, the user data includes a single field for the full name:
{
  "user_id": "123",
  "name": "John Doe"
}
Later, you decide to separate the full name into two fields: first_name and last_name. This change could be due to various reasons, such as the need for better data manipulation, improved search capabilities, or more personalized user interactions. The new format looks like this:
{
  "user_id": "123",
  "first_name": "John",
  "last_name": "Doe"
}
Implementing the Change: Schema-on-Read vs. Schema-on-Write
Document databases are often described as "schemaless," which is a bit of a misnomer. In reality, there is an implicit schema defined by how the application expects the data to be structured. This concept is known as schema-on-read, where the structure is interpreted when the data is read, as opposed to schema-on-write, where the schema is enforced when data is written, typical in relational databases.
To implement this change in a document database, you don't need to alter an existing schema since the database does not enforce one. Instead, you write code to handle both the old and new data formats. Here's an example in JavaScript:
if (user && user.name && !user.first_name) {
  // Documents written before the change do not have first_name and last_name
  let nameParts = user.name.split(" ");
  user.first_name = nameParts[0];
  user.last_name = nameParts.slice(1).join(" ");
}
Let's break down this code:
1. user: checks that the user object exists; a basic null check to avoid runtime errors.
2. user.name: checks that the name field exists, indicating the old data format.
3. !user.first_name: checks that the first_name field does not exist, confirming that the document is indeed in the old format and still needs to be updated.
This conditional block ensures that the application can handle data regardless of whether it is in the old or new format. If the data is in the old format (full name in a single field), it splits the name into first_name and last_name. This approach is dynamic and requires careful handling to ensure data integrity.
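Wrapped into a reusable helper, the same logic might look like the following minimal sketch. The function name normalizeUser is hypothetical, not part of any library; it simply applies the conditional block above to any document you read:

```javascript
// Hypothetical helper: upgrade an old-format user document in place.
// Documents written before the change have "name" but no "first_name".
function normalizeUser(user) {
  if (user && user.name && !user.first_name) {
    const nameParts = user.name.split(" ");
    user.first_name = nameParts[0];
    // Everything after the first word becomes the last name, so
    // "Mary Jane Watson" -> first_name "Mary", last_name "Jane Watson".
    user.last_name = nameParts.slice(1).join(" ");
  }
  return user;
}

// An old-format document is upgraded on read...
const oldDoc = normalizeUser({ user_id: "123", name: "John Doe" });
// ...while a new-format document passes through untouched.
const newDoc = normalizeUser({
  user_id: "456",
  first_name: "Jane",
  last_name: "Smith",
});
```

Calling this helper at every read path gives the application one consistent view of a user, regardless of when the document was written.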
Handling the Migration in Relational Databases
In contrast, relational databases use schema-on-write, where schema changes are explicitly defined and enforced. To migrate the data, you would typically add new columns to the table and update existing records. For example, in SQL:
ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);

-- PostgreSQL syntax: split on the first space. Note that single-word
-- names end up duplicated in both columns and may need special handling.
UPDATE users SET
  first_name = split_part(name, ' ', 1),
  last_name = substring(name from position(' ' in name) + 1);
Here, the schema change is enforced at the database level, ensuring that all new and existing data conforms to the new structure. This approach provides strict data consistency but requires a coordinated update to both the database schema and the application logic.
The Trade-Offs
The main advantage of schema-on-read is flexibility. Applications can evolve without needing immediate changes to the database schema. This flexibility, however, comes with the trade-off of having to manage multiple data formats within the application, which can increase complexity and the risk of errors.
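One common way to contain this complexity is lazy migration: upgrade old-format documents as they are read, and write the upgraded version back so the old format gradually disappears from the database. The sketch below uses a plain Map as a stand-in for the document store; the getUser function and the store itself are illustrative assumptions, not a real database API:

```javascript
// Stand-in for a document database: user_id -> document.
const store = new Map([
  ["123", { user_id: "123", name: "John Doe" }], // old format
  ["456", { user_id: "456", first_name: "Jane", last_name: "Smith" }], // new format
]);

// Read a user, upgrading old-format documents and writing them back,
// so each document is migrated at most once.
function getUser(userId) {
  const user = store.get(userId);
  if (user && user.name && !user.first_name) {
    const nameParts = user.name.split(" ");
    user.first_name = nameParts[0];
    user.last_name = nameParts.slice(1).join(" ");
    delete user.name; // drop the legacy field
    store.set(userId, user); // write back the new format
  }
  return user;
}
```

With this approach, frequently accessed documents migrate quickly on their own, and a one-off background job can sweep up whatever is never read.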
Schema-on-write, while more rigid, provides consistency and integrity. The database ensures that all data adheres to the defined schema, simplifying the application's data handling logic. However, changes require more coordination and can lead to downtime or complex migration procedures.
Conclusion
Choosing between schema-on-read and schema-on-write depends on the specific needs of your application, including factors like data consistency requirements, development agility, and maintenance overhead. In our example, handling a change from storing a full name to separate first and last names illustrates the practical implications of these two approaches. By understanding the strengths and weaknesses of each, you can make informed decisions about how to manage your data and schema evolution effectively.
Food For Thought
What kind of database would you choose if you wanted to store someone's LinkedIn profile information?
What kind of database would you choose to give a user a seamless browsing experience on an e-commerce site?
Aren't caches and queues also a type of storage mechanism?
Please share your thoughts in the comments.
This is a short, concise article highlighting a key point to take into account when choosing the right kind of database for your application. For more in-depth reading, see Designing Data-Intensive Applications: https://learning.oreilly.com/library/view/designing-data-intensive-applications/
I will publish many more articles on such design principles in similarly simple language.