
The shifting pH of Databases from ACID to BASE.

Bargunan Somasundaram

Java | Spring Boot | Angular | Microservices | Kafka | Kubernetes | Azure APIM | AIOps

Published: January 27, 2020

It is often said that data is the new oil; I would add that data is the new gold. Industry 4.0 is focused on data, which is now considered one of the most important commodities. ‘Big data’ has become an inevitable reality, but bigger isn’t always better: big insights matter more than big data. To extract value from gold or oil, it must be processed: fashioned into jewelry, minted into coins, or refined into petroleum products. Similarly, data must be processed and held in a vault (a database or data store). Big insights are possible only with the right database for daily operations. An explosion of consumer data has pushed IT companies, large and small, to shift the pH of their databases from ACID to BASE. Let’s see how.

In the early years of computing, punch cards were used for input, output, and data storage. For their time, punch cards offered a fast way to enter and retrieve data. After punch cards, databases came along. Database management systems (DBMS) allowed us to organize, store, and retrieve data on a computer; a DBMS is a way of communicating with a computer’s “stored memory.” Airlines were among the first industries to identify the need for such systems: IBM built the SABRE system to help American Airlines manage its reservation data. Data stores have since evolved from the early, navigational approach of CODASYL to SQL (ACID) to NoSQL (BASE).

Transactions

The idea of transactions, their semantics, and guarantees, evolved with data management. As computers became more powerful, they were tasked with managing more data. Eventually, multiple users shared data on a machine. This led to problems of data being changed or overwritten while other users were in the middle of a calculation. This was an issue that needed addressing. Thus, the academics were called in and they came up with the ACID properties for transactions that could solve the consistency issues.

In the context of databases, a transaction is a sequence of read/write operations that satisfies the ACID properties and can therefore be perceived as a single logical operation on the data.

To understand the importance of transactions, consider a familiar example: transferring money from one account to another. This operation consists of two steps:

Deduct the balance from the sender’s bank account
Add the amount to the receiver’s bank account

Now think of a situation where the amount is deducted from the sender’s account but, because of an error, never reaches the receiver’s account. Transaction management handles such cases by performing both steps as a single unit. If either step fails, the whole transaction is rolled back.
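The money transfer can be sketched with Python’s built-in sqlite3 module. This is only an illustration under stated assumptions: the accounts table, the account names, and the transfer and balances helpers are hypothetical and not part of any real banking system.

```python
import sqlite3

def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts as one unit of work."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (sender,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("unknown sender or insufficient funds")
            updated = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            if updated.rowcount != 1:
                raise ValueError("unknown receiver")
        return True
    except ValueError:
        # The deduction made above has already been rolled back here.
        return False

def balances(conn):
    """Return all balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

Calling transfer(conn, "alice", "bob", 30) moves the money, while a transfer to a non-existent account returns False and leaves both balances exactly as they were before the call.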

Below are the basic tenets of the ACID Model, a set of guidelines for ensuring the accuracy of database transactions.

Atomicity
Consistency
Isolation
Durability

Atomicity

Atomicity is the guarantee that a series of operations either succeed or fail together, because all components of a transaction are treated as a single action. If one part of a transaction fails, the database’s state remains unchanged; there are no partial updates.

For example, a business transaction might involve confirming a shipping address, charging the customer and creating an order. If one of these steps fails, all should fail.

Consistency

Consistency is the second property of the ACID model. A transaction moves the data from one valid state to another: every rule defined on the database, such as constraints and data types, still holds after it commits. If any failure occurs, all data returns to its state before the transaction started.

For example, a column in a database may be restricted to the day names “Monday” to “Sunday”. If a user tried to introduce a new day, the consistency rules of the database would reject it.
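The day-of-week rule can be expressed as a CHECK constraint. The sketch below uses Python’s sqlite3 module; the schedule table and the make_schedule_db and add_task helpers are assumptions made for illustration.

```python
import sqlite3

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")

def make_schedule_db():
    """Create an in-memory table whose `day` column only accepts DAYS."""
    conn = sqlite3.connect(":memory:")
    allowed = ", ".join(f"'{day}'" for day in DAYS)
    conn.execute(
        "CREATE TABLE schedule ("
        " task TEXT NOT NULL,"
        f" day TEXT NOT NULL CHECK (day IN ({allowed})))"
    )
    return conn

def add_task(conn, task, day):
    """Insert a task; return False if the database rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO schedule VALUES (?, ?)", (task, day))
        return True
    except sqlite3.IntegrityError:
        # Raised by sqlite3 when the CHECK constraint is violated.
        return False
```

With this schema, add_task(conn, "backup", "Monday") succeeds, while add_task(conn, "backup", "Funday") is refused by the database itself, regardless of what the application code forgot to validate.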

Isolation

Transactions require concurrency control mechanisms, which guarantee correctness even when the operations of different transactions are interleaved. Isolation hides uncommitted state changes from the outside world, so a failing transaction can never corrupt the state that others see. Isolation is achieved through concurrency control, using either pessimistic or optimistic locking mechanisms.

Here is an example: if Bob issues a transaction against a database while Harry issues a different transaction, both should operate on the database in isolation. The result should be as if the database had performed Bob’s entire transaction before executing Harry’s, or vice versa. This prevents Bob’s transaction from reading intermediate data that Harry’s transaction produced as a side effect but will never commit to the database.

It is important to note that the isolation property does not ensure that a specific transaction will execute first, only that they will not interfere with each other.
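A small sketch of this behavior, assuming two sqlite3 connections to the same temporary database file stand in for Harry and Bob. The stock table and the isolation_demo function are hypothetical; the point is only that Bob never sees Harry’s uncommitted update.

```python
import os
import sqlite3
import tempfile

def isolation_demo():
    """Return the quantities Bob observes while Harry modifies the row."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        harry = sqlite3.connect(path)
        bob = sqlite3.connect(path)
        harry.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
        harry.execute("INSERT INTO stock VALUES ('widget', 10)")
        harry.commit()

        def bob_reads():
            rows = bob.execute(
                "SELECT qty FROM stock WHERE item = 'widget'"
            ).fetchall()
            return rows[0][0]

        seen = []
        # Harry changes the row but has not committed yet.
        harry.execute("UPDATE stock SET qty = 3 WHERE item = 'widget'")
        seen.append(bob_reads())  # Bob still sees the committed value
        harry.rollback()          # Harry's change is discarded
        seen.append(bob_reads())

        harry.execute("UPDATE stock SET qty = 7 WHERE item = 'widget'")
        harry.commit()            # only now does the change become visible
        seen.append(bob_reads())

        harry.close()
        bob.close()
        return seen
```

Here isolation_demo() returns [10, 10, 7]: the intermediate value 3 never leaks to Bob, because it was rolled back before being committed.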

Durability

Once a transaction has completed successfully, its changes are stored permanently: the data remains in the correct state even after a failure and system restart.
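As a rough illustration, the sketch below commits a row to a file-backed SQLite database, closes the connection, and reads the row back through a fresh connection. Reopening the file only approximates a restart, and the orders table and the durability_demo function are assumptions made for this example.

```python
import os
import sqlite3
import tempfile

def durability_demo():
    """Commit a row, 'restart' by reconnecting, and return what survived."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shop.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
        with conn:  # commits when the block exits without an error
            conn.execute("INSERT INTO orders (item) VALUES ('book')")
        conn.close()  # stands in for the process stopping

        # A new connection stands in for the system coming back up.
        reopened = sqlite3.connect(path)
        items = [row[0] for row in reopened.execute("SELECT item FROM orders")]
        reopened.close()
        return items
```

The committed order is still there after reconnecting; a row inserted but never committed would not be.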

The Need for BASE Models

Let’s go through the life-cycle of an application to understand the need for the BASE model. Let’s suppose an e-commerce application is developed. At the initial soft launch, the database is moved from a local workstation to a shared, remotely hosted MySQL instance with a well-defined schema. As soon as the application becomes popular, a problem arises. There are just too many reads hitting the database.

This is quite usual for any application. The first attempt is to cache frequently executed queries, typically with Memcached or a caching library such as Ehcache or OSCache. But note that the reads no longer comply with the ACID model. The data can be inconsistent because it now lives in more than one place, and the cache keeps serving stale data until it is refreshed from the database.
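The staleness problem can be seen in a few lines. The following is a minimal sketch of a read-through cache with a time-to-live; a plain dict stands in for the database, the clock is passed in explicitly to keep the example deterministic, and the CachedStore class is hypothetical rather than the API of Memcached or any other cache.

```python
class CachedStore:
    """Read-through cache in front of a dict that stands in for the database."""

    def __init__(self, ttl_seconds):
        self.db = {}      # the source of truth
        self.cache = {}   # key -> (value, expires_at)
        self.ttl = ttl_seconds

    def write(self, key, value):
        # Writes go to the database only; the cached copy is not invalidated.
        self.db[key] = value

    def read(self, key, now):
        hit = self.cache.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # served from cache, possibly stale
        value = self.db[key]
        self.cache[key] = (value, now + self.ttl)
        return value
```

With a 30-second TTL, a price written as 10, read, and then updated to 12 is still reported as 10 until the cached entry expires; only a read after that point returns 12.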

As the application’s popularity grows, new features such as faceted search, on-page checkout, customer reviews, and live chat are introduced. If each feature were stored in its own table, hundreds of joins would be required to prepare such a page, which would increase query complexity. To avoid too many joins, the schema must be denormalized.
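To make the trade-off concrete, the sketch below builds the same product page twice with sqlite3: once by joining normalized tables, and once by reading a single denormalized row. The table names, columns, and helper functions are assumptions chosen for illustration.

```python
import sqlite3

def build(conn):
    """Create normalized tables plus a denormalized copy of the page data."""
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE reviews  (product_id INTEGER, stars INTEGER);
        -- Denormalized: one row per page, aggregates stored up front.
        CREATE TABLE product_page (
            id INTEGER PRIMARY KEY, name TEXT,
            review_count INTEGER, avg_stars REAL
        );
        INSERT INTO products VALUES (1, 'kettle');
        INSERT INTO reviews VALUES (1, 5), (1, 3);
        INSERT INTO product_page VALUES (1, 'kettle', 2, 4.0);
    """)

def page_normalized(conn, product_id):
    """Assemble the page with a join and an aggregate at read time."""
    return conn.execute(
        "SELECT p.name, COUNT(r.stars), AVG(r.stars) "
        "FROM products p LEFT JOIN reviews r ON r.product_id = p.id "
        "WHERE p.id = ? GROUP BY p.id",
        (product_id,),
    ).fetchone()

def page_denormalized(conn, product_id):
    """Read the precomputed page with a single-row lookup."""
    return conn.execute(
        "SELECT name, review_count, avg_stars FROM product_page WHERE id = ?",
        (product_id,),
    ).fetchone()
```

Both functions return the same row at first. The denormalized read avoids the join, but every new review now has to update product_page as well, otherwise the two answers drift apart.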

If the application’s popularity surges further, the load will swamp the database server and slow things down. Server-side computations such as stored procedures must therefore be moved to the client side. Even after this, some queries will still be slow, so the most complex queries are periodically pre-materialized, and joins are avoided in most cases.

Now, the reads might be okay, but writes are getting slower. Thus, the secondary indexes and triggers are dropped. At this point the DB is left with:

No ACID properties, due to caching.
No normalized schema, due to denormalization.
No stored procedures, triggers, or secondary indexes.

At this point, the ACID model is either overkill or an outright hindrance to the operation of the database. These issues gave birth to a softer model called BASE, which is used extensively by NoSQL data stores.

Basic tenets of the BASE model

Basically Available

The data store remains available even in the presence of multiple failures. Because the data is replicated across nodes, the database appears to work most of the time, although a response may reflect stale data.

Soft State

Soft State indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model. In a way, datastores don’t have to be write-consistent or mutually consistent all the time.

Eventual Consistency (Weak consistency)

When multiple copies of the data reside on separate servers, an update may not reach all copies at the same time. The data is therefore inconsistent for a period of time, but the database’s replication mechanism will eventually bring all the copies into a consistent state.
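The following toy simulation illustrates the idea; it is not how any particular NoSQL store implements replication. A write is acknowledged after reaching one replica, and the remaining replicas catch up when queued updates are applied. All class and method names are hypothetical.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy model: one replica takes the write, the others catch up later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Acknowledge after updating the first replica only.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # A read may hit any replica, including one that is behind.
        return self.replicas[replica_index].get(key)

    def replicate_one(self):
        """Apply the oldest queued update, if there is one."""
        if self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

    def settle(self):
        """Apply every queued update so that all replicas converge."""
        while self.pending:
            self.replicate_one()
```

Right after a write, a read from the first replica returns the new value while a read from the last replica may still return nothing. This is also the soft state described earlier: the lagging replicas change later without any new client input. Once settle() has run, every replica returns the same value.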

Conclusion

Suitability of the ACID or BASE model varies case-by-case and depends on the read and write patterns. Transactions are omnipresent in today’s enterprise systems, providing data integrity even in highly concurrent environments. So, choose ACID when there is a need for strong consistency in transactions and the schema is fixed.

In the age of IoT and AI/ML, high-performance computing is inevitable, and the computing requirements are astronomical. Eventual consistency gives the IT giants an edge over others in the industry by enabling their applications to interact with customers across the globe, continuously, with the necessary availability and partition tolerance, all while keeping their costs down, their systems up, and their customers happy. So, go for BASE data stores when availability and scalability are the priority and the schema is still evolving. Keep in mind that BASE data stores do not guarantee that replicated data is consistent at write time, only that it will become consistent at some point in the future.

The BASE consistency model is primarily used by aggregate stores, including column-family, key-value, and document stores. Cassandra, Solr, and Elasticsearch are commonly cited examples; HBase is often grouped with them, although it provides strongly consistent reads and writes at the row level. Relational databases such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server support the ACID properties of transactions.

