SQL vs NoSQL: how to choose a database for your application?
Tamer Khraisha (Ph.D)
Software Engineer | O'Reilly Author | Financial Data Management and Technology
When developing a software application, an important decision often concerns the database to choose for storing and manipulating data. Such a decision can be influenced by a variety of factors, constraints, requirements, and design principles that the development team embraces. In this regard, a distinction is often made between SQL and NoSQL database systems. Since these technologies offer different features and internal working mechanisms, one should be careful to consider the right strategy to make a wise choice. This article offers a simple approach to guide the database design choice.
To begin, it should be noted that one should not talk about SQL and NoSQL as databases, but as Database Management Systems (DBMS)? that are implementations of data management patterns. Other data management patterns exist such as filesystem storage, but in this article, I will focus on SQL and NoSQL systems. The SQL data management pattern is often implemented via a Relational Database Management System (RDBMS) that uses Structured Query Language (SQL) to interface with the database. RDBMSs are the oldest and most diffused DB management systems and they have well-defined SQL standards, design patterns, and software integration tools. Examples of SQL databases include Postgres, MySQL, MariaDB, and Microsoft SQL Server, to name a few. On the other hand, NoSQL (Not Only SQL) is a class of DBMSs that are non-relational and generally do not use SQL. NoSQL DBMSs come in a variety of types and, unlike SQL RDMS, they do not have behavioral commonalities or standards. Examples of NoSQL DBMS include document databases (e.g. MongoDB, CouchDB)? key-value stores (e.g. Azure Cosmos DB), column-oriented databases (e.g. Apache Hbase), and graph databases (e.g. Neo4j). Some NoSQL systems are multi-model, meaning that they can store more than one type of these four data structures. For example, Amazon DynamoDB can support both document and key-value data types, while Azure Cosmos DB can work as a key-value, column, document, and graph DBMS.
An important thing to keep in mind is that NoSQL DBMSs are not simply competitors with SQL RDBMS, as they emerged to address different requirements with different design patterns, some of which are related to shortcomings of SQL RDBMS. Additionally, recent developments in cloud and distributed technologies have led to the development of SQL database systems that address several of the drawbacks that justified the emergence of NoSQL solutions, thus reducing the gap between SQL and NoSQL technologies.
Nevertheless, people still feel confused as to what are the main criteria to follow when presented with the many choices of SQL and NoSQL technologies currently available in the market. To this end, the present article offers the following questions to answer carefully when making the database decision:
Is transactional guarantee crucial for your application?
Transactional guarantee refers to the feature that ensures that (concurrent) read/write operations either succeed or fail without causing side effects like having half-finished operations or inconsistent data. This guarantee is often abbreviated as ACID (Atomicity, Consistency, Isolation, and Durability). Here, isolation keeps the intermediate state of a transaction hidden from other transactions; consistency ensures that the data is consistent before and after the transaction; atomicity ensures that all operations within a transaction are carried out as a single unit; durability ensures that the data is persistent even in the event of a system failure.
In applications that involve financial transactions or online booking, having ACID transactions is crucial for the system, and therefore it is often considered one of the main factors in deciding which database to choose. In this regard, SQL RDBMSs such as Postgres is ahead of NoSQL systems as they offer more reliability in terms of ACID transactions. This is further reinforced in the SQL world by the fact that the ACID guarantee is part of SQL standards that RDBMS (open source and commercial) often thrive to implement and support, while in NoSQL solutions there are still no behavioral commonalities that follow specific industry standards. Recently, NoSQL databases like MongoDB, Google BigTable, and Google Datastore have made good progress in providing acid transactions; for example, MongoDB document-level operations are atomic, and Datastore has row-level atomicity. Graph databases such as Neo4j offer fully-acid operations. Nevertheless, many NoSQL DBMSs such as MongoDB and Datastore are not highly reliable for multi-document (or multi-row) transactions.
Crucially, It should be mentioned that even among SQL RDBMS, the level of transactional guarantee might vary depending on the design goals and principles of the DBMS. For example, Postgres implements most SQL Standards (and even more) and as a result, it might not be as fast as other RDBMS that implements fewer SQL standards in order to optimize performance. As another example, consider the Online Transaction Processing (OLTP) vs Online Analytical Processing (OLAP) database design patterns. Both OLTP and OLAP can be SQL systems, however, the basic difference between them is that OLTP is designed for the processing of transactions (many small inserts and updates), whereas OLAP is more focused on analytical processing (many big inserts and reads). Examples of OLTP technologies are Postgres and MySQL, while products like Google BigQuery are considered OLAP. In this regard, it makes no sense to force an OLAP solution to support transactional guarantees as they are not meant for that purpose.
Is referential integrity required for your application?
Referential integrity is a database property that ensures references between data and tables are valid. For example, in a data model where there is a table for products and another for orders, one should not be able to place an order in the orders table if the product to order does not exist in the products table. SQL DBMSs excel in guaranteeing referential integrity by using foreign constraints, while NoSQL databases offer no such guarantees.
领英推荐
Does your application data have a predefined schema?
If the application being developed has a well-structured schema that can be predefined in advance, then SQL DBMSs are likely to be the best choice. Using indexing and normalization, SQL systems can guarantee a high level of performance given a well-defined data model. However, if one wants to keep a flexible schema structure, or if the data is semi-structured, then NoSQL solutions like MongoDB or other commercial products might be the way to go. NoSQL databases are in general schema-free, which means that your data model can adapt and change easily and dynamically. In other situations, one might want to extract only certain fields from semi-structured or unstructured data, and store it in a SQL RDBMS, while storing the other parts of the data in a NoSQL or file system. This approach however might introduce performance issues due to the possible multiple lookups when retrieving all the data. This is where an important development in SQL RDBMSs comes into the scene: support for unstructured data such as JSON or XML. For example, since v.9.2 Postgres added native JSON support, allowing users to use Postgres as a "NoSQL" database. When taking this approach, always consider the cost and space consumption of relational databases.
Is scalability crucial for your application?
Database scalability is the ability of a database system to handle varying demands by adding/removing resources in order to keep performance constant/optimal. Traditionally, Relational database systems were not scalable by design. They often rely on server-client architecture where there is one node for writing operations and multiple other nodes for reading operations. When needed, scaling of relational databases has been usually supported vertically, in that more resources get added when the system can’t cope with the increasing requests. The main issue with vertical scaling is that it can quickly reach a limit in terms of maximum processing power, storage, and throughput. This is where NoSQL technologies show their main power: horizontal scalability. NoSQL DBMSs are designed for scalability and performance. Data in a NoSQL DBMS are distributed in nature, and they are well-suited for handling big amounts of data. This means however that they need to adhere to the CAP theorem (consistency, availability, and partition tolerance). The CAP theorem states that, given a distributed database system, when a network failure occurs then it is possible to guarantee either consistency (read operations get the most recent data) or availability (all reads return data but it may not be the most recent)- but not both. Some NoSQL systems prioritize availability over consistency (e.g. DynamoDB, Apache Cassandra, Neo4j), while others prioritize consistency (MongoDB, Azure Cosmos DB). Additionally, systems like Cassandra promise that they can actually provide both high availability and consistency. Having in mind the consistency vs availability trade-off when choosing the DBMS can help make the right choice and avoid unexpected drawbacks.
Crucially, though NoSQL DBMSs have gained a reputation for being better at scaling horizontally than RDBMS, it should be noted that in recent years, SQL database systems have been adapted to support horizontal scalability by relying mainly on partitioning. For example, Google BigQurey uses static partitioning to scale horizontally, while Snowflake uses dynamic partitioning (via micro-partitioning) to achieve horizontal scalability.
Do you care about data redundancy?
NoSQL DBMS like MongoDB usually occupies more space since they store each document with the field names stored in it. This generates redundant data which takes up space. However, storage is currently so cheap that many consider this a minor shortcoming, and some NoSQL databases also support compression to reduce the storage footprint.? SQL databases on the other hand have features like data normalization which allows for reducing redundancy in the database.
Do you want to perform complex queries?
SQL RDBMS are highly optimized for performing complex queries with various filters and cross-table join options, while NoSQL databases often struggle with queries that involve multiple conditions, aggregation, and analytical querying. Joining documents in a NoSQL database like MongoDB can be a tedious task, while relational databases offer highly optimized joining features between relations. However, this depends on the use case. If an application simply retrieves data from the database using simple queries, then a NoSQL system might actually be a better choice since they are known for its query speed. On the other hand, queries that aim at applying complex transformations to the data will require more sophisticated querying capabilities which are more reliable in an RDBMS. In other situations, one might want to perform complex queries but they can hit a bottleneck in a SQL system. For example, imagine you have a table that has two columns: person names and their friends. A person might have more than one friend. Imagine you want to find the friends of a person, in such case it’s a simple query that can be performed with a WHERE condition if we use an SQL database system. Now imagine we want to find the friends of a friend of a person or even go deeper in the loop, it would still be possible to do using SQL but it will involve excessively complex self-joins that will make the query unreadable and unmaintainable. Instead, using a NoSQL system like Neo4J, which can store graph and network data, we can easily run such queries using simple and readable syntax.
In conclusion, I recommend that you understand the nature of the data that your application will generate, and its use cases, and try to answer the questions in this article before choosing the right SQL or NoSQL database management system.
Atuals
2 年Very interesting ??