A Comprehensive Analysis - NoSQL vs RDBMS

Disclaimer: This post combines original content with facts gathered from the reputable sources cited below. I was compelled to write these posts because so many tech writers are publishing articles that are not technically sound; these posts are meant to serve as a factual, "one-stop" reference. Please also keep in mind that many of these topics are so new that they are evolving as I type this post, so your input is greatly appreciated and welcome.

I want to use this post to address a question that comes up for many people working with Big Data: what are the differences between the various NoSQL database technologies? In this post, we will examine NoSQL database management types, how they differ from RDBMS (Relational DataBase Management System) and NewSQL technologies, and go into detail about the (4) main types of NoSQL database technologies.

Background: RDBMS and Other Database Management Systems

So what is RDBMS? The Relational DataBase Management System is the basis for SQL (originally SEQUEL, for Structured English QUEry Language, later renamed Structured Query Language) and for all modern relational database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and others.

A Relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd from IBM. 

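The fundamental principles of RDBMS are based on Codd's 12 rules (numbered 0 through 12, so there are actually thirteen):

Rule 0: The Foundation rule: A system advertised as a relational database management system must be able to manage databases entirely through its relational capabilities.

Rule 1: The information rule: All information in the database is represented explicitly, and in exactly one way, as values in tables.

Rule 2: The guaranteed access rule: Every single value is logically accessible through a combination of table name, primary key value, and column name.

Rule 3: Systematic treatment of null values: The system supports null values for missing or inapplicable information, distinct from empty strings or zero and independent of data type.

Rule 4: Dynamic online catalog based on the relational model: The description of the database (its catalog) is represented at the logical level in the same way as ordinary data, so authorized users can query it with the same relational language.

Rule 5: The comprehensive data sub-language rule: The system must support at least one relational language with a linear syntax, usable both interactively and within application programs, that supports: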

  1. Data definition.
  2. View definition.
  3. Data manipulation (interactive and by program).
  4. Integrity constraints.
  5. Authorization.
  6. Transaction boundaries (begin, commit and rollback).

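Rule 6: The view updating rule: All views that are theoretically updatable must also be updatable by the system.

Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operations, not just set-at-a-time retrieval.

Rule 8: Physical data independence: Application programs and interactive use remain logically unaffected when physical storage structures or access methods change.

Rule 9: Logical data independence: Application programs and interactive use remain logically unaffected by information-preserving changes to the base tables.

Rule 10: Integrity independence: Integrity constraints must be definable in the relational sub-language and stored in the catalog, not in application programs.

Rule 11: Distribution independence: Users should not have to be aware of whether, or how, the database is distributed across multiple locations.

Rule 12: The non-subversion rule: If the system provides a low-level (record-at-a-time) interface, that interface cannot be used to bypass the integrity rules and constraints expressed in the higher-level relational language.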
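A few of these rules can be observed in a working system. The sketch below uses Python's built-in sqlite3 module purely for convenience (SQLite is an assumption of this example, not a claim that it satisfies all of Codd's rules). It reads one value by table name, primary key, and column name (Rule 2), stores an unknown value as NULL (Rule 3), and queries the schema catalog with ordinary SQL (Rule 4). The employee table, its rows, and the get_value helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Edgar", None)],  # Rule 3: unknown city stored as NULL
)
conn.commit()

def get_value(table: str, pk: int, column: str):
    """Rule 2: table name + primary key value + column name locate exactly one value.

    Identifiers cannot be bound as SQL parameters, so this sketch assumes trusted
    table/column names and a primary key column called `id`.
    """
    row = conn.execute(f"SELECT {column} FROM {table} WHERE id = ?", (pk,)).fetchone()
    return row[0] if row else None

# Rule 4: the catalog (schema) is itself queried with ordinary SQL.
catalog = conn.execute("SELECT name, type FROM sqlite_master WHERE type = 'table'").fetchall()

print(get_value("employee", 1, "name"))  # Ada
print(get_value("employee", 2, "city"))  # None
print(catalog)                           # [('employee', 'table')]
```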

Now that we have an understanding of the fundamental principles of RDBMS, let's briefly compare RDBMS, NewSQL, and NoSQL.

Source: IBM 

Relational database systems make up much of the database landscape in the enterprise and remain dominant, represented by Goliath vendors Oracle, IBM, and Microsoft. These RDBMS can deliver solid performance and consistency on the order of thousands of transactions per second, a key benchmark that many use to calibrate performance.

In today's world, online transaction processing (OLTP) scenarios such as real-time bidding (RTB) in advertising, fraud detection in banking, multi-player games, risk analysis, and others can involve close to a million transactions per second, a transaction rate that most traditional RDBMS typically can't handle. But this isn't the only reason for the rise of NoSQL.

RDBMS have always been distinguished by the ACID principles (Atomicity (from the Greek a-tomos, indivisible), Consistency, Isolation, and Durability), which ensure that data integrity is preserved at all costs. SQL became the standard for data processing in systems built around these guarantees. As stated in Wikipedia, "Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language, data manipulation language, and a data control language."
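To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module (chosen only because it ships with Python; the account table, its CHECK constraint, and the transfer function are hypothetical). A transfer is two UPDATE statements inside one transaction. If the debit would break the constraint, the whole transaction rolls back, so the credit that was already applied is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # the connection context manager commits on success, rolls back on error
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])

def transfer(src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic transaction; True if committed."""
    try:
        with conn:
            # Credit first, then debit: if the debit fails, the credit must be undone too.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the context manager rolled back
        # the whole transaction, including the credit already applied.
        return False

def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 80}
print(transfer("bob", "alice", 500), balances())  # False {'alice': 70, 'bob': 80}
```

Isolation and durability are not exercised here; the point is only that a failed transaction leaves no partial update behind.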

In general terms, the newest entrant in the database arena, NewSQL, retains both SQL and ACID but achieves higher performance than traditional RDBMS by using distributed computing architectures. We will revisit NewSQL in a later post; for now, let's take a deeper look at the various NoSQL database management system types.

NoSQL database management systems store data in a variety of formats: wide column store, document store, graph store, and key-value store. There are now sub-categories, but for the sake of clarity in this analysis we will stick to the main (4). Many NoSQL products relax some ACID guarantees to gain data storage flexibility; they look past tabular row storage and strict data definitions, and they provision for scale with distributed architectures geared toward high-throughput performance. I also want to mention "Multi-Model NoSQL," a concept that combines all (4) NoSQL types in one core, started by OrientDB back in 2010 and since followed by others such as MarkLogic and Amazon DynamoDB. We will revisit this class of products in a future post.

To highlight the number of NoSQL tools available: DB-Engines.com, an online knowledge hub for all things database, publishes monthly rankings and recently announced that MongoDB had overtaken PostgreSQL as the #4 most popular database management system out of the 289 systems it monitors. MongoDB, being a document store, is not an apples-to-apples replacement for PostgreSQL; this will become clearer later in the post. You'll notice in the change in rankings (chart below) that many technologists are taking notice of MongoDB, Cassandra, Neo4j, and others as they continue to prove their value for particular database management scenarios.

In looking at the various NoSQL database management types, I would like to start with the basic NoSQL building block, the key-value store, then cover the ever-popular wide column store, document stores, and finally the most complex, graph stores.

Key Value Store

Key-value NoSQL data stores are actually made up of (4) sub-categories: "KV - eventually consistent", "KV - ordered", "KV - RAM", and "KV - solid state disk and rotating drive".

For the sake of this discussion we will focus only on the general key-value store, but I wanted to mention the sub-categories should anyone want to look into them more deeply.

In contrast to RDBMS, as noted in Wikipedia, "key-value systems treat the data as a single opaque collection which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Because optional values are not represented by placeholders as in most RDBMS, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads.

Performance, a lack of standardization and other issues limited key-value systems to niche uses for many years, but the rapid move to cloud computing after 2010 has led to a renaissance as part of the broader NoSQL movement. A subclass of the key-value store is the document-oriented database, which offers additional tools that use the metadata in the data to provide a richer key-value database that more closely matches the use patterns of RDBMS. Some graph databases are also key-value stores internally, adding the concept of the relationships (pointers) between records as a first class data type."
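A minimal sketch of the key-value model, assuming a plain in-memory Python dictionary as the storage engine (the KeyValueStore class and the key names are hypothetical, not any product's API). The store only ever sees keys and opaque byte strings; the application decides how to encode values, so two records can carry different fields without any NULL placeholders.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Minimal in-memory key-value store: each value is an opaque blob to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None

store = KeyValueStore()
# The application picks the encoding (JSON here); the store cannot look inside it,
# so each record can have its own fields and nothing is stored for absent ones.
store.put("user:1", json.dumps({"name": "Ada", "city": "London"}).encode())
store.put("user:2", json.dumps({"name": "Edgar", "title": "Researcher"}).encode())

user2 = json.loads(store.get("user:2"))
print(user2["title"])       # Researcher
print(store.get("user:3"))  # None
```

Because the value is opaque, any lookup by a field such as "title" has to happen in application code after the value is fetched by key.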

Wide Column Store

A column is the basic unit in a wide-column database and consists of a key and value pair. For example, a column might have the key “name” and the value could be a string representing a name.

In most systems, a third piece of data is held for each column: a timestamp that records when the data was added to the database or last modified. For simplicity, ignore timestamps for now. Unlike in traditional databases, columns in a wide-column database are not defined in advance; they are created as needed when data is sent to the database. The illustration (below) highlights these column families.

You can consider a column family to be the equivalent of a table: each row in the column family has a unique identifier key and then as many columns as it needs to hold the information. There is no requirement for every row to have the same columns, but the family itself must be declared before it is used, because column families typically determine how data is physically stored on disk.

In the example column-family diagram above, the first row contains two columns, one called “name” and a second called “city”, while the second row contains two additional columns called “title” and “salary”. This flexibility to have a different arrangement of columns in each row is what distinguishes this type of database from an RDBMS.
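The column-family idea, including the per-column timestamp mentioned earlier, can be sketched in a few lines of Python. This is a hypothetical in-memory model, not the API of Cassandra, HBase, or any other product: the family is declared up front, while each row creates whatever columns it needs. Conflicting writes to the same column are resolved by keeping the value with the newer timestamp (last write wins), which is one common approach in wide-column systems. Row keys and values are made up for illustration.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple

Cell = Tuple[str, float]  # (value, timestamp)

class ColumnFamily:
    """Hypothetical in-memory column family: row key -> {column -> (value, timestamp)}."""

    def __init__(self, name: str) -> None:
        self.name = name  # the family itself is declared up front...
        # ...but columns are created on the fly, per row, as data arrives.
        self.rows: Dict[str, Dict[str, Cell]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str, ts: Optional[float] = None) -> None:
        ts = time.time() if ts is None else ts
        current = self.rows[row_key].get(column)
        # Last write wins: keep the cell with the newer (or equal) timestamp.
        if current is None or ts >= current[1]:
            self.rows[row_key][column] = (value, ts)

    def get_row(self, row_key: str) -> Dict[str, str]:
        # .get() avoids creating an empty row as a side effect of reading.
        return {col: cell[0] for col, cell in self.rows.get(row_key, {}).items()}

people = ColumnFamily("people")
people.put("row1", "name", "Ada", ts=1.0)
people.put("row1", "city", "London", ts=1.0)
for column, value in [("name", "Edgar"), ("city", "New York"),
                      ("title", "Researcher"), ("salary", "100000")]:
    people.put("row2", column, value, ts=1.0)
people.put("row1", "city", "Paris", ts=0.5)  # older timestamp, so it is ignored

print(people.get_row("row1"))          # {'name': 'Ada', 'city': 'London'}
print(sorted(people.get_row("row2")))  # ['city', 'name', 'salary', 'title']
```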

Document Store

A document-oriented database is designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown with the use of the term NoSQL itself. MongoDB, mentioned above, falls into this category of NoSQL and is now used quite extensively, based on the latest DB-Engines rankings.

Source: Couchbase website

Document-oriented databases are inherently a subclass of the key-value store, another NoSQL database concept. The difference lies in the way the data is processed: in a key-value store the data is considered inherently opaque to the database, whereas a document-oriented system relies on the internal structure of the document in order to extract metadata that the database engine uses for further optimization. Although the difference is often moot due to tools in the systems, conceptually the document store is designed to offer a richer experience with modern programming techniques. XML databases are a specific subclass of document-oriented databases that are optimized to extract their metadata from XML documents.

Document databases contrast strongly with the traditional RDBMS. Document databases get their type information from the data itself, normally store all related information together, and allow every instance of data to be different from any other. This makes them more flexible in dealing with change and optional values, lets them map more easily onto program objects, and often reduces database size. Modern web applications, which change continually, make document stores attractive for programming, especially where speed of deployment is a priority.
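To show what reading a document's internal structure buys over an opaque key-value store, the sketch below keeps schema-less, JSON-like documents and lets queries reach inside them, including nested fields addressed with a dotted path. The DocumentCollection class, the get_path helper, and the dotted-path syntax (loosely reminiscent of MongoDB's dot notation, but not its API) are assumptions made for illustration.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel meaning "field not present in this document"

def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current

class DocumentCollection:
    """Hypothetical in-memory collection of schema-less, JSON-like documents."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # No schema check: each document may have its own set of fields.
        self._docs[doc_id] = doc

    def find(self, criteria: Dict[str, Any]) -> List[str]:
        """Return ids of documents whose fields equal every criterion.

        The store reads inside each document, so criteria may target nested
        fields; documents lacking a field simply do not match.
        """
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(get_path(doc, path) == value for path, value in criteria.items())
        ]

users = DocumentCollection()
users.insert("1", {"name": "Ada", "address": {"city": "London"}})
users.insert("2", {"name": "Edgar", "title": "Researcher", "skills": ["SQL"]})
users.insert("3", {"name": "Grace", "address": {"city": "London", "zip": "N1"}})

print(users.find({"address.city": "London"}))  # ['1', '3']
print(users.find({"title": "Researcher"}))     # ['2']
```

A real document database would typically add secondary indexes on such fields instead of scanning every document, which is the kind of optimization the extracted metadata enables.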

Graph Store

This is the most complex of the four types and possibly the most exciting for the future of predictive analytics and artificial intelligence. A graph database, also called a graph-oriented database, is another type of NoSQL database that uses graph theory to store, map, and query relationships. There are many diagrams across the internet explaining this, but I like to use "the Graph of the Gods" from the Aurelius GitHub blog site (below), which is a fun way of illustrating this concept.

Source: Aurelius GitHub blog

A graph database is essentially a collection of nodes and edges. Each node represents an entity (such as a person or business), and each edge represents a connection or relationship between two nodes. Every node in a graph database is defined by a unique identifier, a set of outgoing and/or incoming edges, and a set of properties expressed as key/value pairs. Each edge is defined by a unique identifier, a starting node, an ending node, and a set of properties. The mantra of graph database enthusiasts is "If you can whiteboard it, you can graph it."
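The node-and-edge structure described above maps directly onto a small property-graph model. In this hypothetical Python sketch (not the API of Neo4j, Titan, or any other product), each node has an identifier and key/value properties, each edge has an identifier, a start node, an end node, a label, and properties, and traversals follow labeled edges in either direction. The sample data is loosely inspired by the gods example above and is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Node:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    id: int
    start: int   # id of the node the edge leaves
    end: int     # id of the node the edge enters
    label: str   # relationship type, e.g. "father"
    properties: Dict[str, Any] = field(default_factory=dict)

class PropertyGraph:
    """Hypothetical in-memory property graph with outgoing/incoming edge lists."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}
        self.out_edges: Dict[int, List[int]] = {}
        self.in_edges: Dict[int, List[int]] = {}

    def add_node(self, node_id: int, **props: Any) -> Node:
        self.nodes[node_id] = Node(node_id, props)
        self.out_edges.setdefault(node_id, [])
        self.in_edges.setdefault(node_id, [])
        return self.nodes[node_id]

    def add_edge(self, edge_id: int, start: int, end: int, label: str, **props: Any) -> Edge:
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        edge = Edge(edge_id, start, end, label, props)
        self.edges[edge_id] = edge
        self.out_edges[start].append(edge_id)
        self.in_edges[end].append(edge_id)
        return edge

    def neighbors(self, node_id: int, label: Optional[str] = None) -> List[Node]:
        """Follow outgoing edges (optionally only one label) to their end nodes."""
        return [self.nodes[self.edges[e].end] for e in self.out_edges[node_id]
                if label is None or self.edges[e].label == label]

    def incoming(self, node_id: int, label: Optional[str] = None) -> List[Node]:
        """Follow incoming edges (optionally only one label) back to their start nodes."""
        return [self.nodes[self.edges[e].start] for e in self.in_edges[node_id]
                if label is None or self.edges[e].label == label]

g = PropertyGraph()
g.add_node(1, name="jupiter", type="god")
g.add_node(2, name="hercules", type="demigod")
g.add_node(3, name="alcmene", type="human")
g.add_node(4, name="neptune", type="god")
g.add_edge(10, 2, 1, "father")
g.add_edge(11, 2, 3, "mother")
g.add_edge(12, 1, 4, "brother")

# Two-hop traversal: who are the brothers of Hercules' father?
uncles = [u.properties["name"]
          for father in g.neighbors(2, "father")
          for u in g.neighbors(father.id, "brother")]
print(uncles)  # ['neptune']
```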

Graph databases are well suited for analyzing interconnections, which is why there has been a lot of interest in using them to mine data from social media. They are also useful in business disciplines that involve complex relationships and dynamic schemas, such as supply chain management, identifying the source of an IP-based communication issue, and building recommendation engines. If you would like to know more about graph databases and how they compare to RDBMS, here is a published paper on the comparison.
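Recommendation engines are a typical graph workload. As a minimal sketch, assuming friendships are held as an undirected adjacency map of Python sets (a stand-in for a graph database; the names and the recommend_friends function are hypothetical), "people you may know" becomes a two-hop traversal: collect friends of friends the user is not already connected to, and rank them by the number of mutual friends.

```python
from collections import Counter
from typing import Dict, List, Set

def recommend_friends(graph: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """Suggest people two hops away, ranked by number of mutual friends."""
    direct = graph.get(user, set())
    counts: Counter = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already directly connected.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

friendships = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(recommend_friends(friendships, "ann"))  # ['dan', 'eve']
```

In a relational schema the same question needs a self-join on a friendship table for every hop, which is the kind of query graph databases are designed to make cheap.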

Trivial Pursuit Note: The graph theory behind graph databases is often credited to the 18th-century mathematician Leonhard Euler, whose face some of you may have seen on the Swiss 10-franc banknote.

Conclusion

One takeaway from this post should be that, no matter what, there is no one-size-fits-all solution for these database management system needs. Every business case needs to be evaluated carefully, with an understanding of the pros and cons, prior to putting together a solution.

I hope this post has shed some light on NoSQL database management systems, why they are growing rapidly in popularity, and which ones to consider for your current Big Data environment depending on the scenario. Feel free to post your comments below; I would love to hear from everyone. These posts are meant to be as collaborative as they are informative.

Rassul Fazelat (follow me here @BigDataVision) is Managing Partner - Founder of Data Talent Advisors, a boutique Data & Analytics Talent Advisory & Headhunting firm; Organizer of NYC Big Data Visionaries Meetup; Co-Organizer of NYC Marketing Analytics Forum; and Co-Organizer of NYC Advanced Analytics Meetup.

Sources: DB-Engines.com, Couchbase website, Aurelius GitHub blog site, Neo4j website, and Wikipedia (SQL, Codd's 12 rules, Document store, Graph database, Key-value store, and Wide column store).

Sandeep Khandewale

Experienced Software Engineer - React.JS, NodeJS

7 years ago

Good one. This reminds me of a book called "Seven Databases in Seven Weeks"

Aditya Madiraju

Driving the Point!

7 years ago

Good one! These 2 statements do it for me. "One takeaway from this post should be that no matter what, there is no one size fits all for these database management system needs". "Every business case needs to be evaluated carefully understanding the pros and cons prior to putting together a solution." The true value of data is not in storing, but in linking together when needed. The narrative of deep learning seems to have a missing link in that machine's learning happens only after training them towards a link & making them understand those links!! Cheers,

Rassul Fazelat

President & CEO @ Data Talent Advisors | Data, Analytics, RAG & GEN AI Recruiting

7 years ago

Those of you wondering why Big Data solutions??...this post establishes foundational reasons why standalone legacy DBMS don't cut it anymore.

Munvar Mohammed

Sr. Manager - Software Engineering - Global Supply Chain Management @Dell

8 years ago

Nice article, clear information

Timur King

Director at SMBC Capital Markets

8 years ago

Good visual summary of NoSQL in middle (Evolving Database landscape). I think "big tables" ~ "wide column store" mentioned later.
