Database Architectures - part 1

This article talks about the different architectures available in persisting and managing data in various environments

We will discuss the following systems at high level and their comparison

  • Relational Database Systems
  • NoSQL

Relational Database Systems

Flat Data

In order to better understand the concept of relational databases, let's take a look at simple flat data which might be contained in a simple table format or a comma-separated-value file.

Let's see an example to help illustrate the shortcomings with this flat data format.

A smartphone company create an online poll to determine the market strategy for their smart phone sales campaign. In that poll they ask for Name, City, Age, whether they use smart phone and what is their favourite brand. With this poll data, they hope to devise a new smart phone related marketing plan in different regions globally and for different age segments.

Shown right side is the sample results of this poll.


Although the poll has in fact captured the data, is is difficult to extract value, for below example, to show all the results for Hyderabad? They would have to search every possible representation entered, including typo errors. The same issue appears for other columns as well. Also there could be users with the same name, which might cause problems.

Relational Database

A better approach to solve the types of problems discussed above is to use a relational database. But we need to change the structure of the table as below

  • Add an ID field (Primary Key), which will be generated by each response to the poll, which creates a new entry in the database
  • Instead of free form of entry for City, main table relates to another table in the same database which has a primary key say ID column.
  • Create a table for age with primary key ID and limit the list to 4 different age ranges/groups, and relate the same with main table for Age.
  • Create a table for "Uses Smart Phone" with primary key ID and pre defined set of values and relate the same with main table.
  • Create a table for list of Brands with primary key ID and relate the same with main table

Collecting data by using the relational database eliminates the problems of the flat data storage discussed earlier.

In the fig right side, we can see that the numbers have replaced the values in many fields. These represent the key values in the other tables.


We can query a particular record with exact values based upon the data in all related tables. In any table,

  • Columns represent the fields or attributes of the database
  • Rows refer to records or tuples
  • Table it self could be referred to as a relation
  • View is a requested result set from the database. It could be a table listing specific values such as "All users from Hyderabad, ages 18-25, who uses smart phone very much" or calculated information such as the "sum of total of smart phone users by city" etc..

These are the basic premises behind relational databases, next let us discuss the language these relational databases employ.

SQL - Structured Query Language

SQL is the standard computer language for relational databases. It's used for storing, manipulating & retrieving data

Some of the relational database systems or platforms where SQL is used are Oracle, MySQL, MS Access, Sybase, Postgres, SQL Server etc.

SQL allows uses to:

  • Describe, Define, Manipulate and Access the data
  • Embed within other languages
  • Create and drop databases, tables, views, store procedure and functions etc..
  • Set permissions on database, tables, procedures and views etc..

Key benefit - ACID Compliance

Once of the key benefits of Relational Databases and SQL is ACID compliance. ACID stands for Atomicity, Consistency, Isolation and Durability.

  • Atomicity - Transactions are all or nothing, i.e it ensures that a transaction will not be successful unless all parts of it are successful. An incomplete transaction results in no change to the database, which reduces the probability of data corruption.
  • Consistency - Only valid data is saved, i.e it ensures against invalid data. Transactions that do not follow the validation rules of the database are rolled back to ensure against data corruption.
  • Isolation - Transactions do not affect each other, i.e it protects against data issues resulting from concurrent transactions.
  • Durability - Written data will not be lost i.e it ensures that all completed transactions are saved even if a technology failure occurs.

The vast majority of relational databases are ACID compliant. Compliance is necessary for certain types of data, such as financial or personal data or data requiring compliance with certain regulatory bodies.

Certain datasets depending on the use case might not required all four aspects of ACID compliance. For example, many distributes databases write asynchronously to various nodes, eventually achieving consistency. This would not be ACID compliant but could be perfectly acceptable in many situations.

NoSQL Systems

What is NoSQL?

The term NoSQL can refer to Not SQL or Not Only SQL, in practice NoSQL most often refers to non-relational databases that are designed specifically for distributed environments.

To understand NoSQL's non-relational architecture, let's take the below example.

The table shows a relational data. The design is rigid. For example, the Price column would only accept valid price, anything other than number would be invalid.

NoSQL forgets tables entirely. Instead, NoSQL stores data in documents. The format of the document is similar to JSON. Below is the sample document containing a single record similar to a row in the above table.

{
	ID : 10000001,
	Product : "android phone",
	Brand : "Samsung",
	Price : 8000,
	UOM : "ea"
}

A set of there documents containing individual records would be called a collection.

Unlike SQL tables, in NoSQL any data can be stored in any document. Every individual record can contain different attributes as well.

Let's say, If we need to add new attribute such as ratings to our product data :

  • In the relational database, we would have to first change the database schema by adding new column. Then we'd have to migrate the original data to the new schema (Though most of the relational databases like MySQL, Oracle handle this internally in simpler cases). Only then we could populate the new attribute with values.
  • In NoSQL's dynamic schema architecture changing fields and structure is seamless. This is one of the key advantages of NoSQL.
{
	ID : 10000001,
	Product : "android phone",
	Brand : "Samsung",
	Price : 8000,
	UOM : "ea",
    rating: "5/5",
    reviews: [
        {
          name: "Naveen", 
          review: "Not sure if this is the latest android version phone?"
        ?}
    ]
}

If reviews need to be supported, in the NoSQL the same can be added to the original document as shown above. Where as in case of relational databases reviews should be stored in separated table referring the product which was reviewed.

Types of NoSQL Databases

There are four categories of NoSQL database available at present

key-value stores : Redis, Oracle BDB, Riak, Amazon DynamoDB

  • Uses keys to access stored values. Stored values can be of any type of binary object, from simple text to audio/video files.
  • This is the most flexible NoSQL model, as shown here each key can point to different set of attributes.
  • key-value stores as highly effective at scaling applications that deal with high velocity. Less effective as transactional databases.

Document stores : MongoDB, Couchbase

  • A document stores are similar to key-value stores, except a value is single document in JSON, XML or BSON format with all data related to a specific key. example shown below.
  • Important fields in the document can be indexed to provide fast retrieval without knowing the key.
  • Each document can have the same or a different structure.
{
	ID : 10000001,
	Product : "android phone",
	Brand : "Samsung",
	Price : 8000,
	UOM : "ea",
    rating: "5/5",
    reviews: [
        {
          name: "Naveen", 
          review: "Not sure if this is the latest android version phone?"
        ?}
    ]
}

Wide-column stores : Apache Cassandra, Apache HBase

  • The key-value pair is contained within a column
  • Columns can be grouped into Column Families, which are related to standard relational tables. But names and formats of columns can vary from row to row across the table.
  • Column Families are together grouped into Super Column Families.
  • Wide-column stores provide high performance and can be easily scaled.

Graph stores : Neo4J, Infinite Graph

  • Use graph structures to store, map and query relationships.
  • It consists of nodes - shown as circles, which represent the things in a database and edges - shown as arrows, which represent relationship between things.
  • Edges provide index free adjacency, so that adjacent elements are linked together without using index.


Comparison of Relational to NoSQL Databases

Below table provides high level comparison of relational to NoSQL database.


Use Cases/Examples of NoSQL Databases

Below shown are some of the use cases where different NoSQL databases can be used, though it is not right to conclude in a single line about a database it's best fit use cases. The below table should give a direction towards what kind of applications can be built using different NoSQL databases.

Conclusion

Though in recent times NoSQL is gaining lot of popularity across different various businesses, Relational Databases still didn't lost their existence and still plays vital role in data persistence and management.

Both relational and NoSQL systems can co-exist and compliment each other in solving a bigger business problem.

A given use case can be implemented with both Relational and NoSQL systems, but complexity and time to implement will vary.

All these various systems are set of tools available for an architect/designer/developer and should be picked based on the analysis done by considering various parameters such as problem statement, different use cases involved, complexity, volume and velocity of the data being operated and skill sets available within the organisation or team.

Pick the databases which suits best for your environment/organisation after careful analysis, instead of just following someone blindly or going with buzz words...


Adhithya K.R

Writing and Tech | NIT Calicut

6 年

Really informative intro to databases. Helped a lot. Thank you

回复
Antony S.

Scrum Master - CSM

7 年

Very informative Naveen Endrapu

回复

要查看或添加评论,请登录

Naveen Endrapu的更多文章

社区洞察