Choose relational vs NoSQL database?

Relational or NoSQL?

You have this great software application in mind. Now, to implement it, what database should you use? Should it be relational or NoSQL?

Concepts

To make this decision, it is important to understand two software concepts.

CAP Theorem

With requirements of Consistency (reading the latest data), Availability (client with data to write must be able to hand over the data without worrying about retries & db should accept the data and handle reliably writing it) & Partition tolerance (tolerate nodes or partial internal network errors), it is possible to meet only two requirements in full with compromises required on the third.

3Vs

A concept in Big Data. What is the volume, variety & velocity of the incoming data?

Choosing Relational

When the data is structured well (lacks variety) & within reasonable limits of volume & velocity, a relational database is usually a good choice. Relational databases have a mature plethora of features & supporting products that NoSQL databases will struggle to match.

In a SQL database, reads (with the right indexes) are fast while writes are slow since they have to go to one location. A SQL database is designed to follow CA (Consistency {ACID-Atomicity, Consistency, Isolation, Durability} & Availability {with multiple servers}) at the cost of partition tolerance. Complex solutions can reduce the partition tolerance.

For a social network requirement, a Graph database is a good choice, which also follows CA (Consistency/Availability) at the cost of P (Partition Tolerance).

Choosing NoSQL

If even one of the 3Vs (volume, variety, velocity) is high, it is better to go with a NoSQL database.

Availability & Partition Tolerance

Cassandra is a NoSQL database server that can have multiple nodes. It will eventually need a DevOps team to manage complexity. Consistency is quick (but not immediate & guaranteed), dirty reads are a possibility & occasional bugs were reported in the past when a node goes down with dirty data & brought back up.

Amazon Dynamo: A commercial Storage as a service solution with DevOps managed by the provider.

Both Cassandra & Dynamo are column based key-value databases.

Consistency & Partition Tolerance

If consistency & partition tolerance are more important than availability, MongoDb & HBase are options.

MongoDb is a document based database with an id as a key & indexed documents. By default, it will keep 3 copies in sync & will not succeed a write unless all copies are written successfully. It will ensure the reads don't read dirty data.

HBase is a columnar database (similar to Cassandra & Dynamo) but ensures 3 copies by default (similar to MongoDb).

Availability can be improved with complex solutions on top.

NoSql database Categories

Key-Value

Memcached, Redis, Riak

Column-oriented

Cassandra

Document oriented (JSON)

MongoDb, CouchDb

Tabular

Big Table, HBase

Graph

Neo4J, ArangoDb (supports sharding with auto-rebalancing), MarkLogic

XML, Object based

eXist, BaseX, MarcLogic, ZODB

Conclusion

Depending on the requirements, there are multiple options available for someone wishing to build a software solution. The right tools have to be chosen for the right job.

Co-authored with Srinivasan Varadharajan.

要查看或添加评论,请登录

Swaminathan Saikumar的更多文章

  • Why is Go fast?

    Why is Go fast?

    Why is Go fast? Go has become popular for microprocesses & for scaling. What are the design decisions that make Go…

  • Cloud native architecture-an overview

    Cloud native architecture-an overview

    Any infrastructure has two main components: compute & storage. Software needs compute to run & storage to read/write.

  • I Bonds during high inflation

    I Bonds during high inflation

    During times of high inflation in the USA, consider the Series I Bonds issued by the US treasury. Currently, I bonds…

  • Microservices deployment

    Microservices deployment

    History of deployment options: Physical machines: 1990s. Fast deployment, best performance.

  • Microservices security & tracking

    Microservices security & tracking

    Security: AAA: Authentication, Authorization, Accounting/Auditing Secure interprocess communication (TLS) Security…

    1 条评论
  • Isolation & Locks

    Isolation & Locks

    The CAP theorem states that two out three of Consistency, Availability & Partition Tolerance may be achieved. RDBMS…

  • Messaging architecture

    Messaging architecture

    Message formats: Text, such as JSON/XML. Readable & easier for debugging.

  • Microservices API Gateway

    Microservices API Gateway

    Benefits: Instead of specific services, clients talk to the API gateway, which provides a client-specific API…

  • SQL, NoSQL or Hadoop for 'Big Data'?

    SQL, NoSQL or Hadoop for 'Big Data'?

    In an earlier post, we had looked at how to use the 3Vs (Volume, Velocity & Variety) of data & the CAP theorem…

  • Scaling applications-an overview

    Scaling applications-an overview

    As a business grows, its software applications will need to scale. Computing bottleneck Run multiple identical…

社区洞察

其他会员也浏览了