#bigdata 29e: NoSQL with HBase, Cassandra and MongoDB
José Antonio Ribeiro Neto
Author, Artificial Intelligence and Data Science Researcher
A relational database management system (RDBMS) is a technology used on a large scale in commercial systems such as banking, flight reservations, or any application where data is structured. SQL (Structured Query Language) is the query language for these applications.
Relational databases, whose strength is the consistency enforced by their data schemas, can be scaled, but they were not designed for unlimited scaling.
The need to analyze large volumes of data from different sources and in different formats gave rise to NoSQL (Not Only SQL) technology. NoSQL databases are non-relational and do not depend on fixed schemas (the rules that govern data or objects).
In essence, all NoSQL implementations aim to handle large volumes of unstructured data at scale.
NoSQL databases scale horizontally and favor performance: they replicate data across multiple network nodes and read, write, and process data at high speed using distributed parallel processing.
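To make the idea of replicating data across nodes concrete, here is a minimal Python sketch. The node names, the `replicas_for` helper, and the placement rule (hash the key, then take consecutive nodes) are assumptions made for illustration; real NoSQL systems use more elaborate schemes.

```python
import hashlib

def replicas_for(key, nodes, replication_factor=2):
    """Pick the nodes that hold copies of `key` (illustrative placement rule)."""
    # Hash the key to a stable integer so every client computes the same placement.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Take consecutive nodes, wrapping around the end of the list.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
placement = replicas_for("user:42", nodes, replication_factor=3)
print(placement)  # three distinct nodes, always the same ones for this key
```

Because each key lives on several nodes, reads and writes can be spread across the cluster, and the data survives the loss of a single node.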
NoSQL can be used for real-time data analysis, such as personalizing sites based on user behavior tracking, and for IoT (Internet of Things) workloads such as vehicle telematics or mobile device telemetry.
NoSQL Types
The three main types of NoSQL are:
- Column Database (column-oriented)
- Key-Value Database (key / value oriented)
- Document Database (document-oriented)
1 — Column Database
A NoSQL database that stores data in tables and manages them by columns instead of rows is called a columnar database management system (CDBMS).
Each column is stored in its own data file.
One benefit is that values of the same column compress well, and operations such as minimum, maximum, sum, count, and average run quickly because they read only the columns they need.
Columnar databases can also be self-indexing, and they often use less disk space than a relational database system containing the same data.
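The column layout described above can be sketched in a few lines of Python. This is only an in-memory illustration with made-up sample data; the run-length encoding stands in for the compression a real columnar engine would apply and is not taken from any specific product.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Lisbon", "amount": 120.0},
    {"id": 2, "city": "Lisbon", "amount": 80.0},
    {"id": 3, "city": "Porto", "amount": 200.0},
]

# Column layout: one list per column, so an aggregate touches a single list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

amounts = columns["amount"]
summary = {
    "min": min(amounts),
    "max": max(amounts),
    "sum": sum(amounts),
    "count": len(amounts),
    "avg": sum(amounts) / len(amounts),
}

def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

city_rle = run_length_encode(columns["city"])
print(summary)
print(city_rle)  # [['Lisbon', 2], ['Porto', 1]]
```

Computing the sum reads only the `amount` list, and the repeated city names collapse into short pairs, which is the intuition behind fast aggregates and good compression in column stores.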
HBase is a column-oriented NoSQL database. It is popular because it was built to run on top of Hadoop with HDFS.
Its design follows the concepts of “Bigtable,” the column-oriented database first developed by Google.
It is well suited to real-time queries that read and access large volumes of data.
2 — Key-Value Database
A key/value-oriented NoSQL database stores data in collections of key/value pairs. For example, a student ID number may be the key, and the student’s name may be the value.
It works like a dictionary: it stores a value, such as an integer, a string, or a structured value (a JSON document or an array), together with the key used to reference that value.
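The student example can be sketched with a plain Python dictionary. The `put` and `get` helpers and the `student:<id>` key format are hypothetical names chosen for this illustration; values are serialized to JSON to mimic a store that treats every value as an opaque blob.

```python
import json

store = {}  # in-memory stand-in for a key/value database

def put(key, value):
    """Serialize the value and file it under the key."""
    store[key] = json.dumps(value)

def get(key, default=None):
    """Look the value up by key; the store never inspects its contents."""
    raw = store.get(key)
    return default if raw is None else json.loads(raw)

put("student:1001", "Ana Silva")
put("student:1002", {"name": "Rui Costa", "courses": ["math", "physics"]})

print(get("student:1001"))          # Ana Silva
print(get("student:1002")["name"])  # Rui Costa
print(get("student:9999"))          # None
```

Every lookup goes through the key. Finding all students enrolled in a given course would require scanning every value, which is the limitation that document databases address.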
Cassandra is a powerful NoSQL database based on the key/value model (it is also commonly classified as a wide-column store).
Facebook initially developed it and released it as open source in 2008; it is highly scalable and fault tolerant.
It was developed to solve Big Data analytical problems in real time, involving petabytes of data and using MapReduce.
Cassandra can run without Hadoop, but it becomes more powerful when connected to Hadoop and HDFS.
3 — Document Database (document-oriented)
Document-oriented NoSQL databases are similar to key/value stores, with a document as the value.
They organize documents into collections analogous to relational tables, and queries can be based on the values inside the documents, not just on keys.
MongoDB is a document-oriented NoSQL database developed by MongoDB Inc., which distributes a Community Edition free of charge.
MongoDB stores data as JSON-like documents with a flexible schema, meaning fields may vary from one document to another, and the data structure may change over time.
MongoDB can run on its own without Hadoop, but it becomes more powerful when connected to Hadoop and HDFS.
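As a rough illustration of collections, flexible schemas, and value-based queries, the sketch below models a collection as a list of Python dictionaries. The `find` helper and the sample documents are assumptions made for this example; it is not the MongoDB driver API.

```python
# A "collection" of documents; note that the fields differ between documents.
students = [
    {"_id": 1, "name": "Ana", "course": "math"},
    {"_id": 2, "name": "Rui", "course": "physics", "email": "rui@example.com"},
    {"_id": 3, "name": "Eva", "course": "math", "year": 2},
]

def find(collection, **criteria):
    """Return every document whose fields equal all of the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

math_students = find(students, course="math")
print([doc["name"] for doc in math_students])     # ['Ana', 'Eva']
print(find(students, email="rui@example.com"))    # matches only document 2
```

The query filters on a value (`course`) instead of the key (`_id`), and documents that lack a field, such as `email`, simply do not match.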
CURIOSITIES
- Established vendors such as Microsoft, IBM, Oracle, and Amazon offer relational database products and SQL services and dominate the market for commercial database applications.
- The best-known open-source relational database is MySQL.
- Relational databases have two main advantages: schemas, which allow data to be controlled and validated, and relationships, which connect the different tables.
- NoSQL allows relationships by nesting documents. For example, a parent document could have a child document nested directly inside it.
- Many NoSQL query engines natively support the ability to perform queries and associations based on complex, nested documents.
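The nesting mentioned in the last two points can be sketched as follows. The `get_path` helper and its dotted-path convention are hypothetical and written for this illustration; real query engines provide their own syntax for reaching into nested documents.

```python
# A parent document with child documents nested directly inside it.
order = {
    "_id": 501,
    "customer": {"name": "Ana", "address": {"city": "Lisbon"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

def get_path(document, path, default=None):
    """Follow a dotted path such as 'customer.address.city' through nested dicts."""
    current = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current

city = get_path(order, "customer.address.city")
total_qty = sum(item["qty"] for item in order["items"])
print(city)       # Lisbon
print(total_qty)  # 3
```

In a relational design the customer, address, and items would typically live in separate tables connected by joins; here they travel together inside one document.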
More information about this article
Article selected from the eBook “Big Data for Executives and Market Professionals.”
eBook in English: Amazon or Apple Store
eBook in Portuguese: Amazon or Apple Store