Demystifying DATA: DBMS, Databases, Data Structures, Database Engines and Data

Get Speedb OSS Code | Star us on GitHub: https://lnkd.in/dgc78wsM

The management of data has become critical in every industry. Obvious examples like fintech, biogenetics, e-commerce, blockchain, AI, IoT, ML and many more are driving this growth, and their apps are driving the need to process that data at massive scale.

But with so many different terms and technologies that contain the word DATA, it can be challenging to discuss the details with our technical teams. In this article, I'll break down the most common terms and technologies associated with data management and explain the differences between them.

I expect a bit of push-back, as definitions vary and these concepts overlap in places, depending on your technical level and even on what part of the world you're from.

DBMS and RDBMS

A Database Management System (DBMS) or Relational Database Management System (RDBMS) is a software system that manages the higher-level, more complex functions an enterprise needs: availability, performance, stability and resilience. A DBMS typically includes a set of tools and features that let users create, access, modify, and delete data in a structured format, and it translates those requests into instructions for the lower layers.

It is important to note that a database and a DBMS are not the same thing: a database is a collection of data, while a DBMS is the software system used to manage that data, along with whatever tools are useful, such as connecting the various layers, running multiple databases, and handling security.
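To make the database/DBMS split concrete, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is our choice for illustration only; it is not one of the systems discussed in this article. The SQLite library plays the DBMS role, the in-memory database it creates is the database, and the create, access, modify, and delete requests are sent to it as SQL. The `users` table and its rows are hypothetical.

```python
import sqlite3


def demo_dbms() -> list[tuple[int, str]]:
    # The SQLite library acts as the DBMS; ":memory:" asks it to create a
    # throwaway database that lives only in RAM.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Create: the DBMS records the schema (structure) of the data.
        cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        cur.executemany(
            "INSERT INTO users (id, name) VALUES (?, ?)",
            [(1, "Ada"), (2, "Grace"), (3, "Linus")],
        )
        # Modify and delete: the DBMS turns SQL into lower-level storage operations.
        cur.execute("UPDATE users SET name = ? WHERE id = ?", ("Alan", 3))
        cur.execute("DELETE FROM users WHERE id = ?", (2,))
        conn.commit()
        # Access: read back the remaining rows, ordered by primary key.
        return cur.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo_dbms())  # [(1, 'Ada'), (3, 'Alan')]
```

The application never touches pages or files directly; it only talks to the DBMS, which is exactly the layering described above.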

For the unstructured data cowboys, DBMS and RDBMS are essentially the Kubernetes of databases. (Don't hate on me for saying this; it's kinda true. :-) )

A DBMS manages databases and toolsets, while an RDBMS specifically manages relational databases and their toolsets; in practice, the terms are often used interchangeably.

Common examples of (R)DBMS:

Which DBMS is most popular varies by context, industry, and region, but some of the most widely used in recent years include:

  • MySQL
  • Microsoft SQL Server
  • Oracle Database

Databases

A database is a structured set of data organized in a specific way. The data is stored in one or more data structures, such as tables, graphs, and trees, in memory and/or on persistent disk, depending on the requirements.

Databases typically either include a set of rules, or schema, that defines the structure of and relationships between the data elements, or work as a key-value store that holds data as pairs of keys and values. It is important to note that a database and a data structure are not the same thing: a database is a collection of data, while a data structure is a way of organizing and storing data in a computer program or application.

Examples of databases:

  • MongoDB
  • Cassandra
  • ArangoDB
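The schema-free key-value model mentioned above can be tried with Python's standard-library `dbm` module, a simple on-disk key-value database. This is a minimal sketch of the data model only, not of how MongoDB, Cassandra, or ArangoDB work internally; the file name and keys are hypothetical.

```python
import dbm
import os
import tempfile


def demo_key_value_store() -> bytes:
    # No schema: every record is just a key and a value.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kv_example")  # hypothetical file name
        # Flag "n" always creates a new, empty database.
        with dbm.open(path, "n") as db:
            db["user:1"] = "Ada"     # str keys and values are stored as UTF-8 bytes
            db["user:2"] = "Grace"
            db["user:1"] = "Ada L."  # writing an existing key overwrites its value
            del db["user:2"]
            return db["user:1"]      # values always come back as bytes


if __name__ == "__main__":
    print(demo_key_value_store())  # b'Ada L.'
```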

Data Structures

A data structure is a specific way of organizing and storing data in application memory and/or on physical media. A data structure typically includes a set of rules or algorithms that define how the data is stored, accessed, and manipulated. Data structures fall into two main types: linear and nonlinear. Linear structures arrange elements in a sequence and suit applications that process data in order; nonlinear structures, such as trees and graphs, arrange elements hierarchically and suit large, random searches, where index metadata is typically kept in memory while the actual data stays on disk.

Examples of data structures:

  • Arrays: A linear data structure that stores a fixed-size sequence of elements of the same data type. Arrays are traditionally used in structured databases and are very inflexible.
  • Linked Lists: A linear data structure that consists of a sequence of nodes, each of which contains a reference to the next node in the sequence.
  • Trees: A non-linear data structure that represents a hierarchy, with a root node and one or more child nodes. Trees are more flexible than arrays and much more efficient than linked lists for searching large, randomly accessed data sets, which is why tree-based structures commonly back key-value stores.
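A minimal Python sketch of the three structures above. Python's `array` module stands in for a typed array (it enforces a single element type, though unlike a C array it can grow); the node classes and function names are hypothetical, and the tree is a plain, unbalanced binary search tree rather than the B-tree or LSM-tree variants real engines use.

```python
from array import array
from dataclasses import dataclass
from typing import Optional

# Array: one element type, contiguous storage, O(1) access by index.
scores = array("i", [7, 3, 9])  # "i" = signed int; appending a str raises TypeError


@dataclass
class ListNode:
    value: int
    next: Optional["ListNode"] = None


def linked_list_contains(head: Optional[ListNode], target: int) -> bool:
    # Linear: follow next-pointers one node at a time, O(n).
    node = head
    while node is not None:
        if node.value == target:
            return True
        node = node.next
    return False


@dataclass
class TreeNode:
    key: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def bst_insert(root: Optional[TreeNode], key: int) -> TreeNode:
    # Nonlinear: smaller keys go into the left subtree, larger into the right.
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def bst_contains(root: Optional[TreeNode], key: int) -> bool:
    # Each comparison discards a whole subtree: O(log n) if the tree is balanced.
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


if __name__ == "__main__":
    head = ListNode(1, ListNode(2, ListNode(3)))
    root = None
    for k in [50, 30, 70, 20, 40]:
        root = bst_insert(root, k)
    print(scores[1], linked_list_contains(head, 3), bst_contains(root, 40))  # 3 True True
```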

Database Engines (aka Storage Engines)

A database engine is the component of a database management system responsible for managing how data is stored in application memory and for mapping that data back and forth between memory and persistent disk. Database engines typically provide features such as indexing, data compression, memory management, and aligning in-memory data with the physical disk layer.

Examples of database (storage) engines:

  • LevelDB: A fast key-value storage engine written at Google, with a design inspired by Bigtable, and used in applications such as Bitcoin and Google Chrome.
  • RocksDB: A high-performance storage engine forked from LevelDB and optimized for flash storage, with many additional features and wide adoption.
  • Speedb: A more modern, high-performance storage engine, compatible with the RocksDB API, designed to make better use of computing resources. The results are much lower latency, higher throughput, and an advanced compaction technology that cuts write amplification by up to 80%. The community has typically described Speedb as the natural evolution of LevelDB and RocksDB.
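LevelDB, RocksDB, and Speedb are all built on the log-structured merge-tree (LSM) idea: writes land in an in-memory table (the memtable), which is periodically frozen into sorted, immutable runs that compaction later merges. The toy class below is a minimal, in-memory sketch of that idea under simplifying assumptions (no write-ahead log, no files, no Bloom filters, no levels). The class, its methods, and the flush threshold are hypothetical and are not the API of any of these engines.

```python
from bisect import bisect_left
from typing import Optional

TOMBSTONE = object()  # marker recording that a key was deleted


class ToyLSM:
    """Toy LSM-tree: a mutable memtable plus immutable, key-sorted runs.

    Real engines persist runs as files on disk; here they are in-memory lists.
    """

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: dict[str, object] = {}
        self.runs: list[tuple[list[str], list[object]]] = []  # newest run last
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def _write(self, key: str, value: object) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def put(self, key: str, value: str) -> None:
        self._write(key, value)

    def delete(self, key: str) -> None:
        # Deletes are writes too: a tombstone hides older versions of the key.
        self._write(key, TOMBSTONE)

    def flush(self) -> None:
        # Freeze the memtable into an immutable run sorted by key.
        items = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for keys, values in reversed(self.runs):
            i = bisect_left(keys, key)  # binary search inside a sorted run
            if i < len(keys) and keys[i] == key:
                value = values[i]
                return None if value is TOMBSTONE else value
        return None

    def compact(self) -> None:
        # Merge every run into one, keeping only the newest version of each key.
        # Dropping tombstones is safe here because no older run remains below.
        merged: dict[str, object] = {}
        for keys, values in self.runs:  # oldest first, so newer values overwrite
            merged.update(zip(keys, values))
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [([k for k, _ in live], [v for _, v in live])] if live else []


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    db.put("a", "1")
    db.put("b", "2")  # memtable is full, so it is flushed into run 1
    db.put("a", "3")
    db.delete("b")    # flushed into run 2
    print(db.get("a"), db.get("b"), len(db.runs))  # 3 None 2
    db.compact()
    print(db.get("a"), len(db.runs))  # 3 1
```

Compaction is where write amplification comes from: the same key/value bytes get rewritten each time runs are merged, which is why engines compete on smarter compaction strategies.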

The Big O Benchmark - Constant Time is Gold

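Hash tables are special in that they look up a key in 'constant time', or O(1) in Big O notation, on average. Constant time means the number of steps needed to find a key does not grow with the size of the data set: a table holding a billion keys answers a lookup in about the same number of steps as one holding a thousand. Asymptotically, it's not possible to do better than this, even in theory. Engines like LevelDB, RocksDB and Speedb are built on LSM trees rather than pure hash tables, but they use hashing in places, for example in Bloom filters that let a lookup skip files that cannot contain the key.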

Big O doesn't predict your actual real-world performance; it describes how the cost of an operation grows as the data set grows, usually for the worst-case scenario. For example, the worst case when searching a linked list is that the element you want sits at the very end, so you must visit all n nodes: O(n). A balanced tree cuts that to O(log n), and a hash table finds a key in a constant number of steps on average, regardless of size. Its worst case is still O(n) when many keys collide, which is why good hash functions and resizing matter, but for average-case lookup by key, few other data structures can compete.
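To see the gap between O(n) and average-case O(1), the sketch below counts how many comparisons a linked-list search and a simple chained hash table need to find the last key inserted into a data set of n keys. The classes are hypothetical teaching code. As a simplifying assumption, the hash table gets one bucket per key, and CPython hashes small non-negative integers to themselves, so every key lands in its own bucket; with real keys some collisions occur, which is why the guarantee is O(1) on average rather than in the worst case.

```python
from typing import Optional


class Node:
    def __init__(self, key: int, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = nxt


def linked_list_steps(head: Optional[Node], target: int) -> int:
    # Count nodes visited until the target is found (worst case: all n).
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.key == target:
            return steps
        node = node.next
    return steps


class ChainedHashTable:
    """Minimal hash table with separate chaining and a fixed bucket count."""

    def __init__(self, n_buckets: int) -> None:
        self.buckets: list[list[int]] = [[] for _ in range(n_buckets)]

    def insert(self, key: int) -> None:
        self.buckets[hash(key) % len(self.buckets)].append(key)

    def lookup_steps(self, key: int) -> int:
        # Hash once to pick a bucket, then compare only against that bucket.
        steps = 0
        for candidate in self.buckets[hash(key) % len(self.buckets)]:
            steps += 1
            if candidate == key:
                break
        return steps


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        head = None
        table = ChainedHashTable(n)
        for k in range(n):
            head = Node(k, head)  # prepending leaves key 0 at the tail
            table.insert(k)
        # prints: 10 10 1, then 1000 1000 1, then 100000 100000 1
        print(n, linked_list_steps(head, 0), table.lookup_steps(0))
```

The linked-list cost grows in lockstep with n, while the hash lookup stays flat.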

At the very least, running in constant time gives you the maximum potential to approach near-perfect optimization through ongoing software innovation.

Check out bigocheatsheet.com to compare the efficiency of various algorithms and data structures, and see why hash tables are the new 6000-pound gorilla in a world of 600-pound gorillas.

Image source: bigocheatsheet.com

The problems we face with massive data growth are amplified as metadata size explodes and it becomes harder and harder to keep pointers to data in memory. Modern storage engines open up new ways to avoid sharding while delivering stable, efficient performance at hyperscale, 10-100x beyond traditional systems.

Data Storage

Data storage typically refers to the physical or digital storage of data on a device, such as a hard drive, solid-state drive (SSD), or in memory. Data storage can be either persistent or volatile, depending on the type of device.

Examples of data storage:

  • Hard Disk Drive (HDD): A persistent data storage device that stores data on a rotating magnetic disk.
  • Solid-State Drive (SSD): A persistent data storage device that stores data on flash memory, providing high performance and reliability.
  • Dynamic Random Access Memory (DRAM): A volatile data storage device that stores data in memory chips and loses data when power is removed.
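The persistent/volatile split shows up directly in code. In this minimal sketch (the file name is hypothetical), a Python dict lives only in DRAM and disappears when the process exits, while `os.fsync` asks the operating system to push a file's buffered data from its in-memory page cache down to the persistent drive. Whether the data is truly durable also depends on the drive honoring that flush.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the OS page cache (still DRAM)
        os.fsync(f.fileno())  # ask the OS to write the page cache to the device


def read_back(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    in_memory = {"greeting": b"hello"}  # volatile: gone when the process exits
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "greeting.bin")  # hypothetical file name
        write_durably(path, in_memory["greeting"])
        print(read_back(path))  # b'hello'
```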

Popular storage array and data management vendors here include EMC, NetApp and VAST Data. By tuning or replacing the database engine, we can reduce read, write and space amplification before the data reaches the external storage, allowing physical storage appliances to perform far more efficiently.
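Read, write and space amplification are simple ratios, so they are easy to compute from I/O counters. The helper names and workload numbers below are hypothetical. The formulas follow the usual definitions: write amplification is bytes written to storage divided by bytes written by the application, and space amplification is bytes on disk divided by the logical data size. Read amplification is shown here as bytes read per byte requested, although some sources count I/O operations per lookup instead.

```python
GB = 1024 ** 3


def _ratio(numerator: int, denominator: int) -> float:
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def write_amplification(bytes_written_to_storage: int, bytes_written_by_app: int) -> float:
    # Flushes and compactions can rewrite each application byte several times.
    return _ratio(bytes_written_to_storage, bytes_written_by_app)


def read_amplification(bytes_read_from_storage: int, bytes_requested: int) -> float:
    # Reading whole blocks or checking several files inflates reads.
    return _ratio(bytes_read_from_storage, bytes_requested)


def space_amplification(bytes_on_disk: int, logical_bytes: int) -> float:
    # Stale versions and tombstones awaiting compaction occupy extra space.
    return _ratio(bytes_on_disk, logical_bytes)


if __name__ == "__main__":
    # Hypothetical workload numbers, for illustration only.
    print(write_amplification(50 * GB, 10 * GB))     # 5.0
    print(read_amplification(64 * 1024, 4 * 1024))   # 16.0
    print(space_amplification(15 * GB, 10 * GB))     # 1.5
```

In this framing, an engine that lowers write amplification from 5.0 to 1.0 on the same workload would cut the bytes hitting the storage array by 80%.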

And finally... Just Plain Old Data.

Data is a fundamental concept in the world of enterprise infrastructure, referring to information in the form of bits, bytes and blocks that can be processed and analyzed by computers and applications. At its most basic level, data is a series of electrical signals that are transmitted and stored as binary code: a sequence of ones and zeros that computers use to represent information.

These electrical signals are then aggregated and translated by applications, graphics and monitors into the text, videos and pictures we can understand.
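A minimal sketch of that translation in Python: text is encoded to bytes using UTF-8 (one common character encoding, chosen here as an assumption), each byte is written out as eight binary digits, and decoding reverses the process.

```python
def to_bits(text: str) -> list[str]:
    # Encode characters to bytes, then show each byte as 8 binary digits.
    return [format(byte, "08b") for byte in text.encode("utf-8")]


def from_bits(bits: list[str]) -> str:
    # Parse each 8-digit group back into a byte, then decode the bytes as UTF-8.
    return bytes(int(b, 2) for b in bits).decode("utf-8")


if __name__ == "__main__":
    bits = to_bits("Hi")
    print(bits)             # ['01001000', '01101001']
    print(from_bits(bits))  # Hi
```

Note that one character is not always one byte: in UTF-8, "é" takes two bytes, i.e. 16 bits.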

Data comes in many different forms and is typically (eventually) stored on persistent storage devices such as hard disk drives or solid-state drives. In a typical enterprise infrastructure, data is organized and managed in databases and physical data storage systems.

Unstructured data accounts for roughly 80% of all future data growth, so the landscape and infrastructure that must support it face a huge challenge: unstructured data volumes will grow 10x over the next 10 years, and roughly another 40% of the global population has yet to come online. It's wild to think that the data boom is only now, in 2023, starting in earnest.

Image source: Statista Digital Economy Compass 2019

At Speedb.dev we have developed a highly evolved LSM-based key-value storage engine, designed to be much more scalable, performant and stable than earlier iterations. Drop Speedb into your infrastructure as a library in about 30 seconds and go. We provide a robust OSS community and optional enterprise support to ensure our shared success.

If you're interested in learning more, please reach out.

email = [email protected]

More articles by Bamiyan Gobets