DBMS Landscape in 2020 - part 2
Since my last article a couple of days ago, readers have pointed out several missing parts. Here are a few of the questions they raised.
- Where do time-series databases fit, especially as IoT looms large along with edge computing?
- What database is ideal for video and audio processing?
- Which one is best for real-time streams and data analytics?
- Where does the KV (key-value) store fit in?
- Is there a common platform to host multiple data models?
- When should we look at an in-memory database?
- What about blockchain, which has a distributed database at its center for peer-to-peer ledger processing?
As I said before, this is a complex area that deserves a much longer discussion. Briefly, though, let me mention how Microsoft and Amazon are addressing these areas.
AWS offers multiple database choices - Aurora (Transactional Apps), DynamoDB (Internet-Scale Apps), Redshift (DW for Analytics), ElastiCache (In-Memory for real-time apps), Neptune (Graph Database), DocumentDB (Semi-structured data), Timestream (Time-Series data), Quantum Ledger DB (Blockchain apps), and migration services to lure legacy database users.
Aurora is a relational DBMS, compatible with MySQL and PostgreSQL. It combines the performance and availability of high-end commercial DBMSs with the simplicity and cost-effectiveness of open-source DBMSs. AWS claims up to 5x the throughput of MySQL and up to 3x that of PostgreSQL, at one-tenth the cost of a commercial RDBMS. Server provisioning, patching, setup, configuration, and backups are fully automated, which reduces the DBA burden. It enables "lift and shift" of apps to the cloud. One AWS customer is Airbnb, which uses DynamoDB as a key-value store, ElastiCache as an in-memory store, and Amazon RDS for transaction processing.
Microsoft Azure offers a variety of database options - Azure SQL (mission-critical apps; this is SQL Server in the cloud), Azure DocumentDB (a NoSQL database offering speed, high availability, elastic scaling, and global distribution), Azure SQL Data Warehouse (a massively parallel, scale-out RDBMS for very large data volumes), Azure Redis Cache (in-memory apps), etc. One serious attempt to create a single platform that consolidates multiple data, API, and consistency models is Microsoft’s Cosmos DB.
Cosmos DB is a distributed database system that offers uniquely configurable options across consistency models, data models, and APIs. Cosmos DB supports multiple database modalities, including key-value, tabular, graph, and a MongoDB-compatible document model. However, to date, Cosmos DB has gathered only moderate mindshare, and Microsoft has revealed no road map that would allow it to bring SQL Server workloads within the Cosmos DB umbrella.
There are small-footprint databases geared for edge computing, such as TimescaleDB (a time-series layer on top of PostgreSQL). There are many other players like Riak, Redis, CouchDB, Aerospike, VoltDB, MarkLogic, FoundationDB, FlockDB, AllegroGraph, InfiniteGraph, HBase, StreamBase, etc. Oracle and IBM also have offerings for NoSQL, time-series, and in-memory workloads.
Welcome to the confusing task of picking the right DBMS. I have used the term "Polyglot" to signify the coexistence of multiple database systems within an enterprise as its diverse needs grow (transaction, analytics, stream, time-series, video-search, image data, etc.).
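To make the "Polyglot" idea concrete, here is a minimal sketch in Python that maps workload types to the AWS services listed earlier in this article. The workload keys and the function names (`pick_aws_database`, `plan_polyglot_stack`) are my own illustrative choices, not any AWS API, and a real selection would also weigh cost, scale, and team skills.

```python
# Workload-to-service lookup, taken from the AWS lineup listed in this article.
# The keys and function names are hypothetical, chosen only for illustration.
AWS_DATABASE_BY_WORKLOAD = {
    "transactional": "Aurora",
    "internet-scale": "DynamoDB",
    "analytics": "Redshift",
    "in-memory": "ElastiCache",
    "graph": "Neptune",
    "document": "DocumentDB",
    "time-series": "Timestream",
    "ledger": "Quantum Ledger DB",
}


def pick_aws_database(workload: str) -> str:
    """Return the AWS service this article associates with a workload type."""
    key = workload.strip().lower()
    if key not in AWS_DATABASE_BY_WORKLOAD:
        known = ", ".join(sorted(AWS_DATABASE_BY_WORKLOAD))
        raise ValueError(f"Unknown workload {workload!r}; expected one of: {known}")
    return AWS_DATABASE_BY_WORKLOAD[key]


def plan_polyglot_stack(workloads):
    """Map each workload an enterprise has to a database: one stack, many engines."""
    return {w: pick_aws_database(w) for w in workloads}


if __name__ == "__main__":
    # An enterprise with three kinds of needs ends up with three engines.
    for workload, service in plan_polyglot_stack(
        ["transactional", "analytics", "time-series"]
    ).items():
        print(f"{workload}: {service}")
```

Even this toy version shows why the choice gets confusing: every new kind of workload adds another engine to operate.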
Once again, the fundamental question to ask is which product can satisfy your functionality, performance, and scale demands at an affordable cost.