
Quick Intro to Time Series Databases (TSDBs)

Ashutosh Maheshwari

Published: March 25, 2023

Companies like Amazon, Uber, and Facebook build and use time series databases (TSDBs) to store and analyze large volumes of time-series data. For instance:

Amazon: Amazon Timestream is a managed TSDB offered by AWS. It targets IoT and operational workloads, such as tracking sensor readings (temperature, humidity, and similar metrics) over time.
Uber: M3 is Uber's open-source metrics platform. It stores and queries the large volume of operational metrics emitted by the services behind its ride-hailing platform.
Facebook: Gorilla is Facebook's in-memory TSDB, described in the research paper listed in the references. It is used to monitor the health and performance of Facebook's systems.

But what exactly are TSDBs, and what use cases are they best suited for?

What is Time Series Data?

In simple terms, time series data is data recorded over time, often (though not always) at regular intervals.

You can think of it as a collection of observations that are recorded in chronological order. Each observation is associated with a specific timestamp or time period, such as seconds, minutes, hours, days, weeks etc.
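As a rough illustration of that definition, a time series can be modelled as a list of (timestamp, value) pairs kept in chronological order. This is only a minimal sketch: the names Observation and make_series, and the temperature readings, are made up for this example and do not come from any particular TSDB.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Observation(NamedTuple):
    """One data point: when it was recorded and what was measured."""
    timestamp: datetime
    value: float


def make_series(start: datetime, step: timedelta, values: List[float]) -> List[Observation]:
    """Build a regularly spaced series: one observation every `step`."""
    return [Observation(start + i * step, v) for i, v in enumerate(values)]


# Hypothetical temperature readings taken once a minute.
series = make_series(
    start=datetime(2023, 3, 25, 12, 0),
    step=timedelta(minutes=1),
    values=[21.5, 21.7, 21.6],
)

for obs in series:
    print(obs.timestamp.isoformat(), obs.value)
```

Real workloads also produce irregular series (for example, one point per user event), which is why each observation carries its own timestamp instead of relying on its position in the list.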

Internals in brief

TSDBs have two primary components: a storage engine and a query engine.

Together, they use a combination of indexing, compression, caching, and aggregation techniques to store and query time-series data efficiently, while sharding and replication provide scalability and fault tolerance.

For example, TSDBs may use an index to quickly locate the data for a specific time range or use pre-computed aggregates to reduce the amount of data that needs to be processed for a query.
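The two ideas above can be sketched in a few lines. This is a toy, in-memory illustration and not how any specific TSDB is implemented: the class TinySeries and its methods are hypothetical, timestamps are plain epoch seconds, and writes are assumed to arrive in time order. A sorted timestamp list plays the role of the index (binary search finds a time range), and per-bucket sums and counts play the role of pre-computed aggregates.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Optional, Tuple


class TinySeries:
    """Toy single-series store with a time index and per-bucket rollups."""

    def __init__(self, bucket_seconds: int = 3600) -> None:
        self.timestamps: List[int] = []   # sorted epoch seconds (the "index")
        self.values: List[float] = []
        self.bucket_seconds = bucket_seconds
        # bucket start -> [running sum, count], updated on every write
        self.rollups: Dict[int, List[float]] = {}

    def append(self, ts: int, value: float) -> None:
        """Append a point; this sketch only accepts in-order writes."""
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write not supported in this sketch")
        self.timestamps.append(ts)
        self.values.append(value)
        bucket = ts - ts % self.bucket_seconds
        agg = self.rollups.setdefault(bucket, [0.0, 0])
        agg[0] += value
        agg[1] += 1

    def range(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return points with start <= ts <= end using binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def bucket_mean(self, bucket_start: int) -> Optional[float]:
        """Answer from the rollup without scanning the raw points."""
        agg = self.rollups.get(bucket_start)
        if agg is None:
            return None
        return agg[0] / agg[1]


s = TinySeries(bucket_seconds=60)
for ts, v in [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]:
    s.append(ts, v)

print(s.range(30, 60))
print(s.bucket_mean(0))
```

The trade-off is typical of TSDBs: a little extra work at write time (maintaining the rollup) buys much cheaper reads for common aggregate queries.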

Popular Use Cases

TSDBs can be used to store and analyse:

Financial data: stock prices, exchange rates, and other financial metrics tracked over time.
IoT sensor data: readings such as temperature, motion, and humidity.
Application performance metrics: response times, resource usage, and error rates.
Log data: server logs, application logs, and other logs generated over time.

Popular Open-Source TSDBs that you can explore

InfluxDB: InfluxDB is optimized for high write throughput and efficient querying.
TimescaleDB: TimescaleDB is built on top of PostgreSQL, providing SQL-based querying and horizontal scaling capabilities.
OpenTSDB: OpenTSDB is built on top of HBase and is designed to scale horizontally (by adding more nodes to the cluster).
Prometheus: Prometheus is optimized for monitoring and alerting, providing a powerful query language and integration with other monitoring tools.

Most of these databases offer a query language designed for time-series data, such as PromQL (Prometheus), InfluxQL (InfluxDB), or OpenTSDB's query language, while TimescaleDB uses SQL. These query languages support time-based filtering and aggregation operations.

TSDBs are often used along with other big data technologies, such as Apache Kafka and Apache Spark, to handle large volumes of time-series data in real time.
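To make "time-based filtering and aggregation" concrete, here is a minimal sketch in plain Python of what such a query does conceptually: keep the points inside a time range, group them into fixed-width windows, and reduce each window to a single value. The function name query and its parameters are hypothetical; this is not PromQL, InfluxQL, or any real query API.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (epoch seconds, value)


def query(
    points: Iterable[Point],
    start: int,
    end: int,
    window: int,
    aggregate: Callable[[List[float]], float] = mean,
) -> List[Tuple[int, float]]:
    """Filter to start <= ts < end, then aggregate per `window` seconds."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in points:
        if start <= ts < end:                      # time-based filter
            bucket = start + ((ts - start) // window) * window
            buckets.setdefault(bucket, []).append(value)
    # One output row per non-empty window, in chronological order.
    return [(b, aggregate(vs)) for b, vs in sorted(buckets.items())]


points = [(0, 1.0), (10, 3.0), (65, 5.0), (130, 9.0)]

print(query(points, start=0, end=120, window=60))
print(query(points, start=0, end=200, window=60, aggregate=max))
```

In a real TSDB the same idea is usually expressed declaratively (a time range plus a grouping interval and an aggregate function), and the engine decides how to execute it.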

References and Good Reads

Gorilla: A Fast, Scalable, In-Memory Time Series Database (research paper published by Facebook): https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
Amazon Timestream: https://aws.amazon.com/timestream/
Uber M3: https://eng.uber.com/m3/
InfluxDB: https://www.influxdata.com/products/influxdb/
OpenTSDB: https://opentsdb.net/
Prometheus: https://prometheus.io/
TimescaleDB: https://www.timescale.com/

