登录查看更多内容

点击“继续加入或登录”，即表示您同意遵守领英的《用户协议》、《隐私政策》及《Cookie 政策》。

Series: Introduction to Columnar Databases

Anuj Yadav

Technology Leader | Ixigo | Microsoft | MakeMyTrip | Expedia

发布日期: 2024年6月6日

+ 关注

1. Columnar Databases:

- A?columnar database?organizes data in a column-wise format rather than the traditional row-wise format used by row-based databases.

- Key points about columnar databases:

- Data is logically organized as a table with rows and columns but physically stored in a column-wise format.

- Since columns typically contain similar data types, values, or properties, compression algorithms can achieve higher compression ratios compared to row-oriented databases.

- Benefits for Analytics:

- Query Performance: Columnar databases excel in analytical queries due to efficient column-wise processing, resulting in up to 10x query performance gains over row-oriented storage. Analytical queries often involve aggregating or analyzing a subset of columns rather than entire rows.

- Data Compression: Achieve up to 10x data compression over uncompressed data size.

2. Log-Structured Merge (LSM) Tree:

- Columnar databases use LSM Tree, which allows for fast writes using a write-ahead log.

- Write-Ahead Log (WAL):

- When a client writes data, the system:

1. Writes a log entry to disk (WAL).

2. Returns to the client.

- Benefits:

- Recovery: If an exception occurs (e.g., power loss), data can be recovered from the log.

- Background Compaction:

- Compaction reduces the number of files and eliminates garbage from updates and deletes.

- Happens independently from reads and writes.

- LSM Tree Efficiency:

- Parallel writes during compaction.

- Automatic replays.

- Efficient for high-throughput scenarios (e.g., event processing) due to background compaction.

3. Memtable:

- A?memtable?is an in-memory data structure used by columnar databases (such as?Cassandra?and?ScyllaDB).

- Purpose:

- Temporarily stores recent write operations (inserts, updates, and deletes) before they are committed to disk.

- Acts as a write buffer, allowing for quick write operations.

- Ensures eventual persistence to disk in an orderly manner.

- Process:

- When data is written, it is first stored in the memtable.

- Once the memtable is full or certain conditions are met (e.g., commit log size threshold), it is flushed to disk as an?SSTable?(Sorted String Table).

- After flushing, the memtable is cleared, and a new cycle begins.

- Key Points:

- Memtables are sorted by key (usually the partition key).

- Each column family typically has a separate memtable.

- The data in the memtable is eventually persisted to disk as SSTables for durability.

4. SSTable (Sorted String Table):

- An?SSTable?is a?persistent file format?used by databases like?ScyllaDB?and?Cassandra.

- Purpose:

- SSTables serve as the building blocks of the total data stored in the database.

- They are?immutable, meaning updates create new SSTable files instead of modifying existing ones.

- Structure:

- For each SSTable, the database creates an?index file?and a?data file.

- The?index file?helps locate data quickly within the sorted data file.

- Key Points:

- SSTables are efficient for read-heavy workloads due to their ordered and immutable nature.

- They play a crucial role in ensuring data durability and efficient data retrieval in distributed databases.

要查看或添加评论，请登录

Anuj Yadav的更多文章

What is Elasticsearch and why is it so fast?

2024年8月16日

What is Elasticsearch and why is it so fast?

Originally posted at https://anujyadav.substack.
I wrote some old JS code and NodeJS hydrated UI for the Simulation

2024年5月4日

I wrote some old JS code and NodeJS hydrated UI for the Simulation

I had a presentation today, I built a using NodeJS for that. It required me to write and backend.
Starting up? Multitasking - Good Or Bad

2015年12月13日

Starting up? Multitasking - Good Or Bad

I am a big multitasking fan myself. So, please read it out.

1 条评论
Practical - Agile software development in a complex project

2015年10月8日

Practical - Agile software development in a complex project

I have worked on some not so complex projects and some very complex projects. I found that it is quite easy to execute…
They are all differently abled

2015年8月10日

They are all differently abled

If you get the best team in the world no doubt you will be able to deliver. Actual leadership lies in making the…

2 条评论
Let them respect your work, not position - Part 2

2015年7月15日

Let them respect your work, not position - Part 2

In my last post I wanted to make some points about management style. Me being me did that in a abstract manner as I…
Let them respect your work, not position

2015年7月11日

Let them respect your work, not position

Let them respect your work, not position. Time and again you will come across something similar.

6 条评论
Is Your Engineer Working On Product Or Project?

2014年9月7日

Is Your Engineer Working On Product Or Project?

Why do we have two different words Project and Product? I know, all you smart people can give me lot of answers for…

1 条评论

See all articles

Anuj Yadav的更多文章

What is Elasticsearch and why is it so fast?

I wrote some old JS code and NodeJS hydrated UI for the Simulation

Starting up? Multitasking - Good Or Bad

Practical - Agile software development in a complex project

They are all differently abled

Let them respect your work, not position - Part 2

Let them respect your work, not position

Is Your Engineer Working On Product Or Project?

社区洞察