登录查看更多内容

What happens when databases crash?

Hussein Nasser

Software Engineer | Talks about backend, databases and operating systems

发布日期: 2024年12月13日

This can make an interesting interview question.

Databases have tables and indexes stored in files and cached in memory aka as buffer pool. As you create rows, the database system writes the rows to data pages on memory first which is then eventually written to data files on disk.

Reading a page always checks the memory buffer pool. If it is not there it pulls the page from disk and place it in the buffer pool.

There is a problem though, what happens if you lose power half-way through writing to the data files? Some pages would have been written while others didn’t. When the database starts back up, we end up with data loss, or worse corruption.

Meet the WAL

Database folks quickly realized that they need something that would help with crashes and power loss, and that is the WAL (Write-ahead log) or Redo log.

What if as we write to tables and indexes data pages we create a log entry in the WAL of those changes. We write the WAL to its own files and also write to the data pages in memory. It is OK not write to the actual table and index data files on disk, those can stay in memory, as long as we have a log we can always construct the table.

The WAL is much smaller than the actual data files, and so we can flush them relatively faster. They are also sequential unlike data pages which are random. Disks love sequential writes, which is another benefit of WAL.

The WAL can also be used for all sort of things like replications and backup and yes crash recovery.

What if we crashed while writing the WAL?

WAL entries are also written to memory first and then flushed later based on the transactions commit. That is why the transaction state is critical.

The database can crash after writing WAL entries, that is fine, as long we know the transaction state belonging to each WAL entry we can discard or omit uncommitted WAL entries upon recovery.

For example if you are in the middle of a transaction and the database crashed, we consider the transaction rolled-back by default. WAL entries flushed by this uncommitted transaction will be discarded upon recovery.

But if you were able to issue a COMMIT and the WAL entry for a transaction commit makes it to disk and the client gets a success, that transaction is considered committed even if we crashed right after. The database remain consistent in that case.

领英推荐

How to find duplicates in a table using SQL?

LearnSQL.com 2 年前

9 Recently Asked Google SQL Interview Questions from…

DataLemur ?? (Ace the SQL & Data Interview) 1 年前

Protect Your Data: Understanding and Preventing SQL…

Codingmart Technologies 8 个月前

Redoing the WAL

So we have established that WAL is the source of truth, we write the changes to data pages in memory of course (for on going transactions to use the latest and greatest) but we delay flushing the data pages to disk because the WAL made it to disk. The WAL is ahead of the data pages, thus the name, write ahead log.

Now we have data files on disk that are out of sync with what is in memory which is absolutely fine. As long as the database is running, we will only read from memory which has the latest. The problem is if the database crashes.

As the database starts back up, the file is out of date we can’t just pull it to memory and have clients read them, the WAL is the source of truth, we need to REDO the changes in the WAL back on the data files, and during that process nothing is allowed to read (otherwise we get corruption). The database startup time slows down the more out of sync the data files are from the WAL (many writes has happened but the data files were not updated for a long time).

Checkpointing

The question here can I control how often data pages are flushed to disk? And the answer is yes, this is called as checkpointing, where a new WAL record is created called checkpoint record right after the entire data pages in memory are flushed to disk.

Checkpointing however is an IO heavy operation, the data pages can be very large and very random.

Too much checkpointing sure makes start up faster, but it indeeds takes a toll on your I/O disk bandwidth which may take precious time from critical point reads or writes.

Some database systems have also undo logs, but that is for another post.

How often should we flush the WAL?

WAL contains changes from several transactions. If one of the transactions commit, we have to flush WAL to disk if we want to maintain durability. Of course this will flush other transactions data that may have not been committed which as we discussed is also fine, the transaction state saves us here.

But some databases expose certain configurations to relax how often the WAL is flushed. One of them being fsync, which ensures the contents of the WAL file is flushed to disk out of the OS page cache. Another is a transaction sibling commit delay, which waits for more transactions to commit so we incur fewer flushes.

Indeed database engineering is a fascinating world by itself. If you want to learn the fundamentals of database engineering check out my course at https://databases.win

Ali Ather

1 个月

Great explanation It was a new concept for me. Thanks for sharing

K. Ram Babu

Senior Software Engineer at TMEIC | Ex. Cognizant | Python | SQL | Django

2 个月

Thanks for sharing. Loved your explanation. ??

Devi Priya Ragireddy

2 个月

Good one.Its interesting to know the internals??

1 次回应

Abiodun Alao

Backend Engineer

2 个月

Enlightened

查看更多评论

要查看或添加评论，请登录

Hussein Nasser的更多文章

A Story about Lunch and cache invalidation

2025年2月14日

A Story about Lunch and cache invalidation

A construction project at work has been blocking the main doorway to the cafeteria where we get lunch. For the first…

11 条评论
The Six Connections Limit in Chromium Browsers

2025年2月3日

The Six Connections Limit in Chromium Browsers

A web application can be choked by Chrome’s HTTP/1.1 six connection per host limit.

8 条评论
The Beauty of the WAL - A deep dive

2025年1月29日

The Beauty of the WAL - A deep dive

In any database system there are often two major storage units, data and indexes. Data represents tables, collections…

11 条评论
What makes a good database engineer

2025年1月17日

What makes a good database engineer

The art of truly understanding database systems boils down to the following principles. You cannot do much with data on…

18 条评论
A TCP option that improves Frontends and Backends latency

2024年12月17日

A TCP option that improves Frontends and Backends latency

99% of network latency issues are caused by the user app logic. App logic here includes libraries and frameworks used…

34 条评论
What happens to a request before it’s processed?

2024年12月7日

What happens to a request before it’s processed?

When sending a request to a backend, we tend to focus on the processing aspect of the request, which is really the last…

5 条评论
How to Become a Good Backend Engineer (Fundamentals)

2024年12月3日

How to Become a Good Backend Engineer (Fundamentals)

I have been a backend engineer for over 20 years and I have witness technologies come and go. One thing however, always…

49 条评论
Postgres and MySQL, the main differences

2024年11月21日

Postgres and MySQL, the main differences

One of my udemy students asked a question about the difference between Postgres and MySQL. The answer turned out too…

11 条评论
Good code, Bad code

2024年11月17日

Good code, Bad code

Code is just code, until bugs appear. Then we label it “bad code”.

20 条评论
Avoid SELECT *, even on a single-column tables

2024年11月13日

Avoid SELECT *, even on a single-column tables

Try avoiding SELECT * even on single-column tables. Just keep that in mind even if you disagree.

19 条评论

See all articles

What happens when databases crash?

Hussein Nasser

Software Engineer | Talks about backend, databases and operating systems

Meet the WAL

What if we crashed while writing the WAL?

领英推荐

Redoing the WAL

Checkpointing

How often should we flush the WAL?

Hussein Nasser的更多文章

社区洞察

其他会员也浏览了

SQL Essential Concepts for Data Analyst Interviews

SQL Essential Concepts for Data Analyst Interviews

Exactly How SQL is Used on the Job by Data Scientists - A Case Study

A guide to becoming a Taruk Makto in SQL

Creating your first Data Science project

Oracle Database Ecosystem Blog

Data Science – Working with Data Interview Questions

Filtering with ON vs. WHERE

Healthcare SQL Data Analysis

Optimising SQL Queries: Fundamentals and Strategies

Meet the WAL

What if we crashed while writing the WAL?

领英推荐

Redoing the WAL

Checkpointing

How often should we flush the WAL?

Hussein Nasser的更多文章

A Story about Lunch and cache invalidation

The Six Connections Limit in Chromium Browsers

The Beauty of the WAL - A deep dive

What makes a good database engineer

A TCP option that improves Frontends and Backends latency

What happens to a request before it’s processed?

How to Become a Good Backend Engineer (Fundamentals)

Postgres and MySQL, the main differences

Good code, Bad code

Avoid SELECT *, even on a single-column tables

社区洞察

其他会员也浏览了

SQL Essential Concepts for Data Analyst Interviews

SQL Essential Concepts for Data Analyst Interviews

Exactly How SQL is Used on the Job by Data Scientists - A Case Study

A guide to becoming a Taruk Makto in SQL

Creating your first Data Science project

Oracle Database Ecosystem Blog

Data Science – Working with Data Interview Questions

Filtering with ON vs. WHERE

Healthcare SQL Data Analysis

Optimising SQL Queries: Fundamentals and Strategies