Postgres and MySQL, the main differences
Hussein Nasser
Software Engineer | Talks about backend, databases and operating systems
One of my Udemy students asked a question about the difference between Postgres and MySQL. The answer turned out to be too long, so I thought I'd turn it into an article.
In a nutshell, the main difference between the two databases boils down to the implementation of primary and secondary indexes, how data is stored and updated, and the MVCC implementation.
Let us explore this further.
But First... Fundamentals
An index is a data structure (mostly a B+Tree) that allows searching for keys through layers of nodes, which databases implement as pages. The tree traversal eliminates pages that can't have the result and narrows down the pages that do. This continues until a leaf page where the key resides is found.
Leaf nodes or pages contain a list of ordered keys and their values. When a key is found in a page, the page is cached in the database shared buffers with the hope that future queries may request keys in the same page. The point of finding the key is to get the corresponding value.
This last sentence captures the fundamental idea behind all of database engineering, administration, programming, and modeling: knowing that your queries hit keys next to each other in the same page minimizes I/Os and increases performance.
Keys in a B+Tree index are the column(s) the index is created on; the value is what databases implement differently. Let us explore what the value is in Postgres vs MySQL.
MySQL
In a primary index, the value is the full row object with all the attributes*. This is why primary indexes are often referred to as clustered indexes, or by another term I prefer: index-organized tables. The primary index effectively is the table.
*Note that this is true for row-stores; databases might use a different storage model such as column-store, graph, or document, and fundamentally those could be the values instead.
If you do a lookup for a key in the primary index you find 1) the page where the key lives, 2) the key itself, and 3) the key's value, which is the full row for that key; no more I/Os are necessary to get additional columns.
In a secondary index the key is whatever column(s) you indexed, and the value is the primary key of that row. So when you do a lookup on a secondary index, you find the secondary key, and its value is the primary key for that row. To get the full row, you then need to look up that primary key in the primary index.
So two index scans.
In MySQL all tables must have a primary index. If you don’t create a primary key in a MySQL table, one is created for you.
Example of how in MySQL InnoDB all tables must have a clustered primary index.
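To make this concrete, here is a minimal sketch (table and column names are illustrative, not from the original example):
sql
-- MySQL/InnoDB: every table is clustered. A declared PRIMARY KEY becomes
-- the clustered index; its leaf pages store the full rows.
CREATE TABLE users (
  id    BIGINT PRIMARY KEY,     -- clustered index key
  email VARCHAR(255),
  name  VARCHAR(100),
  KEY idx_email (email)         -- secondary index; each entry stores (email, id)
) ENGINE=InnoDB;

-- With no primary key (and no UNIQUE NOT NULL index to promote), InnoDB
-- generates a hidden 6-byte row id and clusters on it (GEN_CLUST_INDEX).
CREATE TABLE events (
  payload TEXT
) ENGINE=InnoDB;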
Postgres
Technically in Postgres there is no primary index; all indexes are secondary and all of them point to system-managed tuple ids in the heap's data pages.
The table data in the heap is unordered, unlike clustered primary index leaf pages, which are ordered. So if you insert rows 50–100 and all of them land in the same page, and later insert rows 1–30, the new rows go directly after the last row (100) or may even go into a completely different page. In a clustered primary index, by contrast, inserts must go to a page that satisfies the key's order. That is why Postgres tables are often referred to as "heap-organized tables" instead of "index-organized tables".
Here is an example; notice how we can use ctid to get the tuple id, which is a combination of the page number and the tuple's index within that page.
sql
CREATE TABLE TEST (MYID INTEGER);
INSERT INTO test (myid) SELECT generate_series(50, 60);
INSERT 0 11
postgres=# select ctid, myid from test;
ctid | myid
--------+------
(0,1) | 50
(0,2) | 51
(0,3) | 52
(0,4) | 53
(0,5) | 54
(0,6) | 55
(0,7) | 56
(0,8) | 57
(0,9) | 58
(0,10) | 59
(0,11) | 60
(11 rows)
INSERT INTO test (myid) SELECT generate_series(1, 10);
INSERT 0 10
postgres=# select ctid, myid from test;
ctid | myid
--------+------
(0,1) | 50
(0,2) | 51
(0,3) | 52
(0,4) | 53
(0,5) | 54
(0,6) | 55
(0,7) | 56
(0,8) | 57
(0,9) | 58
(0,10) | 59
(0,11) | 60
(0,12) | 1
(0,13) | 2
(0,14) | 3
(0,15) | 4
(0,16) | 5
(0,17) | 6
(0,18) | 7
(0,19) | 8
(0,20) | 9
(0,21) | 10
(21 rows)
Updates and deletes in Postgres are effectively inserts: every update creates a new tuple with a new tuple id, and the old tuple is kept around for MVCC reasons (a delete likewise keeps the old tuple and just marks it as deleted). I'll explore that later in the post. A single row in Postgres can therefore have many tuple "versions".
The truth is, the tuple's index by itself is not enough. We really need both the page number and the tuple's index within that page; together these form the ctid. Think about it: it is not enough to know the tuple's index, we also need to know which page the tuple lives in. We didn't have to store this for MySQL, because there we do a B+Tree lookup on the primary key to find the page. In Postgres we simply do an I/O to the heap page named by the ctid to fetch the full row.
Example of how Postgres tables are heap organized and all indexes point to tuple ids.
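As a quick illustration of the point above, updating a row in the session from earlier creates a brand new tuple at a new ctid (the exact values will vary):
sql
-- Continuing the session above: an update writes a new tuple version.
UPDATE test SET myid = 999 WHERE myid = 50;
SELECT ctid, myid FROM test WHERE myid = 999;
--   ctid  | myid
-- --------+------
--  (0,22) |  999   -- appended as a new tuple; the old (0,1) version is now dead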
Query Cost
Take the following table for the examples below.
sql
# TABLE T
# PRIMARY INDEX ON PK, SECONDARY INDEX ON C2, NO INDEX ON C1
# C1 and C2 are text
# PK is an integer
| PK | C1 | C2 |
|----|----|----|
| 1  | x1 | x2 |
| 2  | y1 | y2 |
| 3  | z1 | z2 |
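For reference, here is a hedged sketch of how this example table could be created (Postgres syntax; the index name t_c2_idx is my own):
sql
CREATE TABLE T (
  PK integer PRIMARY KEY,   -- primary index on PK
  C1 text,                  -- no index on C1
  C2 text                   -- secondary index on C2, created below
);
CREATE INDEX t_c2_idx ON T (C2);
INSERT INTO T (PK, C1, C2) VALUES (1, 'x1', 'x2'), (2, 'y1', 'y2'), (3, 'z1', 'z2');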
Let us compare what happens in MySQL vs Postgres
sql
SELECT * FROM T WHERE C2 = 'x2';
That query in MySQL will cost us two B+Tree lookups*. We first look up 'x2' in the secondary index to find its primary key, which is 1, then do another lookup for 1 on the primary index to find the full row so we can return all the attributes (hence the *).
*One might think this is simply two I/Os, which isn't true; a B+Tree lookup is O(log N) and, depending on the size of the tree, it might result in many I/Os. While most of those I/Os might be logical (hitting cached pages in the shared buffers), it is important to understand the difference.
In Postgres, looking up any secondary index requires one index lookup followed by a single I/O to the heap to fetch the page where the full row lives. One B+Tree lookup is better than two, of course.
To make this example more interesting, say C2 is not unique and there are multiple entries matching 'x2'; then we will find tons of tids (or primary keys in MySQL) matching 'x2'. The problem is that those rows will live in different pages, causing random reads. In MySQL this means more primary index lookups (though the planner may opt for an index scan instead of individual seeks depending on how many keys match), and in both databases it results in many random I/Os.
Postgres attempts to minimize random reads by using bitmap index scans, grouping the results into pages instead of tuples and fetching the pages from the heap in as few I/Os as possible. Additional filtering is then applied to produce the candidate rows.
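An illustrative plan shape for such a non-unique match with many hits in Postgres might look like this (actual output depends on the planner, statistics, and table size):
sql
EXPLAIN SELECT * FROM T WHERE C2 = 'x2';
--  Bitmap Heap Scan on t
--    Recheck Cond: (c2 = 'x2'::text)
--    ->  Bitmap Index Scan on t_c2_idx
--          Index Cond: (c2 = 'x2'::text)
-- The bitmap index scan collects matching pages first; the bitmap heap scan
-- then fetches those pages and rechecks the condition per row.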
Let us take a different query.
sql
SELECT * FROM T WHERE PK BETWEEN 1 AND 3;
I think MySQL is the winner here for range queries on the primary key index*. With a single lookup we find the first key, then walk the B+Tree's linked leaf pages to find the nearby keys, and as we walk we already have the full rows.
Postgres struggles here, I think. Sure, the index lookup on PK (which in Postgres is just another secondary index) will do the same B+Tree walk on the leaf pages and find the keys, but it will only collect tids and pages; its work is not finished. Postgres still needs to do random reads on the heap to fetch the full rows, and those rows might be scattered all over the heap rather than nicely tucked together, especially if the rows were updated. Update-heavy workloads are Postgres's enemy; pick a good fillfactor for your tables.
*It is worth mentioning that MySQL range queries on secondary indexes may also suffer random reads, maybe even worse than Postgres. That is because the result of a range query on a secondary index is a set of primary keys, and those primary keys are not necessarily in order, so we also get random lookups on the primary index to fetch the full rows.
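A minimal sketch of tuning fillfactor (70 is just an illustrative value; the default is 100):
sql
-- Leave free space in each heap page so updated row versions can stay on
-- the same page (this also helps HOT updates, discussed below).
CREATE TABLE t_padded (pk integer PRIMARY KEY, c1 text, c2 text)
  WITH (fillfactor = 70);
-- Or on an existing table (applies to newly written pages / after a rewrite):
ALTER TABLE t_padded SET (fillfactor = 70);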
Ok let us do an update.
sql
UPDATE T SET C1 = 'XX1' WHERE PK = 1;
In MySQL, updating a column that is not indexed results in updating only the leaf page of the primary index where the row lives with the new attribute value. No secondary indexes need to be updated, because they all point to the primary key, which didn't change.
In Postgres, updating a column that is not indexed generates a new tuple and might* require ALL secondary indexes to be updated with the new tuple id, because they only know about the old tuple id. This causes many write I/Os. Uber didn't like this one in particular back in 2016; it was one of their main reasons for switching from Postgres to MySQL.
I said "might" here because Postgres has an optimization called HOT (heap-only tuple), not to be confused with heap-organized table, which keeps the old tuple id in the secondary indexes and instead leaves a link on the heap page that points from the old tuple to the new one. It only applies when none of the updated columns are indexed and the new tuple version fits on the same page.
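You can get a rough sense of how often HOT kicks in from the statistics views, for example (a hedged sketch):
sql
-- n_tup_hot_upd counts updates that produced heap-only tuples,
-- i.e. updates that did not have to touch any secondary index.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'test';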
Data Types Matter
In MySQL, choosing the primary key data type is critical, because the key lives in every secondary index and can make the secondary indexes larger. For example, a UUID primary key will bloat all secondary indexes, costing more storage and read I/Os.
In Postgres, having a UUID primary key won't affect the secondary indexes; all index values point to the tuple id, which has a small fixed size (6 bytes).
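An illustrative MySQL comparison (table names are mine): the primary key is copied into every secondary index entry, so a wide key inflates all of them.
sql
CREATE TABLE orders_bigint (
  id BIGINT PRIMARY KEY,        -- 8-byte key repeated in every secondary index entry
  customer_id BIGINT,
  KEY idx_customer (customer_id)
) ENGINE=InnoDB;

CREATE TABLE orders_uuid (
  id CHAR(36) PRIMARY KEY,      -- 36-byte key repeated in every secondary index entry
  customer_id BIGINT,
  KEY idx_customer (customer_id)
) ENGINE=InnoDB;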
Undo logs
All modern databases support multi-version concurrency control (MVCC). Under the basic read committed isolation level, if a transaction tx1 updates a row and hasn't committed yet while a concurrent transaction tx2 wants to read that row, tx2 MUST read the old row, not the updated one. Most databases (MySQL included) implement this using undo logs.
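A minimal two-session sketch of that behavior under READ COMMITTED (conceptually the same in both databases):
sql
-- Session 1 (tx1)
BEGIN;
UPDATE T SET C1 = 'XX1' WHERE PK = 1;   -- not committed yet

-- Session 2 (tx2), concurrently
SELECT C1 FROM T WHERE PK = 1;          -- still returns the old value 'x1'

-- Session 1
COMMIT;

-- Session 2
SELECT C1 FROM T WHERE PK = 1;          -- now returns 'XX1'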
When a transaction changes a row, the update is applied in place to the row on the page in the shared buffer pool, so the page where the row lives always has the latest data. In addition to logging the change itself to the WAL (redo log), the transaction also writes the information needed to undo its latest changes to the row (enough to reconstruct the old state) into an undo log. Concurrent transactions that still need the old state, based on their isolation level, must crack open the undo log and reconstruct the old row.
You might wonder whether writing uncommitted changes to the page is a good idea. What happens if a background process flushes the page to disk and the database then crashes before the transaction commits? That is where the undo log is critical: right after a crash, the uncommitted changes are undone using the undo logs during database startup.
One can't deny the cost that the undo logs of long-running transactions impose on other running transactions. More I/Os are required to reconstruct the old states, and there is a chance the undo log fills up, causing transactions to fail.
In one case I have seen a database system take over an hour to recover from a crash after running a 3-hour uncommitted transaction. Yeah, avoid long transactions at all costs.
Postgres does this differently: every insert and update creates a new copy of the row with a new tuple id, along with hints about which transaction id created the tuple and which transaction id deleted it. So Postgres can safely write the changes to the data pages, and concurrent transactions read the old or the new tuples based on their transaction ids. Clever design.
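You can peek at these hints through the hidden xmin and xmax system columns (the values shown are illustrative):
sql
SELECT ctid, xmin, xmax, myid FROM test WHERE myid = 60;
--   ctid  | xmin | xmax | myid
-- --------+------+------+------
--  (0,11) |  748 |    0 |   60   -- xmax = 0: no transaction has deleted or replaced this version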
Of course, no solution is without its problems. We already talked about the cost of new tuple ids on secondary indexes. In addition, Postgres needs to purge old tuples that are no longer required, once no running transaction can still see them. Vacuum takes care of that.
Purge vs Vacuum
Both MySQL and Postgres require a clean up process that removes dead and unwanted data. MySQL implements a process called “purge” while PostgreSQL uses “vacuum”.
Purge removes undo log entries that are no longer needed, whether they belong to transactions that have already committed or to transactions that were rolled back (for example after a crash) and no longer need the undo log. Too many undo logs can slow down reads.
Postgres's vacuum process (or really processes), on the other hand, is designed to reclaim storage occupied by "dead tuples". As we discussed, when a row is updated or deleted in Postgres, the old version of the row is not immediately removed from disk; instead, it is marked as dead. The vacuum process cleans up these dead tuples, freeing space for new data and preventing table bloat, although it does not necessarily return that space to the operating system. Additionally, vacuum updates the visibility map, which helps query optimization, and can optionally analyze the database to refresh statistics for the query planner.
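A hedged example of invoking it manually (autovacuum normally handles this for you):
sql
-- Plain VACUUM marks dead tuples reusable within the table and updates the
-- visibility map; ANALYZE refreshes planner statistics. VACUUM FULL (not shown)
-- rewrites the table and returns space to the OS, but takes an exclusive lock.
VACUUM (VERBOSE, ANALYZE) test;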
Processes vs Threads
MySQL uses threads, while Postgres uses processes. There are pros and cons to both; I covered that in detail in its own post here.
Postgres Process Architecture
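A quick way to see the difference from each system's own views (illustrative queries):
sql
-- Postgres: every client connection and background worker is an OS process.
SELECT pid, backend_type, state FROM pg_stat_activity;

-- MySQL: connections are threads inside a single mysqld process.
-- SHOW PROCESSLIST;
-- SELECT thread_id, name, type FROM performance_schema.threads;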
I like threads better than processes in database systems, simply because they are lighter weight and share their parent process's virtual memory address space. Processes come with the overhead of a dedicated virtual memory space and a larger process control block (PCB) compared to the smaller thread control block (TCB).
When we context switch from one process to another, we need to invalidate the TLB (translation lookaside buffer), which caches virtual-to-physical memory mappings, because the new process's mappings are different from the current one's. Once the TLB is flushed, the core has to go back to the page table in memory to fetch those mappings, which slows down lookups. Context switching between threads can safely keep the TLB, because the virtual address space is shared with the parent process; memory-wise we are effectively staying in the same process and only need to update the program counter, stack pointer, and registers, which makes switching threads faster.
If we are eventually going to share memory and deal with mutexes and semaphores anyway, why not use threads? Just my two cents.
I discuss this in details on my fundamentals of operating systems course.
Storage Engine
MySQL was designed with the flexibility to swap out its storage engine, with InnoDB as the default option. This allows MySQL to support multiple storage engines; for example, it can integrate with RocksDB through MyRocks. PostgreSQL, however, operates with a single storage engine (heap storage), and adopting a different one would require forking and significantly rewriting the system, making such modifications far more challenging.
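For illustration (the RocksDB engine is only available if MyRocks is installed):
sql
-- List the storage engines this MySQL server supports.
SHOW ENGINES;

-- Pick a non-default engine per table, e.g. MyRocks:
CREATE TABLE t_rocks (
  id BIGINT PRIMARY KEY,
  v  TEXT
) ENGINE=ROCKSDB;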
Summary
With that in mind, you get to pick which database system is right for you. What really matters is breaking down your use cases and queries, understanding what each database does, and seeing what works and what doesn't for you.
No wrong or right here.
If you found this useful consider grabbing my fundamentals of database engineering course.