Inner processing of SQL queries in a MySQL database
Saral Saxena
11K+ Followers | LinkedIn Top Voice | Associate Director | 15+ Years in Java, Microservices, Kafka, Spring Boot, Cloud Technologies (AWS, GCP) | Agile, K8s, DevOps & CI/CD Expert
SQL queries written by engineers and data scientists are usually sent to MySQL as a plain string. How does MySQL interpret this string and know which table to look in and which rows to fetch?
Connection pool
Just like the web browser (Chrome, Safari) keeps a connection to Medium while you are browsing this page, our application server needs to be connected to the MySQL server to send SQL query strings. A connection pool is commonly used to manage these connections.
A connection pool allows reuse of existing connections, avoiding the startup and cleanup overhead of creating new ones. In addition, user authentication can be built into this layer, rejecting unauthorized database access.
Each connection usually maps to one thread. When processing a SQL query request, a thread in the application server takes a connection out of the pool and sends the request to the MySQL server. A thread in the MySQL server receives the request in SQL string format and performs the subsequent steps.
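As a minimal sketch of the application side of this setup, assuming the HikariCP pooling library and MySQL's JDBC driver (the URL, credentials, and pool size are illustrative placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PooledQuery {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/company"); // hypothetical database
        config.setUsername("app_user");                           // hypothetical credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(10); // cap on concurrently open connections

        try (HikariDataSource pool = new HikariDataSource(config)) {
            // Borrow a connection from the pool; try-with-resources returns it
            // to the pool on close instead of tearing the connection down.
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM employee_table WHERE employee_id = ?")) {
                ps.setInt(1, 1);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }
}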
SQL parser
MySQL server needs to understand what the query is trying to do. Is it trying to read some data, update data, or delete data?
Once a query is received, it first needs to be parsed, which involves translating it from what is essentially a textual format into a combination of internal binary structures that can be easily manipulated by the optimizer.
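As a toy illustration only (MySQL's actual parser is grammar-driven and far more general than a regular expression), the sketch below captures the idea of turning a query string into an internal structure that later stages can manipulate:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyParser {
    // Internal representation of a very restricted SELECT statement.
    record SelectStatement(String column, String table, String whereColumn, String whereValue) {}

    static final Pattern SIMPLE_SELECT = Pattern.compile(
            "SELECT\\s+(\\w+)\\s+FROM\\s+(\\w+)\\s+WHERE\\s+(\\w+)\\s*=\\s*(\\w+)\\s*;?",
            Pattern.CASE_INSENSITIVE);

    static SelectStatement parse(String sql) {
        Matcher m = SIMPLE_SELECT.matcher(sql.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("syntax error near: " + sql);
        }
        return new SelectStatement(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        // Prints: SelectStatement[column=name, table=employee_table, ...]
        System.out.println(parse("SELECT name FROM employee_table WHERE employee_id = 1"));
    }
}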
Query optimizer
Before MySQL executes the query, it determines how to fulfill it, i.e., what the best approach is.
Let’s look at a simple SQL query:
SELECT name FROM employee_table WHERE employee_id = 1;
Assume the employee_table has 10k employee records. There are at least two approaches (or two plans, using formal terminology):
Approach 1: scan the entire table, checking all 10k rows one by one against the condition employee_id = 1 (a full table scan).
Approach 2: use the primary key index on employee_id to jump directly to the matching row (an index lookup).
Approach 2 is almost always faster, so the optimizer decides to use it. The next step is to actually execute the plan.
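You can ask MySQL which plan the optimizer picked with EXPLAIN. A sketch over plain JDBC, reusing the hypothetical database and credentials from the pooling example:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowPlan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/company", "app_user", "secret"); // hypothetical
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT name FROM employee_table WHERE employee_id = 1")) {
            while (rs.next()) {
                // For a primary-key lookup, expect type "const" with rows = 1;
                // a full table scan would instead show type "ALL" and rows ~ 10000.
                System.out.printf("type=%s key=%s rows=%d%n",
                        rs.getString("type"), rs.getString("key"), rs.getLong("rows"));
            }
        }
    }
}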
Execution engine
The execution engine calls the APIs of the storage engine to execute the plan determined by the query optimizer.
Storage engine
Most software systems can be divided into a computation layer and a storage layer. The efficiency of the computation layer largely depends on how data is organized in the storage layer. In this section, let's take a deep dive into the storage engine of MySQL and the many optimizations that were carefully designed to speed up reads and writes.
MySQL can integrate with many different storage engines, each with its own set of pros and cons for different use cases. In other words, the storage engine can be considered an interface with different underlying implementations. For example, there are InnoDB, MyISAM, Memory, CSV, Archive, Merge, and Blackhole.
InnoDB is certainly the most widely used. It has been the default ever since MySQL 5.5.
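The engine is chosen per table with the ENGINE clause of the DDL. A small sketch, again against the hypothetical company database:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EngineChoice {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/company", "app_user", "secret"); // hypothetical
             Statement stmt = conn.createStatement()) {
            // ENGINE defaults to InnoDB on MySQL 5.5+; stated explicitly here.
            stmt.executeUpdate("CREATE TABLE employee_table ("
                    + " employee_id INT PRIMARY KEY,"
                    + " name VARCHAR(100)"
                    + ") ENGINE=InnoDB");
        }
    }
}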
Just like our laptops, InnoDB stores data both in memory and on disk. At a high level, when writing to InnoDB, data always gets cached in memory first and is persisted to disk at a later time.
InnoDB divides memory into two components:
The buffer pool is very important for InnoDB. MySQL is usually very fast in processing queries, and the reason is that data is actually stored in and served from memory (in most cases NOT from disk, contrary to what many people may think). This memory component is the buffer pool.
Buffer pool
In general, about 80% of the host machine's memory is allocated to the buffer pool. More memory means more data can be held in memory, and thus faster reads.
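The current size is exposed through the server variable innodb_buffer_pool_size; here is a sketch of checking it over JDBC (connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BufferPoolSize {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/company", "app_user", "secret"); // hypothetical
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")) {
            if (rs.next()) {
                long bytes = Long.parseLong(rs.getString("Value"));
                System.out.printf("buffer pool size: %d MiB%n", bytes / (1024 * 1024));
            }
        }
    }
}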
Besides simply putting data in memory, the buffer pool is also carefully designed in terms of how data is organized in memory to speed up DB reads and writes. Let's dive further into this detailed design.
Page
Just as books in a library are labeled with IDs and placed in alphabetical or numerical order on the shelves, the buffer pool organizes its data in an ordered way.
InnoDB divides the buffer pool into many pages, along with a change buffer (which will be illustrated later). All pages are linked as a doubly linked list; that is, you can easily get to the next page from the current page, or go backwards.
Now, how is data stored in a page?
User Records
Within a page, there are: pointers to the previous and next pages, the User Records, two special records called Infimum and Supremum, and a Page Directory. Each is covered below.
The pointers to the previous and next pages are, simply, pointers. User Records is the location where each "row" of data is stored. Each row has a next pointer to the next row, forming a singly linked list.
We know that a row in a SQL database usually consists of a primary key and many other fields. The rows are usually ordered by primary key so that algorithms like binary search can be used to reduce read latency when looking up a row with a particular key. However, if InnoDB needed to rearrange all the records (rows) to maintain that order every time a new row is added to User Records, inserts would be very slow.
In reality, the rows in User Records are arranged in their insertion order. Adding a new record simply means appending to the end of User Records. The desired primary-key order is achieved with the "next" pointers: the next pointer of each row points to the next row in primary-key order, rather than to the physically adjacent row in memory.
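A simplified model of this layout (my own sketch, not InnoDB source code): rows are appended physically in arrival order, while a next pointer keeps them logically sorted by primary key.

import java.util.ArrayList;
import java.util.List;

public class UserRecordsSketch {
    static class Row {
        final int key;
        final String payload;
        Row next; // next row in primary-key order, not in memory order
        Row(int key, String payload) { this.key = key; this.payload = payload; }
    }

    final List<Row> physical = new ArrayList<>(); // rows in insertion order
    Row head; // row with the smallest key

    void insert(int key, String payload) {
        Row row = new Row(key, payload);
        physical.add(row); // physically: just append, no shifting of rows
        // Logically: splice into the singly linked list sorted by key.
        if (head == null || key < head.key) {
            row.next = head;
            head = row;
            return;
        }
        Row cur = head;
        while (cur.next != null && cur.next.key < key) cur = cur.next;
        row.next = cur.next;
        cur.next = row;
    }

    public static void main(String[] args) {
        UserRecordsSketch page = new UserRecordsSketch();
        page.insert(30, "carol");
        page.insert(10, "alice");
        page.insert(20, "bob");
        // Physical order: 30, 10, 20. Logical order via next pointers: 10, 20, 30.
        for (Row r = page.head; r != null; r = r.next) {
            System.out.println(r.key + " -> " + r.payload);
        }
    }
}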
Now comes another question. We mentioned earlier that there is no need to traverse all the rows to find the target row with a specific primary key. However, if all rows form a singly linked list, the properties of a singly linked list dictate that we can only find a specific row by traversing the list, and traversing a list takes O(n) time, which is very slow. How does InnoDB make it faster, then? Remember the other page components we mentioned earlier? Bingo, they are used exactly for this purpose.
Infimum and Supremum
The two records represent the smallest and largest rows in a page, respectively. In other words, the two form a min-max filter. By checking these two fields in O(1) time, InnoDB can decide whether the row it is looking for could be stored in this particular page.
For example, say our primary key is numerical, and a particular page has infimum = 1 and supremum = 99. If we are looking for a row with primary key = 101, then clearly, in O(1) time, InnoDB decides that the row is not in this page and moves on to another page.
Now, if the target row is not in the page, InnoDB skips the page. However, if the row is actually in the page, does InnoDB still traverse the entire list? The answer is no, and the other page components again come in handy.
Page directory
As the name suggests, it is like the “table of contents” of a book.
The page directory stores pointers into the rows, one per block of rows. Say we have rows 1 to n stored in User Records; the page directory will have pointers to row 1, row 7, row 13, and so on, with a block size of 6. This design is very similar to a Skip List.
As you can probably imagine by now, InnoDB first checks the page directory and quickly determines which block to look into. It then uses the pointer to jump to the corresponding row in the singly linked list and starts traversing from there. In this way, InnoDB avoids traversing the entire list and only needs to traverse a much smaller sublist.
The block we mentioned above is officially called a Slot. The page directory has multiple slots, and each slot has a pointer to a row in User Records.
Every time a new row is inserted into User Records, InnoDB updates the Page Directory as well to keep the two consistent with each other. InnoDB creates a slot for every 6 rows.
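Putting the infimum/supremum check and the slot search together, here is a simplified lookup sketch (my own model, with a sorted array standing in for the logical row order and one directory slot per 6 rows):

import java.util.ArrayList;
import java.util.List;

public class PageLookupSketch {
    static final int SLOT_SIZE = 6; // one directory slot per 6 rows, as above

    final List<Integer> keys = new ArrayList<>();      // rows in primary-key order
    final List<Integer> directory = new ArrayList<>(); // index of the first row of each slot

    void addSortedKey(int key) {
        if (keys.size() % SLOT_SIZE == 0) directory.add(keys.size());
        keys.add(key);
    }

    boolean contains(int key) {
        if (keys.isEmpty()) return false;
        int infimum = keys.get(0);                 // smallest row in the page
        int supremum = keys.get(keys.size() - 1);  // largest row in the page
        // O(1) min-max filter: skip the page entirely when out of range.
        if (key < infimum || key > supremum) return false;
        // Binary search over directory slots to find the candidate block.
        int lo = 0, hi = directory.size() - 1, slot = 0;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (keys.get(directory.get(mid)) <= key) { slot = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        // Traverse only the rows inside that slot's block, not the whole page.
        int start = directory.get(slot);
        int end = Math.min(start + SLOT_SIZE, keys.size());
        for (int i = start; i < end; i++) {
            if (keys.get(i) == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PageLookupSketch page = new PageLookupSketch();
        for (int k = 1; k <= 99; k += 2) page.addSortedKey(k); // odd keys 1..99
        System.out.println(page.contains(101)); // false: fails the supremum check
        System.out.println(page.contains(41));  // true: slot search, then a short scan
    }
}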
Index
InnoDB uses a B+ tree (read as "B plus tree") for data storage. MySQL supports two types of index: primary index and secondary index. Once we understand the concept of a page, indexes become obvious.
In terms of why a B+ tree, rather than a B-tree (B tree, not B minus tree), is used: in a B+ tree, data records live only in the leaf nodes, and the leaves are linked together, so range scans simply follow the leaf chain. Internal nodes store only keys, which means each node fits more entries, giving a higher fan-out, a shallower tree, and fewer disk reads per lookup. In a B-tree, records are stored in every node, which lowers the fan-out and makes range queries bounce around the tree.
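A toy sketch of the range-scan benefit (my simplification; a real B+ tree would descend internal nodes to find the first leaf):

public class LeafScanSketch {
    static class Leaf {
        final int[] keys;
        Leaf next; // sibling pointer, the defining B+ tree feature used here
        Leaf(int... keys) { this.keys = keys; }
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, 3, 5);
        Leaf b = new Leaf(7, 9, 11);
        Leaf c = new Leaf(13, 15, 17);
        a.next = b;
        b.next = c;

        // Range scan for keys in [5, 13]: start at the leaf containing 5
        // (found via a tree descent, omitted here) and walk the leaf chain,
        // never re-descending the tree.
        for (Leaf leaf = a; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k >= 5 && k <= 13) System.out.println(k);
            }
        }
    }
}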
All those nice designs (the singly linked User Records, the infimum and supremum records, and the page directory) are mainly used to speed up reads. What happens when we want to update a row? Let's now take a look at how MySQL handles data updates.
Update data
When we insert a row, say id = 100, into MySQL and then immediately need to update it, the row usually still stays in the Buffer Pool.
However, if we wait for quite a while before updating this row, it is possible that the row with id = 100 is no longer in the Buffer Pool.
The reason is that memory is limited; we cannot keep inserting rows into memory, which would fill up very quickly. A purge policy, usually specified as a TTL (time-to-live), is used to reclaim memory. For example, in Redis, which is usually used as a cache, TTL means keys get deleted (first marked as deleted and hidden from reads, then actually removed in batches by a background process). In the case of MySQL, data needs to be persisted, so purging means persisting to disk and then removing from memory.
So, when trying to update a row that is not in the Buffer Pool, InnoDB will load the data back into the Buffer Pool from disk. The catch here is that InnoDB cannot just load the single row with id = 100; it needs to load the entire page containing this row. Once the entire page is loaded into the Buffer Pool, the specific rows can be updated.
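A sketch of this page-granularity behavior (my own toy model; ROWS_PER_PAGE and the simulated disk read are stand-ins):

import java.util.HashMap;
import java.util.Map;

public class BufferPoolSketch {
    static final int ROWS_PER_PAGE = 100; // illustrative page capacity

    // pageId -> page contents (rowId -> value); stands in for the buffer pool
    final Map<Integer, Map<Integer, String>> bufferPool = new HashMap<>();

    Map<Integer, String> loadPageFromDisk(int pageId) {
        // Stand-in for a disk read: the whole page comes back, not one row.
        Map<Integer, String> page = new HashMap<>();
        for (int i = 0; i < ROWS_PER_PAGE; i++) {
            int rowId = pageId * ROWS_PER_PAGE + i;
            page.put(rowId, "row-" + rowId);
        }
        return page;
    }

    void updateRow(int rowId, String newValue) {
        int pageId = rowId / ROWS_PER_PAGE;
        // Load the entire page on a miss, then update the row in memory.
        Map<Integer, String> page =
                bufferPool.computeIfAbsent(pageId, this::loadPageFromDisk);
        page.put(rowId, newValue);
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch();
        pool.updateRow(100, "updated"); // loads page 1 (rows 100..199) first
        System.out.println(pool.bufferPool.keySet()); // [1]
    }
}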
So far so good. But this only describes updates to the primary index; what about secondary indexes?
Change Buffer
Let's say we updated some fields of the row with id = 100. If a secondary index is built on one of those fields, the secondary index needs to be updated at the same time. However, if the page containing the secondary index is not in the Buffer Pool, does InnoDB load that page into the Buffer Pool as well?
If the secondary index is not going to be read any time soon, but is eagerly loaded into the Buffer Pool every time a related field is updated, there will be a lot of random disk I/O operations, which will inevitably slow down InnoDB.
This is why Change Buffer is designed to “lazily” update secondary indexes.
When a secondary index needs to be updated and the page containing it is not in the Buffer Pool, the update is temporarily stored in the Change Buffer. At a later time, when the page gets loaded into the Buffer Pool (for example, due to updates to the primary index), InnoDB merges the temporary changes from the Change Buffer into the pages in the Buffer Pool.
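A sketch of this buffer-then-merge idea (again my own model, with strings standing in for index operations):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangeBufferSketch {
    final Map<Integer, List<String>> cachedIndexPages = new HashMap<>();
    final Map<Integer, List<String>> changeBuffer = new HashMap<>(); // pageId -> pending ops

    void updateSecondaryIndex(int pageId, String op) {
        List<String> page = cachedIndexPages.get(pageId);
        if (page != null) {
            page.add(op); // page is hot: apply the change immediately
        } else {
            // Page is cold: park the change, avoiding a random disk read now.
            changeBuffer.computeIfAbsent(pageId, k -> new ArrayList<>()).add(op);
        }
    }

    void loadIndexPage(int pageId) {
        List<String> page = new ArrayList<>(); // stand-in for reading the page from disk
        // Merge any pending changes the moment the page enters the pool.
        List<String> pending = changeBuffer.remove(pageId);
        if (pending != null) page.addAll(pending);
        cachedIndexPages.put(pageId, page);
    }

    public static void main(String[] args) {
        ChangeBufferSketch idx = new ChangeBufferSketch();
        idx.updateSecondaryIndex(7, "insert key 'alice' -> row 100"); // page 7 is cold
        System.out.println("buffered: " + idx.changeBuffer);
        idx.loadIndexPage(7); // the merge happens here
        System.out.println("merged:   " + idx.cachedIndexPages.get(7));
    }
}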
There is a drawback to this Change Buffer design, though. If the Change Buffer has accumulated a large amount of temporary changes and we merge them all at once into the Buffer Pool, the merge may take hours to complete! During this merging, there will be a lot of disk I/O, which occupies CPU cycles and affects the overall performance of MySQL.
So some further tradeoffs are made. We mentioned earlier that a merge is triggered when the page containing the secondary index gets loaded back into the Buffer Pool. In the InnoDB design, the merge can also be triggered by other events, to avoid large merges that could take hours. These events include transactions, server shutdown, and server restart.
Adaptive Hash Index
The Adaptive Hash Index is designed to work together with the Buffer Pool. It enables InnoDB to perform more like an in-memory database. The adaptive hash index is enabled by the innodb_adaptive_hash_index variable, or turned off at server startup with --skip-innodb-adaptive-hash-index.
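Both the variable and the startup flag are real; here is a sketch of checking and toggling the feature at runtime over JDBC (connection details are placeholders, and SET GLOBAL requires administrative privileges):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AdaptiveHashToggle {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/company", "app_user", "secret"); // hypothetical
             Statement stmt = conn.createStatement()) {
            try (ResultSet rs = stmt.executeQuery(
                    "SHOW VARIABLES LIKE 'innodb_adaptive_hash_index'")) {
                if (rs.next()) System.out.println(rs.getString("Value")); // ON or OFF
            }
            // Dynamic toggle; needs SUPER or SYSTEM_VARIABLES_ADMIN privilege.
            stmt.execute("SET GLOBAL innodb_adaptive_hash_index = OFF");
        }
    }
}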