How Do SQL Database Engines Work?
To many programmers, SQL RDBMS is a magical black box. A document can help clear up a lot of the mystery. SQL developers take better advantage of the language once they have a better idea of what goes on under the hood.
SQL (Structured Query Language) is a language used to retrieve, insert or maintain data in databases (usually relational). The statements used to perform these data operations in this language are called SQL (Sequal) queries.
The Structures Query Language (SQL), itself is a “spec” — a generalized language that supports the basics. It allows managing large amounts of data without worry about the detailed work necessary to do inside the machine and to deal with its challenges.
Thus, you can Create, Update, Retrieve and Delete data assuring the data integrity and Controlling de user access with less stress.
SQL overview
SQL stands for Structured Query Language which could be referred to as either a programming or a Query language, the main purpose of SQL to interact with the relational database in which data store in tabular form. SQL can manage a large amount of data, especially if data is written simultaneously and we have many transitions over that data.
When the developer uses Structured Query Language (SQL), for data management, there the user gets the ability to perform Create, Retrieve, Update, and Delete data between database. There are various Relational Database Management systems(DBMS) such as MySQL, SQLite, Postgres SQL, Oracle, etc., and they all provided the same kind of features.
What is a SQL Engine?
A SQL engine is a software that interprets the SQL query and the contented SQL commands so the appropriate operations can be performed efficiently on a database that usually is a relational database. The objective of the SQL engine to create, read, update and delete (CRUD) data from a database.
A Database Management Systema or SQL engine or SQL engine has two components: A storage engine and a query processor. Nowadays, some SQL engines have more than one Storage engine. There are many types of SQL DBMS and they all are made with different architecture, but used to aim the same which includes CRUD operations on the database and extended 4GL and newest features.
How the SQL Database Engine Works?
SQL has many stages on which the process of query compilation and execution takes place. Every SQL database engine contains two main components Compiler and Virtual machine to execute the queries. The compiler read the query and convert that query to appropriate byte code, then that byte code evaluated by the virtual machine and a proper response given back to the client. The Complete Execution of a query is Categories into 4 main stages:
- Compiling (Parsing, Checks, and Semantics)
- Binding
- Optimizing
- Executing
About SQL Processing
SQL processing is the parsing, optimization, row source generation, and execution of a SQL statement. The database engine, depending on the statement, may omit some of these stages by caching technics. In the following figure we can see the stages for processing SQL sentence:
Source: Oracle Corp.
SQL Parsing
The first stage of SQL processing is parsing. This stage tokenizes a SQL statement that involves separating the pieces and put them into a data structure that other routines can process. The SQL engine parses a statement when is instructed by the application (Pay attention developer), which means that only the application-, and not the SQL engine itself, can reduce how many parses will be done.
The application makes a parse call to the database to prepare the SQL statement for execution. The parse call opens or creates a cursor (if it doesn't exist), which is a handle for the session-specific private SQL area that holds a parsed SQL statement and other processing information. The cursor and private SQL area reside in the program global area (PGA). The database engine performs the following checks, during the parse call:
- Syntax Check
- Semantic Check
- Shared Pool Check
The preceding checks identify the errors that can be found before statement execution. Some errors cannot be caught by parsing. For instance, the database engine can encounter deadlocks or errors in data conversion only during statement execution.
Syntax Check
The database engine must check each SQL statement for syntactic validity. A statement that breaks a rule about well-formed SQL syntax sentences fails the check. For example, the following statement fails because FORM isn't a keyword from SQL well-formed rules:
mysql> SELECT * FORM employees; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FORM employees' at line 1
Semantic Check
When you are speaking about meaning in a SQL statement you are speaking about semantic. Checking the semantic it is possible to find out if a sentence is meaningful, for instance, verify if columns, views, tables, synonyms, etc., exist.
A SQL sentence can be syntactically correct but It can fail the semantic check. Let's see an example:
SQL> SELECT * 2> FROM nonexistent_table; SELECT * FROM nonexistent_table * ERROR at line 1: ORA-00942: table or view does not exist
Shared Pool Check
During the parse, the database performs a shared pool check to determine if it can avoid resource-intensive steps of statement processing.
To this end, the SQL engine uses a hashing algorithm to generate a hash key for every SQL statement. The statement hash key is the SQL ID. This hash key is deterministic, so the same SQL statement in a single instance or in different instances has the same SQL ID.
When a developer or application sends a SQL statement, the database engine searches the shared SQL area to see whether an existing parsed statement has the same hash value.
Parse operations are classified according to the following categories, depending on the content type of SQL statement submitted and the result of the hash check:
- Hard parse
- Soft parse
SQL Optimization
In the optimization stage, the SQL engine must perform a hard parse at least once for every unique SQL DML statement and carry out the optimization during this parse.
The SQL engine never optimizes SQL DDL statements unless it includes a SQL DML statement with a component such as a subquery that requires optimization.
SQL Row Source Generation
The software named row source generator receives the optimized execution plan from the optimizer and creates an iterative execution plan that is usable by the rest of the SQL Engine.
The iterative execution plan is a binary program that, when executed by the SQL engine, produces the tuples (result set). The plan runs as a set of a sequence of steps. Each step returns a row set. The next one uses the rows in this set, or in another case, the last step returns the rows to the application issuing the SQL statement.
The row source can come from a table, view, or result of a join or grouping operation. The row source generator creates a tree from the row sources, which is a collection of different row sources. The row source tree is made of the following:
- An ordering of the tables referenced by the statement
- An access method for each table mentioned in the statement
- A join method for tables affected by join operations in the statement
- Data operations such as filter, sort, or aggregation
Execution plan example:
SELECT e.last_name, j.job_title, d.department_name FROM hr.employees e, hr.departments d, hr.jobs j WHERE e.department_id = d.department_id AND e.job_id = j.job_id AND e.last_name LIKE 'A%'; Execution Plan ---------------------------------------------------------- Plan hash value: 975837011 -------------------------------------------------------------------------------- | Id| Operation | Name |Rows|Bytes|Cost(%CPU)|Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 189 | 7(15)| 00:00:01 | |*1 | HASH JOIN | | 3 | 189 | 7(15)| 00:00:01 | |*2 | HASH JOIN | | 3 | 141 | 5(20)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 3 | 60 | 2 (0)| 00:00:01 | |*4 | INDEX RANGE SCAN | EMP_NAME_IX | 3 | | 1 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | JOBS | 19 | 513 | 2 (0)| 00:00:01 | | 6 | TABLE ACCESS FULL | DEPARTMENTS | 27 | 432 | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID") 2 - access("E"."JOB_ID"="J"."JOB_ID") 4 - access("E"."LAST_NAME" LIKE 'A%') filter("E"."LAST_NAME" LIKE 'A%')
SQL Execution
During execution, the SQL engine executes each row source in the tree produced by the row source generator. This step is the only mandatory step in DML processing.
The figure below is an execution tree, also called a parse tree, that shows the flow of row sources from one step to another in the plan in the above Example. The order of the steps in execution is, in general, the reverse of the order in the plan, so you should read the plan from the bottom up.
Each step in an execution plan has its own ID number. The numbers in the next figure correspond to the Id column in the plan shown in the above example. Indentation in the Operation column of the plan means hierarchical relationships. For instance, if there are two spaces before the name of an operation, then this operation is a child of an operation that is preceded by only one space. The operations that are preceded by one space are children of the SQL SELECT statement itself.
The previous figure shows the row source as each node of the tree which means that each step of the SQL execution plan in the previous example retrieves rows from the objects of the database or accepts tuples from one or more row sources. The SQL engine executes each node of the tree, so each row source in this way:
- The black boxes steps indicate physically retrieve data from a database object (i.e.: tables, views, etc.). These steps are the access paths or techniques for retrieving data from the database.
- The clear boxes indicate operations on row sources (i.e.: joins, set operations, subquery, etc.).
How Database Engine Processes DML
For processing a SQL DML statement it is necessary to retrieve data from the database so it is necessary to give a tool to access the data efficiently. In this case, we are talking about data structures such as:
B-trees
In this way the SQL Engine can speed up the retrieving process:
Data Blocks
Data blocks are a way to improve performance managing the memory paging.
Most SQL DML statements have a query component. When a query is executed a cursor places the projection into a set of tuples called the result set.
How Row Sets Are Fetched
The tuples in the result set can be fetched a tuple at a time or in groups of tuples. In the fetch stage, the database engine selects rows and, if requested by the query, orders the tuples (result set). Each successive fetch moves on the next tuple of the result until the last tuple has been fetched.
Usually, the SQL Engine can't determine how many numbers of tuples will be retrieved by a query until the last tuple is fetched. SQL Engine retrieves the data (tuples), in response to fetch calls, so that the more tuples the SQL Engine reads, the more work it performs. For some SQL sentences or whole queries, the SQL engine returns the first tuple as quickly as possible, whereas for others it creates the entire result set before returning the first row.
Read Consistency
In general, a query retrieves data by using the Database Engine read consistency mechanism, which guarantees that all data blocks read by a query are consistent to a single point in time.
The undo data is used to perform the read consistency process and retrieve the past versions of data. For instance, imagine that a query have to read 100 data blocks in a full table scan. The SQL DML query processes the first 10 blocks while a SQL DML query in a different session changes block 75. When the first session reads block 75, it finds out for changes and uses undo data to retrieve the old if necessary (from undo data), so it creates a noncurrent version of block 75 in memory through takes the unmodified version of the data.
Data Changes
All the SQL DML statements that have to change data use read consistency to assure that the SQL engine retrieves only the data that matched the search exact criteria when the modification began.
Subsequently, these statements retrieve the database data blocks as they exist in their current state and then make the required modifications. Other actions that must be performed by the SQL engine in relation to the modification of the data such as generating redo and undo data.
Conclusion
Understanding how an SQL engine works is important because we can improve our software development process by writing better queries that leverage data structures, system architecture and algorithm operation. This will eventually lead to a more efficient, secure, maintainable and compatible code. Therefore, the in-depth study of the SQL language and its different implementations in SQL engines are extremely important.