How SQL Database Engines Work
In this article, we are going to take a deep dive into the internals of the SQL database engine and see what is going on there. Most of us rarely think about how a SQL engine works internally, because we can create and query databases with SQL scripts without knowing any of it. But let us go beyond that and take a look at the internals, shall we?
The primary job of any database system or engine is to store data reliably and make it available when needed. We use databases as a primary source of data: they hold the data until we need it and let us share it between different parts of our applications.
What is SQL?
Let us start with SQL. SQL stands for Structured Query Language, and it can be described as either a programming language or a query language. Its main purpose is to interact with a relational database, in which data is stored in tabular form. As we can see, SQL is not the database itself but the language the database speaks, so to interact with the database we speak its language: SQL.
A moment ago we mentioned relational databases. The first SQL-speaking relational databases that come to mind are Oracle Database and MySQL, but there are many more, and all of them are examples of relational database systems or engines. Do not get confused when you see the terms database system, database engine, database management system, or database server. In this article we use them interchangeably.
Database Management Systems
SQL can be used to speak to relational databases, or if you like, DBMSs, in order to store and manage data, whether small or large, and especially when data is written concurrently and many transactions run over it. When we use SQL for data management, we get the ability to perform CRUD: Create, Retrieve, Update, and Delete data in a database. To perform these CRUD operations, we can use simple textual, non-GUI interfaces such as the Command Prompt on Windows or a terminal on UNIX-like operating systems, e.g. Linux.
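To make the four CRUD operations concrete, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory SQLite database. The `users` table, its columns, and the `crud_demo` function name are assumptions made up for this illustration.

```python
import sqlite3


def crud_demo():
    """Run one Create, Retrieve, Update, and Delete cycle on an in-memory database."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    # Create: add one row
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
    # Retrieve: read every row back
    created = conn.execute("SELECT id, name FROM users").fetchall()
    # Update: change the name stored in row 1
    conn.execute("UPDATE users SET name = ? WHERE id = ?", ("Grace", 1))
    updated = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
    # Delete: remove row 1 and count what is left
    conn.execute("DELETE FROM users WHERE id = ?", (1,))
    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    conn.close()
    return created, updated, remaining


if __name__ == "__main__":
    print(crud_demo())
```

The same four SQL statements could be typed directly into a command-line client. The Python wrapper is only there to keep the example self-contained.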
We can also do that through GUI database administration tools like phpMyAdmin or Adminer. There are various relational database management systems (RDBMS), such as Oracle Database and MySQL, and a host of others, e.g. PostgreSQL, MariaDB, SQLite, Microsoft SQL Server, and even Microsoft Access, and they all provide broadly similar core features. The main purpose of the administration tools is to abstract away the tedious task of manipulating data through textual commands sent to the underlying DBMS. Instead, they present us with rich features for manipulating our data visually.
Database management systems use a client/server model, where database system instances (nodes) take the role of servers, and application instances take the role of clients.
Client requests arrive through the transport subsystem. Requests come in the form of queries, most often expressed in some query language. The transport subsystem is also responsible for communication with other nodes in the database cluster.
Upon receipt, the transport subsystem hands the query over to a query processor, which parses, interprets, and validates it. Later, access control checks are performed, as they can be done fully only after the query is interpreted.
The parsed query is passed to the query optimizer, which first eliminates impossible and redundant parts of the query and then attempts to find the most efficient way to execute it based on internal statistics (index cardinality, approximate intersection size, etc.) and data placement (which nodes in the cluster hold the data and the costs associated with its transfer). The optimizer handles both the relational operations required for query resolution, usually presented as a dependency tree, and optimizations such as index ordering, cardinality estimation, and choosing access methods.
The query is usually presented in the form of an execution plan (or query plan): a sequence of operations carried out for its results to be considered complete. Since the same query can be satisfied using different execution plans that can vary in efficiency, the optimizer picks the best available plan.
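One way to watch an optimizer choose between plans is SQLite's `EXPLAIN QUERY PLAN` prefix, which reports the access method picked for a statement. The sketch below asks for the plan of the same query before and after an index exists. The `orders` table, the index name, and the helper functions are assumptions for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description text


def compare_plans():
    """Get the plan for one query without an index, then again with one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    query = "SELECT total FROM orders WHERE customer = 'acme'"

    before = plan_for(conn, query)  # no index yet: expect a full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    after = plan_for(conn, query)   # index available: expect an index search

    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

The query text never changes. Only the available access paths do, and the optimizer switches plans accordingly.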
The execution plan is handled by the execution engine, which collects the results of the execution of local and remote operations. Remote execution can involve writing and reading data to and from other nodes in the cluster, and replication.
What is the SQL Engine?
The SQL engine is the underlying program, or if you like, software, that collects and interprets SQL commands so that the appropriate operations can be performed on the relational database. The objective of the SQL engine is to create, read, update, and/or delete (CRUD) data in a database.
A SQL engine (or SQL server database engine) includes two main components: a storage engine and a query processor. Some modern SQL DBMSs offer more than one storage engine. There are many SQL engines with different architectures, but they all serve the same objective: CRUD operations on the database, along with many other features.
Databases are modular systems and consist of multiple parts: a transport layer that accepts requests, a query processor that determines the most efficient way to run queries, an execution engine that carries out the operations, and a storage engine that stores the data for later retrieval. Storage engines look at the data more granularly and offer a simple data manipulation API that allows users to create, update, delete, and retrieve records.
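The kind of record-level API a storage engine offers can be sketched with a toy in-memory store. `ToyStorageEngine` and its method names are hypothetical and do not correspond to any real engine. A real storage engine would also deal with disk pages, caching, concurrency, and recovery.

```python
class ToyStorageEngine:
    """Hypothetical in-memory record store keyed by an auto-assigned record id."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, record):
        """Store a copy of the record and return its new id."""
        record_id = self._next_id
        self._records[record_id] = dict(record)  # copy so callers cannot mutate stored data
        self._next_id += 1
        return record_id

    def retrieve(self, record_id):
        """Return a copy of the record, or None if the id is unknown."""
        record = self._records.get(record_id)
        return dict(record) if record is not None else None

    def update(self, record_id, changes):
        """Apply field changes to an existing record; return False if it is missing."""
        if record_id not in self._records:
            return False
        self._records[record_id].update(changes)
        return True

    def delete(self, record_id):
        """Remove a record; return True only if it existed."""
        return self._records.pop(record_id, None) is not None


if __name__ == "__main__":
    engine = ToyStorageEngine()
    rid = engine.create({"name": "Ada"})
    engine.update(rid, {"name": "Grace"})
    print(engine.retrieve(rid))
    engine.delete(rid)
```

The layers above the storage engine never touch `_records` directly. They only call the four methods, which is what lets a DBMS swap one storage engine for another.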
There's no one blueprint for database system design. Every database is built slightly differently, and component boundaries are somewhat hard to see and define. Even if these boundaries exist on paper (e.g., in project documentation), in code seemingly independent components may be coupled because of performance optimizations, handling edge cases, or architectural decisions.
How Do SQL Database Engines Work?
On the surface, the story is simple: a compiler compiles the SQL query, and a virtual machine executes the compiled query. That is true, but more happens under the hood. So let us open up the hood.
Query compilation and execution take place in several stages. In engines such as SQLite, two main components execute queries: a compiler and a virtual machine. The compiler reads the query and converts it into bytecode; the virtual machine then evaluates that bytecode, and a response is sent back to the client. The complete execution of a query is categorized into the following stages:
- 1. Compiling (Parsing, Checks, and Semantics)
- 2. Binding
- 3. Optimizing
- 4. Executing
Parsing is part of the compiling process. During parsing, the query statement is tokenized into individual words, which are recognized as keywords, identifiers, and clauses.
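The tokenizing step can be illustrated with a deliberately simplified sketch. The token pattern, the short keyword list, and the `tokenize` function are assumptions for illustration only. A real SQL tokenizer handles far more (comments, quoted identifiers, escapes, the full keyword set).

```python
import re

# Simplified token pattern: quoted strings, numbers, words, two-character
# comparison operators, then any other single non-space character.
TOKEN_PATTERN = re.compile(r"'[^']*'|\d+(?:\.\d+)?|\w+|<=|>=|<>|!=|[^\s\w]")

# A tiny subset of SQL keywords, enough for the example.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "INSERT", "UPDATE", "DELETE"}


def tokenize(sql):
    """Split a SQL string into (kind, text) pairs."""
    tokens = []
    for text in TOKEN_PATTERN.findall(sql):
        if text.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif text[0] == "'":
            kind = "STRING"
        elif text[0].isdigit():
            kind = "NUMBER"
        elif text[0].isalpha() or text[0] == "_":
            kind = "IDENTIFIER"
        else:
            kind = "SYMBOL"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize("SELECT name FROM users WHERE age >= 21"):
        print(token)
```

The parser then consumes this token stream and checks that the tokens appear in an order the SQL grammar allows.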
The semantic check validates the statement by matching it against the system catalog. This stage determines whether the query is valid, and it also verifies that the user is authorized to execute the statement.
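The difference between a statement that fails parsing and one that fails the catalog check can be observed with SQLite through Python's sqlite3 module. In the sketch below, the `users` table and the `classify` helper are assumptions for illustration, and the classification relies on the wording of SQLite's error messages. SQLite has no user accounts, so the authorization part of this stage is not shown.

```python
import sqlite3


def classify(sql):
    """Report whether SQLite accepts a statement or rejects it while compiling it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        message = str(exc)
        # SQLite words parser failures as '... syntax error'. In this sketch every
        # other compile-time failure is treated as a failed catalog lookup.
        if "syntax error" in message:
            return "syntax error"
        return "semantic error: " + message
    finally:
        conn.close()


if __name__ == "__main__":
    print(classify("SELECT name FROM users"))      # valid statement
    print(classify("SELEC name FROM users"))       # misspelled keyword
    print(classify("SELECT name FROM customers"))  # table missing from the catalog
    print(classify("SELECT email FROM users"))     # column missing from the catalog
```

The misspelled keyword is rejected without consulting the catalog at all, while the last two statements are grammatically fine and fail only when their names are looked up.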
Binding creates the corresponding binary representation of the query statement. In engines built this way, this is the stage where the bytecode is generated. By the end of this stage the statement has been compiled, and it is handed on for optimization and execution.
Optimizing selects the most efficient algorithm for executing the bytecode. The component responsible is known as the query optimizer or relational engine.
The virtual machine receives the optimized bytecode and executes it.
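SQLite lets us look at the bytecode its virtual machine runs: prefixing a statement with `EXPLAIN` returns one row per instruction instead of executing the statement. The following sketch lists the instruction addresses and opcode names. The table `t` and the `bytecode_for` helper are assumptions for illustration, and the exact program differs between SQLite versions.

```python
import sqlite3


def bytecode_for(sql):
    """Return (address, opcode) pairs of the program SQLite compiles for a statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    conn.close()
    # Each row describes one instruction; columns 0 and 1 are its address and opcode.
    return [(row[0], row[1]) for row in rows]


if __name__ == "__main__":
    for address, opcode in bytecode_for("SELECT a FROM t WHERE a > 5"):
        print(address, opcode)
```

Each opcode is a small step such as opening a table, reading a column, or emitting a result row, and the virtual machine simply runs them in sequence until it reaches a halt instruction.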
SQL Statement --> Parsing --> Binding --> Query Optimization --> Query Execution --> Result
Note: The parsing stage does not need any access to the database itself, which typically makes it the fastest stage of compiling.
How SQL Engines Turn Data into Tables
Many SQL engines are written in C or C++, and they commonly organize stored data with B-trees. A B-tree is a balanced tree that generalizes the binary search tree: each node holds several sorted keys and points to child nodes covering the ranges between those keys. The engine stores incoming data as the rows and columns of a table, identifies each row by a key, and follows the tree's pointers to find a row in a few steps instead of scanning the whole table.
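Relational engines commonly keep table rows in a B-tree, a balanced tree whose nodes hold several sorted keys. Here is a minimal sketch of a key lookup in such a tree. The `Node` class, the hand-built tree, and the sample rows are all assumptions for illustration. Real engines store nodes as fixed-size disk pages, and many use a B+-tree variant that keeps rows only in leaf nodes.

```python
from bisect import bisect_left


class Node:
    """One B-tree node: sorted keys, the row for each key, and child nodes (none for a leaf)."""

    def __init__(self, keys, rows, children=()):
        self.keys = keys
        self.rows = rows
        self.children = list(children)


def search(node, key):
    """Return the row stored under key, or None if the key is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)  # first position whose key is >= the search key
        if i < len(node.keys) and node.keys[i] == key:
            return node.rows[i]
        # Not in this node: descend into the child covering the gap where key would sit.
        node = node.children[i] if node.children else None
    return None


# Hand-built tree: the root holds keys 10 and 20, so it has three children.
root = Node(
    keys=[10, 20],
    rows=[("Ada",), ("Grace",)],
    children=[
        Node([3, 7], [("Alan",), ("Edsger",)]),
        Node([12, 15], [("Barbara",), ("Donald",)]),
        Node([25, 30], [("Linus",), ("Margaret",)]),
    ],
)

if __name__ == "__main__":
    print(search(root, 15))
    print(search(root, 8))
```

Because every node holds many keys, the tree stays shallow, so even a very large table needs only a handful of node visits per lookup.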
Summary in a Nutshell
The SQL engine processes a query in multiple stages, and the details vary from one relational DBMS to another. In the first stage, the query is parsed and converted into an internal structure such as a parse tree. A second compilation step then checks the semantics of the parsed query, and in the last stage of compilation the parsed query is converted into the corresponding bytecode. The next step is optimization, in which appropriate algorithms (for sorting, searching, and so on) are chosen for executing the bytecode. Finally, the virtual machine executes the code and provides the client with the result.
To create a database environment, we need a SQL database engine. Database engines are often built in low-level programming languages such as C or C++, because these give the developer control over memory management. In most high-level programming languages, memory management is handled automatically by the language runtime, for example by a garbage collector. SQL engines are typically cross-platform: developers can build programs on different platforms, and all of them can connect to the SQL engine for their database needs.