SQL Queries Quickstart: Part I
Welcome to my SQL Queries Quickstart! Designed for complete newbies or developers with some SQL knowledge, this article is your go-to for mastering the essentials of query building.
From Relational Databases to more complex concepts like JOINs and Performance Tuning, this guide will take you on the basics for you to be prolific in the art of writing queries.
Disclaimer: This is a Quickstart guide and by no means a comprehensive course. It includes the concepts that, based on my experience, I believe are essential to know before embarking on your query builder journey. After all, this is just an introductory guide, we can't cover everything! Nevertheless, whenever necessary, I'll be including useful links to resources on each specific topic.
Index
Initially I thought of creating an all-in-one guide, but the Article was just too long. So I divided the guide in four parts.
Since this is the first part, you can find the rest in the replies of this article.
Now, let's get started!
Our playground
Throughout this article, we'll be illustrating SQL concepts and techniques using a sample application scenario. Our chosen example revolves around a user tracking system for educational courses. Let's call it Educa.
Our code style will include:
Always remember that any code style and naming decision does not really matter. But it is important for any project to remain consistent with the rules you choose, to avoid confusion.
Along the way I will be adding some DB Fiddle links so you can review and play with the queries I'm showing.
Now, a -very- little bit of history.
About databases, SQL and the Relational Model
Most applications will need to store certain information and persist it over time. For example in Educa we want to store student information, track what courses they are on, store their evaluation results, and so on.
The first databases were implemented in files, for example using Comma Separated Values (CSVs). But as applications got more complex, databases evolved to become an application on its own.
The Relational Model was born on 1969 (what a year!) when Edgar F. Codd presented it to the world. It's purpose is to provide a declarative method for specifying data and queries on it. A year later SQL (Structured Query Language) was born at IBM, based on Codd's paper, as a way to standardize access to these type of databases.
We won't cover the theory of the Relational Model in this article, but we will explain some of it's concepts. If you are interested, here are some links:
Tables
Relational databases store information in tables. Tables are a collection of rows (sometimes called records), each usually identified by a primary identifier or key.
Each row is composed by columns, which store the row properties.
Educa will need a list of all the courses available for the students. Let's check the courses table definition and run a basic query to get them all:
At the top we can see a DDL (Data Definition Language), a CREATE TABLE statement. These statement types are used to create and modify the structure of the database (not the data). We won't focus on DDLs in this article. But it is important to know some constraints and indexes columns may have, as they directly affect our queries. We'll review them in the next sections.
Now turn your focus to the second statement: we are executing our first query! For now, just look at the resulting table. You can see the rows (identified by an id) and the columns that represent the data stored for each one.
Each column has a datatype, and this depends on the database engine you are using. Our examples will use a MySQL-compatible database: that's why we have VARCHARs for strings or TINYINTs for booleans.
???? Play in Fiddle
Tip: we mentioned DDLs, the language to modify our database structures. Another important part of SQL are the DMLs (Data Modification Language): these statements are used to modify the data inside the tables. The most common ones are INSERT, UPDATE, DELETE and -less used- MERGE. They are outside the scope of this article, since we are focusing on SELECT queries only. I may write on them at some other point tough.
Now back to the definition.
Primary Keys
A Primary Key (aka PK) is a column that uniquely identifies a row inside the table. In our example, it is the id column .
Most PKs are usually defined as an integer field. In MySQL they are auto generated. Other DB engines require you to create a sequence and a trigger first (this is out of the scope in this article though, but you can check this Oracle-based example).
All our PKs will be auto generated. By default they are incremented by 1 each time a new row is inserted to the table.
Tip: If you are still tempted on using using user data as you PKs or maybe UUIDs, here's a good article on why you probably shouldn't.
Unique Keys
We also have a Unique Key (aka UK). Any UK could also work as a primary key, but as mentioned before, it's generally not recommended to use application data as primary ids.
For example, in the users table, we don't want two rows (two students) to have the same user name.
Tip: In most database engines, defining a PK or UK will create an implicit index. This is because the database needs the to be fast when checking the column values, for example during an insert operation.
NULLABLE option
The NOT NULL constraint is a very handy one that will require all rows to have a value in the column.
For example, in our courses table, we want all courses to have a name.
Having a required value also helps the database optimizer when finding the execution plan for our queries. We will see this in the Performance part, for now just try to avoid nullable columns whenever you can.
Indexes
Indexes are structures (usually a B-Tree) that order rows based on one or more column values. This order allows for faster retrieval of rows when the query is filtering or ordering by those values.
领英推荐
In our users table (apart from the PK and UK implicit indexes) we have two more indexes that will be used by our application when filtering rows:
Tip: the fullName index is a composite index: it indexes by two or more columns. More on this on the Performance part.
Indexes and performance is a very long topic, and will be cover a bit more in part IV, but if you want to learn more, here are some useful references:
Our first Query
Ok, we have learn what a table is, and went through their most important properties. Now let's put this to the test by creating our first query.
Educa will have students. Let's review the users table and create a query to list them. Our first objective is bringing the full names for all non-disabled users, order by their full name. Here's our code:
???? Play in Fiddle
A SELECT query consists in at least 1 column to retrieve, and a table to read from. So both SELECT and FROM are required.
Then, there are other parts to filter (WHERE and HAVING), sort (ORDER BY), group (GROUP BY) and limit the rows.
Tip: the limit clause is not consistent with all database vendor. This is probably due to pagination being such a hard problem to get it right, performance wise. We will see recommendations in the Performance part. Here's a good article with the different options, we will be using MySQL's LIMIT clause when needed.
Let's dissect this query.
Functions
To get a student full name, we are using the CONCAT function. This function joins the last and first name, separating them with a comma.
Functions tend to differ depending on the database engine you use. The good news is that all engines have more or less the same functionality, so it's not hard to jump from one to another.
For example, here you can find information for Mysql functions. Manuals tend to categorize functions depending on the type of column they are working with: number, string, dates, etc.
Tip: if you intend to allow your code to work with multiple databases engines, then functions, along with the LIMIT clause as explained before, are probably the most important pieces to abstract out.
Filters
To filter rows from the result set, we use the WHERE clause. In the example we are requiring all rows to have the enabled column with a true value.
Filters are defined as a set of boolean conditions, so you can add as many as you want. You can use AND and OR keywords and make them as complex as you want. Use parenthesis to group conditions if your logic gets too complicated.
In this snippet we included some known builtin users, even if they are not allowed to login. Go to the Fiddle and test it out yourself!
Tip: Notice that MySQL doesn't have native boolean support, so we use an integer instead.
Apart from the usual operators like =, !=, >, <, >=, <=, SQL has some others for specific datatypes.
For example, to filter a text column, you can use a the LIKE operator along with special characters % (zero or more characters) and _ (one character). For example, to view all users whose last name starts with a C, you'll do this:
Tip: Remember to check out the table and/or column collation and character settings for character-based columns. For example, if you intent to store user-generated texts, I recommend a case insensitive Unicode character set like utf8mb4 in MySQL.
NULL comparison will be discussed bellow.
IN/NOT IN allows to filter a list of values. Together with EXISTS, they allow to use subqueries for filtering. We'll see them in Part III.
Sorting
When you need to retrieve more than one row, you will usually want to sort them by some criteria. We use the ORDER BY clause for this purpose.
In our example, we are sorting by lastName. But since we may have a lot of repeated last names, we also added a secondary sorting criteria using the firstName column.
Tip: always add the primary key at the end of your sorting, when your sorting criteria is not unique. This will help you disambiguate the order of the rows, and will avoid sporadic test failures. Why? A database engine does not guarantee any order without a unique criteria, so rows could move between different query execution.
About NULL
NULL represents a missing or unknown value in a column. It's not the same as zero or an empty string; rather, it indicates the absence of any data in a particular field.
If you want to match NULL values, you can use the special IS [NOT] NULL operator.
Tip: most functions operating over a NULL value, will always return NULL. Exceptions are functions that expect NULLs, like IFNULL(). Errors such as division by zero might return NULL on certain database engines, while others might generate an error. To understand the behavior in these edge cases, check your database settings and properly document what your app needs.
We will see examples of NULL filters in the LEFT OUTER JOIN section in part II.
Before continuing
In Part II, we'll delve into multi-table queries, and they will become a bit more intricate. Therefore, I highly recommend experimenting in the Fiddle playgrounds with what we've learned thus far beforehand.
Get accustomed to querying the tables, adding orders, and using functions to transform the results. Here are some ideas to get started:
Once you feel comfortable writing queries, I'll see you in Part II. Keep an eye out for it in the replies, or stay tuned to my Articles section on ??.
Principal Engineer at Avature | @josecanciani at ??
6 个月Part III is online!! Check my recent post to suggest topics on the last part (Performance!). In the mean time, here's the last one. https://www.dhirubhai.net/pulse/sql-queries-quickstart-part-iii-jose-canciani-bmhhe/
Principal Engineer at Avature | @josecanciani at ??
7 个月Part II is live! Check it out here at LinkedIn https://www.dhirubhai.net/pulse/sql-queries-quickstart-part-ii-jose-canciani-lonif/ or on ?? where I'm posting first and more often: https://x.com/josecanciani/status/1785332082622734619
Principal Software Engineer (Test Architect)
7 个月Betina Shim
CEO en Blimop | Executive MBA en IAE Business School | Mentor Minka Instituto Inclusivo de Negocios
7 个月congratulation José !! nice post..