Introductory SQL learning article for you being a data analyst - Part Two
Jonathan Osagie
Tech Entrepreneur | Strategic Leadership in AI, Data Analytics & Governance | Driving Business Transformation with Data & AI Solutions
SQL (Structured Query Language) has more than decades of history. SQL may not sound fancy as Python or R, two more modern popular programming languages, it has always been a part of the requirement for being a data analyst.
Please refer to Introductory SQL learning article for you being a data analyst - Part One
Aggregation — Group by
Another common application of SQL is to perform data analysis. And more often we perform data analysis after the grouping of data. For example, you may want to know the number of records for each school in the table, or you want to know the average age for each gender. At this moment, you will need to include group by in your SQL statement. For example, if you want to know the number of records for each school, the grouping level is school and thus you have to include group by school in the SQL statement. group by statement follows behind where statement. Moreover, you have to include school in the select clause in order to return the name of each group (in this case, the school).
The next step is to count the number of records for each group. For this part, we need to include an aggregate function count in the select clause. Following each aggregate function, there is a curly bracket including the column name for such aggregation. Since we want to count for school, we can include count(school)in the select statement. However, for count function, there is no need to specify which column to count because there is no difference to select which. In fact, you can even type count(1) to count the number.
select school, count(school) from dataset.student_mat group by school;
The result shows the number of records for each group in school.
Furthermore, we can expend the grouping level to more than one column.
select school, sex, count(school) from dataset.student_mat group by school, sex;
Below are aggregate functions that you can use in select statement: count, max, min, sum, avg.
select school, avg(age) from dataset.student_mat group by school;
READ: 25+ Quotes to Change How You Think About Problems
Alias — as
As you can see the result in the previous example, the column name for the average of age becomes avg(age). It is still self explainable. But of course, it is even better if we rename this column name. Then we can include as after avg(age) and give a new name to this column. We can also amend the name of other columns in the select statement.
select school as school_name, avg(age) as average_age from dataset.student_mat group by school;
I advise not to include space in a column name because it will only create trouble when you further select that column in your select statement.
Ranking — order by
order by statement helps you order your returned data in either ascending or descending order by one or more than one column. The default order is ascending so you don't need to specify. If you want to order in descending order, you have to include desc after the column name in order by statement.
select school as school_name, sex, count(school) as num_count from dataset.student_mat group by school, sex order by school, sex desc;
The order by statement is order by school, sex desc. As a result, the returned result first orders column school by ascending as there is no specification. Then sex is ordered descendingly. As a result, (GP,M) comes before (GP,F) while (GP,F) comes before (MS,M).
Till now, I have covered basic usage of SQL on querying and data analysis. For a complete beginner without prior experience in SQL, I believe this article can help you learn basic SQL application quickly without any difficulty. Please let me know if this article helps you or if you want further coverage on other SQL usage like join, having or window function. See you next time.
READ: Data Analysis in Research
________________________
Jonathan Osagie is a Data & Business Intelligence Analyst interested in understanding stories that data tells, actions it inspires through analysis & analytics, and the success it brings to businesses by aiding strategic decisions.
Follow me on Twitter - @iamosagie_ng