Learn SQL the Effective Way with Data School New York
By way of introduction, I am Will, this year’s Tableau Iron Viz Champion and a recent addition to the core consultancy team at The Information Lab. I’m a firm believer that good data preparation is key to creating effective data visualisations (something I discussed in my recent Tableau Prep User Group talk - which you can find here if you wish to dig a little deeper). And so, when I was recently given the opportunity to visit the Data School New York to train a cohort in SQL I leapt at the chance to pass on my SQL knowledge!
Throughout my career, data manipulation with SQL is one of the strongest skills I have developed. Projects include working with the BBC Archives holding metadata, WorldRemit’s CRM database, and answering users' questions on ?Stack Overflow have all helped to further my understanding and strengthen my skills in this area.
Over the last 8 years every data analyst job I’ve had or hired for required SQL, and despite all the ETL tools available on the market, data analysts will still be asked for SQL proficiency. A sample of Glassdoor job ads for New York Data Analysts showed half wanted SQL skills, and, of those who asked for Tableau, 2 out 3 also asked for SQL.
For the New York Data Schoolers, I felt it was important that the team understood the core concepts, but were also able to delve straight into writing SQL queries and battling some of the real-world problems I had faced so that they were well-prepared for future client projects.
Getting started with SQL
When we learn SQL we all start with:
SELECT *
FROM orders;
This will return us all records from the orders table (* = all columns).
This works well in the classroom to display a few thousand rows, but in any organisation, this query could return millions of records:
Solution? Add a limit!
SELECT *
FROM orders
LIMIT 100;
Adding a limit here gives a good snapshot of the table data (just 100 rows), quickly and cost-effectively. While you are exploring the data with SQL be sure to add a limit.?
Exploring data with SQL
When checking for data quality issues, such as typos in product names, a query such as this could be useful:
SELECT DISTINCT product_name
FROM orders
ORDER BY product_name ASC
LIMIT 100;
We can’t always assume our data is stored in a consistent format, but we can check this using SELECT DISTINCT to return us the unique product names. using ORDER BY allows us to sort the data alphabetically (ASC = ascending) to help us spot any typos or products with different cases.
We can use filters such as the WHERE clause to help transform the data to answer the question:
SELECT *
FROM orders
WHERE city = ‘New York’
LIMIT 100;
Aggregating the data allows us to reveal insights
SELECT
product_name,
SUM(sales) as ‘sales’?
FROM orders
WHERE city = ‘New York’
GROUP BY product_name;
This gives us the total sales for each product we sold in New York (N.B. we drop LIMIT here).
Combining datasets with SQL
It’s rare in business that you’ll only work with one table, so after getting acquainted with querying the team dived into JOINs.
领英推荐
SELECT
O.product_name,
R.return_date,
COUNT(O.product_name) as ‘total_returned’
FROM orders as O
INNER JOIN returns as R on O.order_id=R.order_id
WHERE O.city = ‘New York’
GROUP BY
O.product_name,
R.return_date;
This returns a count of returned products by day in New York.
Crucially for the team, they needed to understand the operations SQL performs, which was best shown when I asked them to explain the different outputs of the following UNION queries:
SELECT order_id FROM orders
UNION??
SELECT order_id FROM returns
Vs.
SELECT order_id FROM orders
UNION ALL
SELECT order_id FROM returns
The key distinction here is this: UNION returned a DISTINCT list of order_ids from the orders & returns tables, whilst UNION ALL returned a long list of all order_ids from both tables containing duplicates.?
Putting it all into practice
With this knowledge acquired the New York Data School became SQL super sleuths and were tasked with solving a murder. The team were left to tackle The SQL Murder Mystery.?
As their coach, it was great to see how the team approached this task. The class were very collaborative, helping each other out where there were struggles but also independent enough to go it alone, with some exploring their own ideas and then feeding that knowledge back to the cohort.
Starting to code is not an easy task and I was impressed with how the New York Data Schoolers took to the challenge in a professional and collaborative way. The skills they displayed mean they have a great grounding in SQL for their future placements and consulting projects.?
My tips for getting started with SQL
Learning to code is challenging, SQL is a more friendly coding language than most but it still takes time to become a fluent SQL user. Here are a few tips from experience that helped me progress my skills with SQL
1. Know the jargon
Understanding what the different SQL operations are doing is invaluable when it comes to planning your data transformation, debugging (finding issues) in your queries, and effective Google searching. e.g. understanding the difference between UNION and UNION ALL is very important for building an efficient SQL script.
2. Stack Overflow
Stack Overflow is a coding community where users can post their coding issues and other users will help solve them. It is a great problem-solving resource to:
3. Practice regularly
Schedule time to practice SQL on a regular basis, I’ve recommended that our New York start tackling SQL problems on CodeWars. Even spending 10 minutes a day to try and solve a problem can help to embed the skills you have learned.
Special thanks to Ann Jackson , Toby Sharpe , Nathan Leather for letting me teach their cohort, and Karen D. Hamilton for the great photos.
Thanks for reading
Will
Divergent thinker and change agent at the intersection of Data Analytics and Social Justice.
2 年Will, DSNY1 is waiting to solve more mysteries with SQL.
Facilitating data-driven decisions with Tableau ??
2 年Good read! Thanks for sharing.
Head of IT at Concurrent Technologies Plc
2 年Great stuff Will ??
Managing Director @ The Information Lab US and Data School New York
2 年Invaluable time spent with the DSNY team, thanks Will!
Group Head of DE&I at Eurostar | Passionate about equity, justice and human rights
2 年Thanks for sharing your knowledge, Will. Some great tips here for getting started with SQL!