The Ultimate Guide to SQL for Product Manager
Product management is about taking your product in the right direction with the fastest speed possible. Data is always going to be an integral part of your decision-making, and having the ability to do the primary analysis yourself can definitely boost your speed as a product manager.
Depending on the type and scale of the organization that you are working for, as a product manager you’ll have access to a lot of free and paid tools available for data analysis, ranging from Google Analytics, Mixpanel to Tableau, or PowerBI.?
While these tools will add up to your productivity and analysis skills, it’s better as a product manager to be as close to your data as possible. In most of the applications that you’re going to work with, the data would be stored in something known as relational databases.
Once you develop a decent understanding of how these work and how you can fetch the data and insights from these databases you’ll get a better command of your product’s usage data.?
This understanding will help you to make decisions faster as a product manager because now you do not have to depend on your data team to carry out basic analysis for you and also you can guide your development team to record any additional data that would help you in improving the product.?
Do product managers need to know SQL?
In summary, here are 3 reasons why product managers should learn SQL:
What is SQL?
SQL which stands for ‘Structured Query Language’, is a computer language that is designed to retrieve and manage data in a relational database.?
Thus to understand SQL you need to learn three things:
A relational database is a collection of tables where data is stored in rows. Think of it like a google spreadsheet, the whole spreadsheet will be called as a relational database and the individual sheets within the spreadsheet will be called as tables. To enter any new data we insert a row into any of the sheets and the information about that data is revealed by column names.?
These tables could be related to each other through some common columns (also known as foreign keys). Lets consider a very basic example from instagram to understand this, a simplified database structure (AKA data-schema) for instagram could look like this:
As per this schema, whenever someone registers on Instagram, a new row is added, the person is assigned an automatically generated unique id (id) and other information (first_name, last_name, handle, registered_at, email_id) about them is recorded in the users table.
Whenever someone uploads a photo on instagram a row in photos table is added, which will contain the photo url, time of posting and caption along with the user_id to keep a track of which user has uploaded this photo,
When someone likes this photo, one new row is entered in the likes table with the user_id of the person who has liked the photo, the id of the photo which is liked and the time when the photo was liked.
Now if you have been following closely till now, you would have identified that the photos table is connected to the users table through user_id, the likes table is connected to the users and photos table through user_id and photo_id respectively.?
The user_id in the photos table, the photo_id and user_id in the likes table are examples of foreign keys.?
The ability to create these kinds of relationships between data tables is the real superpower of relational databases, and you can model the data for almost any product into a relational database using multiple tables with such relations between them.
The example we have taken here is extremely simplistic and in real-life instagram might be using hundreds if not thousands of tables to model and store their data.
While this may not be under your responsibility as a product manager to decide or create the data-schema, it is an added advantage to have a clarity of what all data do you store, so that you are aware of what all insights you can drive from your data.
Now that we know how data is stored in these tables, lets understand how we can retrieve data from these tables to drive insights.
This is where SQL comes into play. Simplistically speaking, SQL is the high-level language to give commands or ask questions to your database.?
Retrieving data with SQL
While you can use SQL to read, insert, update or delete data into relational databases, as a product manager your scope of work will always be limited to reading data.
Now let us understand how we can make some basic read queries to our database through SQL.
We will take examples from our instagram schema only.
If you want to see some data from any tables you will use the popular ‘SELECT’ query.
The syntax of query is as follows:
If you want to retrieve all the columns you can replace {column names} with “*”.
Using * could be particularly useful for product managers when they want to see the list of information that is stored in an existing table.
Thus, to retrieve data from the users table in our instagram database, we will use:
Filtering data
Now if you want to filter your data by some criteria, you can use the ‘WHERE’ clause, for example you want to name and date of registration of the user whose email address is ‘[email protected]’, for such as case you will use:
Apart from = condition, you can also use conditional statements such as greater than (>), greater than equal to (>=), less than (<), less than equal to (<=) or like to further define your filters.
You can also combine multiple WHERE conditions to further define your data requirements. The combination can be done by using ‘AND’ when both conditions are essential or by using ‘OR’ when only one of the given conditions are required to be correct.
Here are some example queries to help you understand this better:
Get List of people who have registered in march 2021
Get list of people whose first names are either Ankit or Ankur
Get list of people who use gmail.com as their email provider
If you have a range of items to check from, you can use ‘WHERE IN’ clause, for example: to get list of people whose email addresses are ‘[email protected]’, ‘[email protected]’, and ‘[email protected]’ you can use following query:
领英推荐
That was about 80% of the knowledge you need to have on filtering data, if you are interested in learning more please go to tutorialpoint to expand your SQL knowledge.
Apart from filters there are two more important things which you as a product manager should know: How to sort the data and how to limit the amount of data you want to retrieve.
LIMITS & Sorting
Here are some examples:
This query will give you list of users, with recently registered users on the top (descending order of date of registration
This query will give you 10 recently registered users:
As a product manager you should almost always use LIMIT in your queries, because omitting LIMIT will cause SQL to select all the available rows, which can affect the performance of your database or can even bring your database down if there are large numbers of rows in the table.
AGGREGATE & GROUP BY functions in SQL
SQL also offers something known as AGGREGATE functions, which can help you perform some calculations on data that you retrieve. The common AGGREGATE functions are: AVG, SUM, MIN, MAX, COUNT, SUM etc. Their functions are self-explanatory by their names, eg: COUNT will give you the count of all the rows which satisfy a particular condition and SUM will give you the SUM of all values of a particular column in a result-set.
Here is an example to illustrate the usage of the aggregation function COUNT:
To get all the likes of a particular photo on instagram, you will use this query :
To get the number of photos liked by any particular user (lets say user_id = 20) you will use a query like this:
You can read more about different types of aggregate functions here?
Another important command to learn in SQL is GROUP BY clause. With group by clause you can arrange identical data into groups. When used together with AGGREGATE functions, GROUP BY can uncover some really helpful insights for you.
Lets understand with few examples:
To know the number of photos liked by each user on instagram we will use:
To know the number of photos posted by each user on Instagram we'll use:
JOINs in SQL – Using multiple tables together
Till now we have understood how you can filter, sort, limit, aggregate and group data. Now the next and the most interesting part of SQL: JOINS. Joins will help you connect two or more tables and unlock insights which are spanned across multiple tables.
To join two or more tables, you’ll need some common values (columns) between them, for example to know names of all the people who have liked photos we will have to join two tables Users & Likes, the common columns will be id column in users table and user_id column in Likes tables.
Here is what the query is going to look like
While this definition is enough for some of the common use cases, you should definitely spend some time learning about different types of joins here.
You now are aware of the most common SQL queries and terms that a product manager should know about SQL.?
You now have practical knowledge of:
Where do I write all of these queries?
One pressing question that will come to the minds of some readers is ‘where do i write all of these queries to get the data I want?’. For this you need to talk to your developers, if you are working for a decent tech company then you might already have a read-only database exposed for analytical usage, and that tool will most probably give you a window to write custom SQL queries like the ones we have studied above.
Some common free tools you can use to expose read-only databases are:
Your developer should help you in setting up an environment or you can read documentation from these tools and work with developers to get it done for you.
Here is how metabase look like:
Some analytics tools such as Google’s firebase (through bigquery) and Mode analytics give you the ability to run native SQL queries on their data-sets.
Best practices for product managers to work with SQL and native data
Here are some questions which you can practice answering through SQL for your product:
Join upcoming Product Management Program?