Unlocking SQL Wizardry: Exploring the Google Merchandise Store Dataset on BigQuery
In the realm of e-commerce analytics, the Google Merchandise Store dataset on BigQuery stands as a goldmine of valuable information. This article explores how SQL, the language of databases, can be wielded to unravel meaningful insights from this vast dataset, empowering analysts and data enthusiasts alike.
Explore the Google Merchandise Store Dataset
Before diving into SQL queries, it's essential to familiarize ourselves with the Google Merchandise Store dataset. This dataset, residing on BigQuery, encompasses a rich tapestry of information, including user interactions, transactions, and more. Key tables such as ga_sessions and hits hold the backbone of our analytical exploration. For more information, you can have a look here
Accessing BigQuery
To embark on our SQL journey, the first step is establishing a connection to BigQuery and accessing the Google Merchandise Store dataset. I have already written an article that shows how to connect to BigQuery (here).
Let's test the connection !!!
-- Connect to BigQuery
SELECT
*
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where visitID = 1494505508
LIMIT 10;
Unveiling Insights with SQL
Now, let's leverage the power of SQL to extract meaningful insights from the Google Merchandise Store dataset.
Count the number of total pageviews:
SELECT
COUNT(*) AS total_pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
TABLE_SUFFIX BETWEEN '20160801' AND '20170630': Filters the data to include only tables with a suffix between the specified dates, limiting the dataset to a particular time range.
领英推荐
Find the most popular pages (by page views):
SELECT
hits.page.pagePath AS page_path,
COUNT(*) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
AND hits.type = 'PAGE'
GROUP BY
page_path
ORDER BY
pageviews DESC
LIMIT 10;
UNNEST(hits) AS hits: Unpacks the hits array, allowing us to access its elements in subsequent clauses.
AND hits.type = 'PAGE': Further filters the data to include only hits of type 'PAGE', indicating pageviews.
Determine the most popular traffic sources:
SELECT
trafficSource.source AS traffic_source,
COUNT(*) AS sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
GROUP BY
traffic_source
ORDER BY
sessions DESC
LIMIT 10;
Calculate the average session duration:
SELECT
AVG(totals.timeOnSite) AS avg_session_duration
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
Overall, BigQuery is a powerful tool for working with large datasets like the Google Merchandise Store dataset. Whether you're an analyst looking to uncover insights or a developer building data-driven applications, BigQuery can help you make the most of your data. So why not give it a try and see what you can discover?
I also wrote 2 other articles (SR only in Vietnamese):
1) Mình tr? thành Data Analyst nh? th? nào? Link
2) Mình h?c gì ?? tr? thành Data Analyst? Link
Thanks for spending your time !!! Happy coding