Unlocking SQL Wizardry: Exploring the Google Merchandise Store Dataset on BigQuery

Unlocking SQL Wizardry: Exploring the Google Merchandise Store Dataset on BigQuery

In the realm of e-commerce analytics, the Google Merchandise Store dataset on BigQuery stands as a goldmine of valuable information. This article explores how SQL, the language of databases, can be wielded to unravel meaningful insights from this vast dataset, empowering analysts and data enthusiasts alike.

Explore the Google Merchandise Store Dataset

Before diving into SQL queries, it's essential to familiarize ourselves with the Google Merchandise Store dataset. This dataset, residing on BigQuery, encompasses a rich tapestry of information, including user interactions, transactions, and more. Key tables such as ga_sessions and hits hold the backbone of our analytical exploration. For more information, you can have a look here

Accessing BigQuery

To embark on our SQL journey, the first step is establishing a connection to BigQuery and accessing the Google Merchandise Store dataset. I have already written an article that shows how to connect to BigQuery (here).

Let's test the connection !!!

-- Connect to BigQuery
SELECT
  *
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
where visitID = 1494505508
LIMIT 10;        

Unveiling Insights with SQL

Now, let's leverage the power of SQL to extract meaningful insights from the Google Merchandise Store dataset.

Count the number of total pageviews:

SELECT
  COUNT(*) AS total_pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';        

TABLE_SUFFIX BETWEEN '20160801' AND '20170630': Filters the data to include only tables with a suffix between the specified dates, limiting the dataset to a particular time range.

Find the most popular pages (by page views):

SELECT
  hits.page.pagePath AS page_path,
  COUNT(*) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
  AND hits.type = 'PAGE'
GROUP BY
  page_path
ORDER BY
  pageviews DESC
LIMIT 10;        

UNNEST(hits) AS hits: Unpacks the hits array, allowing us to access its elements in subsequent clauses.

AND hits.type = 'PAGE': Further filters the data to include only hits of type 'PAGE', indicating pageviews.

Determine the most popular traffic sources:

SELECT
  trafficSource.source AS traffic_source,
  COUNT(*) AS sessions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
GROUP BY
  traffic_source
ORDER BY
  sessions DESC
LIMIT 10;        

Calculate the average session duration:

SELECT
  AVG(totals.timeOnSite) AS avg_session_duration
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';        

Overall, BigQuery is a powerful tool for working with large datasets like the Google Merchandise Store dataset. Whether you're an analyst looking to uncover insights or a developer building data-driven applications, BigQuery can help you make the most of your data. So why not give it a try and see what you can discover?

I also wrote 2 other articles (SR only in Vietnamese):

1) Mình tr? thành Data Analyst nh? th? nào? Link

2) Mình h?c gì ?? tr? thành Data Analyst? Link

Thanks for spending your time !!! Happy coding


要查看或添加评论,请登录

Duy Bui的更多文章

社区洞察

其他会员也浏览了