How to Analyze Web and App Traffic Data with Google Analytics 4, BigQuery and Looker Studio
READ THIS GUIDE ON MEDIUM
If you have a website or an app, you probably want to know how your users are interacting with it. You can also set Key Performance Indicator (KPIs) for prioritizing data metrics. What pages are they visiting, what actions are they taking, what devices are they using, and how are they converting into customers? These are some of the questions that web and app analytics can help you answer. This is where Google Analytics 4 comes in.
Did you know that nearly half of users won’t wait longer than two seconds for a website to load? Or the fact that the average mobile app session duration was 6.2 minutes?
You might wonder why this stuff matters and what do these even mean? How do you even measure them? Well, I am here to answer these questions today! ??
To get started let’s get introduced with Google Analytics.
Visit: www.uttamdeb.com for more similar guides!
What is Google Analytics 4 and why use it?
GA4 is the latest version of Google Analytics, which was launched in October 2020. It is a major overhaul of the previous version (Google Analytics 3, or Universal Analytics as it was known), which was based on sessions and pageviews. GA4 is based on events and parameters, which means that it can capture more granular and flexible data about user behavior across various platforms and devices.
GA4 also offers some new features, such as:
READ THIS NEXT: Need for Speed: Why Your Website’s Speed Matters More Than Ever Before and How to Measure it
If your website is integrated with Google Tag Manager and Google Analytics 4, you will be able to track and study traffic data from the Google Analytics interface. Google has extensive documentation on how you can setup these tools. But what makes this tool super-powerful is it's integration with Google BigQuery. The BigQuery integration is especially useful for data analysts who want to access the raw event-level data and perform complex queries that are not possible in the GA4 interface. For example, you can join your GA4 data with other data sources, such as CRM or inventory data, to get a more complete picture of your users and business. You can also create custom metrics and dimensions that are not available in GA4, such as user lifetime value, retention rate, etc. You can also create altered versions for existing metrics, for example you want to count power users; users who have interacted consecutively with the website for more than a minute. Such data points can be created with this dataset.
In this article, I will show you how to import the raw event-level data from Google Analytics 4 (GA4) to Google BigQuery (BQ), and then write a series of SQL queries to perform exploratory data analysis (EDA) and create a dashboard with near-real time data. This dashboard will contain critical web and app traffic data that will easily visualize different performance-related metrics and help identify if there is any issue with a particular payment step or other similar steps. This will also make it easy to identify what product features are working and what can be improved simply by looking at the numbers and graphs.
How to import GA4 data to BQ?
To import your GA4 data to GBQ, you need to have the following:
The steps to import GA4 data to GBQ are:
Once you link your GA4 property to your BQ project, you will start seeing tables in your GBQ dataset with the prefix events_. Each table corresponds to one day of data, and contains the following columns:
For more details on the schema of the GA4 tables in BQ, you can checkout the official documentation from Google.
How to perform Exploratory Data Analysis (EDA) on GA4 data in BQ?
Now that you have your GA4 data in GBQ, you can start writing SQL queries to explore and analyze it. Here are some examples of common EDA questions that you can answer with SQL:
SELECT COUNT(DISTINCT user_pseudo_id) AS users
FROM `sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
SELECT
COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))) AS sessions,
AVG(engaged_session_event) AS avg_session_duration
FROM `sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
AND event_name = 'session_start'
SELECT
event_name,
COUNT(*) AS event_count
FROM `sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
GROUP BY event_name
ORDER BY event_count DESC
LIMIT 10
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
COUNT(*) AS page_view_count
FROM `sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
AND event_name = 'page_view'
GROUP BY page_location
ORDER BY page_view_count DESC
LIMIT 10
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'item_name') AS item_name,
COUNT(*) AS item_view_count
FROM `sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
AND event_name = 'view_item'
GROUP BY item_name
ORDER BY item_view_count DESC
LIMIT 10
SELECT
traffic_source.name AS source_name,
traffic_source.medium AS source_medium,
COUNT(DISTINCT user_pseudo_id) AS users
FROM sample_ga4.analytics_252441088.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
GROUP BY source_name, source_medium
ORDER BY users DESC
LIMIT 10
These are just some of the basic queries that you can use to get a sense of your GA4 data in BQ. You can also write more complex queries to answer more specific questions, such as:
Lets build a comprehensive SQL query to track segmented user behavior in BQ
Now that we performed some basic EDA queries to better understand the dataset, we can now proceed to build series of logics that can accurately track users by different segments, such as device category (was it a mobile device or a desktop-based device?), platform (did the user visit the web version or from a Android/iOS app?), page type (It could be a page describing the product or perhaps a payment page) or events.
An event is a data point that allows you to track specific interactions or occurrences on your website or app. This could include actions such as a page load, a link click, or a purchase completion. It can also be used to monitor system behavior, like an app crash or an ad impression. There are different types of events that you can learn more about from Google's documentations.
Daily Active Users (DAU)
First, lets write a few SQL statements that aggregates users by a few key segments to find out Daily Active Users (DAU). In this example, we will calculate all of the data points by each day.
SELECT
event_date as day,
-- DAU and variations
COUNT(DISTINCT user_pseudo_id) AS dau,
-- "user_pseudo_id" is a unique GA4-generated field assigned to each user visiting the platform.
COUNT(DISTINCT auth_user_id) AS logged_in_dau,
-- "auth_user_id" is a custom field assigned to a user if they log in to our platform. This field will defer according to your platform and industry.
COUNT(DISTINCT CASE WHEN event_name IN ('first_open','first_visit') THEN user_pseudo_id END) AS new_dau,
COUNT(DISTINCT CASE WHEN device.category IN ('desktop') AND platform IN ('WEB') THEN user_pseudo_id END) AS web_desktop_dau,
COUNT(DISTINCT CASE WHEN device.category IN ('mobile') AND platform IN ('WEB') THEN user_pseudo_id END) AS web_mobile_dau,
COUNT(DISTINCT CASE WHEN device.category IN ('desktop') THEN user_pseudo_id END) AS desktop_dau,
COUNT(DISTINCT CASE WHEN device.category IN ('mobile') THEN user_pseudo_id END) AS mobile_dau,
COUNT(DISTINCT CASE WHEN platform IN ('ANDROID') AND device.category IN ('mobile') THEN user_pseudo_id END) android_app_dau,
COUNT(DISTINCT CASE WHEN platform IN ('IOS') AND device.category IN ('mobile') THEN user_pseudo_id END) ios_app_dau,
Product Display Page (PDP) visitors
Woah! Those logics were useful! Now lets try to find out how users are visiting our product pages and just like before, lets segment it by device types.
-- Product Display Page (PDP) visitors and variations
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND device.category IN ('desktop') AND platform IN ('WEB') THEN user_pseudo_id END) AS web_desktop_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND device.category IN ('mobile') AND platform IN ('WEB') THEN user_pseudo_id END) AS web_mobile_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND device.category IN ('desktop') THEN user_pseudo_id END) AS desktop_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND device.category IN ('mobile') THEN user_pseudo_id END) AS mobile_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND platform IN ('WEB') THEN user_pseudo_id END) AS web_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') THEN user_pseudo_id END) AS overall_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND platform IN ('ANDROID') AND device.category IN ('mobile') THEN user_pseudo_id END) android_app_pdp_visitors,
COUNT(DISTINCT CASE WHEN event_name IN ('view_item','landing_page_visited') AND platform IN ('IOS') AND device.category IN ('mobile') THEN user_pseudo_id END) ios_app_pdp_uv,
Add to Cart and Checkout
Alright then, now lets find out user interactions with the Add to Cart and Checkout pages.
-- Add to Cart & Checkout events
COUNT(DISTINCT CASE WHEN event_name IN ('add_to_cart','add_to_cart_attempted') THEN user_pseudo_id END) AS add_to_cart,
COUNT(DISTINCT CASE WHEN event_name IN ('begin_checkout','checkout_attempted','Adm_checkout','purchase_attempted') THEN user_pseudo_id END) AS checkout_step1,
COUNT(DISTINCT CASE WHEN event_name IN ('add_shipping_info') THEN user_pseudo_id END) AS checkout_step2,
COUNT(DISTINCT CASE WHEN event_name IN ('add_payment_info') THEN user_pseudo_id END) AS checkout_step3,
Sessions
Since we are at it, lets also keep track of user sessions
COUNT(DISTINCT CASE WHEN event_name = 'session_start' THEN concat(user_pseudo_id, cast(event_timestamp AS string)) END) AS sessions,
Conversions
Okay, so the all-important step now. Lets check out user conversions.
领英推荐
COUNT(DISTINCT CASE WHEN event_name IN ('purchase','in_app_purchase','purchase_success') THEN user_pseudo_id END) total_buyer,
SUM(CASE WHEN event_name IN ('purchase','in_app_purchase','purchase_success') AND platform IN ('WEB') THEN event_value_in_usd END) web_rev_usd
We will finally conclude the query using the FROM and WHERE statement. In the WHERE statement, we will take a smaller date range as we will be testing the query extensively before proceeding with any kind of production-level data. This will make our queries more efficient and reduce costs.
FROM
`sample_ga4.analytics_252441088.events_*`
-- Short date range will ensure queries will run faster bearing minimal cost.
WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231010'
Storing data in a warehouse
After asking the above questions, we have a clear picture of what metrics we need to measure our traffic performance. We also have some insights into how users interact with our platform and how we can analyze the data further. But before we do that, we need to understand why we used a wildcard filter called “_TABLE_SUFFIX” and a date range of ten days in our SQL statements.
The reason is twofold. First, Google Analytics 4 collects event-level data, which is very detailed and granular. This has its advantages, as it allows us to capture every user action and behavior on our platform. But it also has its drawbacks. The main one is the amount of data that we have to deal with. How big is your platform in terms of users? What are your DAU and MAU? If they are high, say, more than 100,000, you will have a lot of data to process. We are talking about millions of events in a short period of time. This is very much expected, as a user may hit around 35-40 events (depending on how many events you have set-up in your website and app) on average on the one or two minute they spend on your platform on average. This can affect the query execution time and the billing costs in BQ. The good news is that BQ is very fast and efficient at sorting and filtering large amounts of data. So, if this is a non-issue then what's the catch?
The second reason is the data quality. With any technology, there can be some errors or inconsistencies in the data collection. Not all properties or attributes may be 100% accurate or reliable. And with so much data, it can be hard to spot and fix these issues. That’s why I have limited the data to only ten days, which makes it easier to check and validate. Of course, you should also test your website properties thoroughly using DevTools, but that’s another topic for another day.
Another thing that you should know is that the GA4 dataset in BQ stores data clustered by date. What does this mean? It means that when you create a table in BQ, you can specify which columns to sort it by. These are called clustered columns. Clustered tables make your queries faster and cheaper by only scanning the data blocks that match your query conditions. Clustered tables are useful when you frequently filter on specific columns or when you filter on columns that have many different values. So when I use a date range of ten days, SQL’s order of statement execution comes into play and BQ ignores the dates that are not in the range. This improves the performance and efficiency of the queries.
Now that we have explained these aspects, let’s see how we can transform the incoming data to extract the most value from it without compromising on any key metrics or data quality.
We will use something called the Scheduled Query in BQ, which will help us achieve this.
1. Once you are done with EDA and testing, combine the above query snippets into one, unified query.
2. Click on 'Schedule' from the SQL workspace in BQ.
3. Select a name, set an update frequency and destination table
4. Select the 'Append to table' for write preferences.
5. Select encryption and notification settings and click on 'SAVE'
This will initiate a data transfer into your destination table that you set previously. The append parameter will ensure that previous data isn't flushed and that all data is stored and new data simply gets added in the destination table, sort of what factories do in a production plant while stickering a box or packet, they simply keep jamming one box behind another.
Phew! Our data is now ready for further analysis and for a nifty dashboard!
Creating a comprehensive dashboard with GA4 data
Once you have imported your GA4 data into BQ, performed SQL queries to explore it, you may want to see the results in a more visual and interactive way using the transformed data. That’s where a dashboard comes in handy. (Yes, you guessed it right!)
A dashboard is a powerful tool that can display your web and app traffic data in various charts and graphs, making it easier to understand and analyze. You can use a dashboard to track your performance metrics, identify any problems or opportunities, and optimize your product features. A dashboard can also alert you to any issues that may affect your user experience, such as a bug that prevents them from completing a payment or accessing a certain page. You don’t want to miss these issues for too long, as they can cost you customers and revenue. You want to be able to detect and fix them as quickly and efficiently as possible.
One of the options that you have for creating a dashboard with your GA4 data is Google Looker Studio (GLS), a data visualization and exploration platform that works seamlessly with BQ. With GLS, you can connect to your BQ project and dataset, and then use SQL or a drag-and-drop interface to create charts and tables based on your queries. You can then customize and arrange your charts and tables into a dashboard that suits your needs and preferences. You can also share your dashboard with others or embed it into your website or app, so that you can always keep an eye on your data.
Create a new dashboard in Google Looker Studio and lets get started...
GA4 data in BQ and Looker Studio?
One of the benefits of creating a dashboard with GA4 data in BQ and Looker Studio is that you can easily monitor and identify issues with your web and app traffic data. For example, you can use your dashboard to spot anomalies, trends, outliers, or patterns that indicate a problem or an opportunity.
In this section, I will show you an example of how I used the dashboard we just created to identify and solve an issue with one of the payment steps in my website. The issue was causing a huge drop in conversions and revenue, and I was able to find it and fix it using the data.
IF YOU LIKE WHAT YOU READ, CONSIDER FOLLOWING ME ON MEDIUM! ??
The issue was related to the checkout funnel, which is the sequence of steps that a user takes to complete a purchase on the website or app. The checkout funnel consists of four steps:
To analyze the checkout funnel, I created a funnel chart in Looker Studio that shows the number of users who completed each step in a given period. The funnel chart also shows the conversion rate between each step, which is the percentage of users who moved from one step to the next.
There was a huge drop in conversions between the add payment info step and the purchase step. Only 10% of users who entered their payment information completed the purchase, compared to 80% between the begin checkout step and the add payment info step. This means that 90% of users who were ready to pay abandoned their carts at the last step.
This was a very alarming situation, as it indicated that there was something wrong with the purchase step that was preventing users from completing their transactions. To investigate the issue, I drilled down into the funnel chart and looked at the details of the users who dropped off at the purchase step.
I found out that most of the users who dropped off at the purchase step were using a specific payment method. In fact, 95% of users who used this method did not complete their purchase, compared to 15% of users who used other payment methods. This suggested that there was something wrong with the payment system's integration on the website that was causing errors or delays in processing the payments.
To confirm this hypothesis, I contacted the engineering team and asked them to check the logs and error reports for the payment method integration. They found out that there was a bug in the code that was causing a timeout error when users tried to pay with that payment method. The error message was not displayed properly on the website, so users did not know what was happening and gave up on their purchase. The engineering team fixed the bug and deployed the new code to the platform.
Since then, the funnel chart for the next few days after the issue was fixed clearly stated that things were back to normal. The conversion rate between the add payment info step and the purchase step improved significantly after the bug was fixed. It went from 10% in August to 75% in September, which is close to the average conversion rate for ecommerce websites. This means that more users who used the payment method were now able to complete their purchase without any issues.
The fix also had a positive impact on the overall revenue and user lifetime value. The revenue increased by 50% in September compared to August, and the user lifetime value increased by 25%. This shows that fixing the issue not only improved the checkout funnel, but also increased customer satisfaction and loyalty, which can go a long way for a start-up business.
Conclusion
In this article, you learned how to use GA4, BQ, and GLS to import, analyze, and visualize your web and app traffic data. You also learned how to use your dashboard to spot and fix problems with your web and app performance. These tools can help you understand your users better and improve your website or app for more conversions and revenue. You can also tailor your queries and dashboard to your specific needs and goals. But there are some things to remember.
First, the codes and visualizations I showed you are just one of the many SQL queries and dashboards I made for my company. Each dashboard has a different purpose, such as product traffic, device category, traffic source, page performance, feature comparison, and so on. When you have tons of data it is generally a good practice to create a ETL (or ELT) pipeline first and then move on to the EDA step. This decreases execution time and data processing costs. So this work can get complicated if you don’t have a clear goal. It’s important to map out your platform features before you write your SQL queries. This will make your work more efficient. I also enriched my GA4 data with other internal data sources, which may differ depending on your needs. If you want to track all the important features on your platform, this work is very important. Anyone who looks at the dashboard will have a clear picture of the overall health of your platform, which is vital for your business and your users. You may also need to explore other ideas depending on your industry.
Second, I used Google’s products mostly, but you can use other tools like PowerBI, Tableau, or Microsoft Fabric to do the same thing. Use the tool that you are comfortable with, the steps are similar to what I showed you. The main thing is to improve your KPI metrics, not the method - as long as it’s reliable!
Third, Google offers templates for creating GA-based dashboards in Looker Studio. You can check them out too. But the templates and the GA-interface may not be enough for your unique needs and requirements. You may need to customize them a lot. That’s what I taught you in this guide, so don’t worry!
I hope this article was useful and informative for you. If you have any questions or feedback, please leave a comment below. Thank you for reading! ??
Your guide on tracking Web and App Traffic Performance is a great resource for those looking to dive deep into data analysis! ?? Generative AI can streamline this process by quickly analyzing large datasets and generating insights, which can enhance the quality of your work and save you time. ?? I believe a conversation about how generative AI could integrate with your SQL expertise would be incredibly fruitful. ?? Let's book a call to explore how you can leverage AI to elevate your analytics game even further. Looking forward to discussing the possibilities with you! Christine
Cloud Architect
9 个月For #Looker beginners this can be useful: https://youtu.be/GoyNCD6hHkg?feature=shared
Data Consultant | Business Intelligence Consultant | BI Analyst Wunderman Thompson | Ex-Sr BI Analyst at 10 Minute School | Ex- Robi Axiata Ltd | Ex Data Analyst - Daraz ( Alibaba Group )
11 个月Nice one The article is really descriptive
Crafting Compelling Data Narratives | Google-Certified BI Analyst | Driving Growth, $3M+ ARR @ 10 Minute School
11 个月View this on Medium here: https://medium.com/@uttamdeb670/how-to-analyze-web-and-app-traffic-data-with-google-analytics-4-bigquery-and-looker-studio-66491b5470fd Thank you for reading! ??