How to Achieve Last-Click UTM Attribution in Google Analytics 4 with BigQuery
Our client recently encountered significant challenges using the data from Google Analytics 4 (GA4) due to differences in how it attributes user activity to marketing efforts. Initially, they attempted to extract UTM parameters from GA4 data exported to Google BigQuery. However, unlike Universal Analytics (UA), where traffic source data is readily available for each session, GA4’s event-based tracking model led to gaps in capturing UTM parameters, creating inconsistencies in their marketing attribution reports.
Challenges with UTM Attribution in GA4 vs. UA
In Universal Analytics, each session was tied to a single traffic source, and any change in the source triggered a new session. This ensured that every session had reliable UTM information for medium, source, and campaign parameters. However, GA4’s event-based model allows sessions to persist even when the traffic source changes, and the UTM parameters are recorded only at the event level. As a result, if the first event in a session doesn’t capture UTM data, sessions may lack essential source, medium, and campaign details. This discrepancy impacts year-over-year comparisons and the overall interpretation of traffic attribution in BigQuery.
Technical Solution for Missing UTM Data in GA4
To address this issue, Napkyn developed a robust solution leveraging SQL queries in BigQuery to capture and reconstruct missing UTM data:
Event Data Extraction and Aggregation
An initial SQL query extracts key fields like session_id, user_pseudo_id, and session_start, along with available UTM parameters (utm_source, utm_medium, utm_campaign) from the GA4 event stream.
This query isolates sessions where UTM parameters are null, indicating missing traffic source data.
Applying Last Non-Direct Attribution in BigQuery
A second query was implemented using window functions to backfill missing UTM values. By applying the Last Non-Direct Attribution model, we identified the last valid traffic source from prior sessions of the same user and assigned those UTM values to sessions with missing traffic source data.
The solution utilized window functions like LAST_VALUE(), PARTITION BY, and ORDER BY to track traffic source changes across sessions.
领英推荐
Data Processing and Transformation
The query logic was designed to handle various edge cases such as incomplete session data, multi-session users, and long lookback windows (e.g., 30-day or 90-day). We optimized the query’s performance in BigQuery to balance data accuracy with computational efficiency.
Validation and Improved Attribution Reporting
We conducted a thorough validation by comparing the outputs from BigQuery against the client’s GA4 reports. The results showed full alignment, with corrected UTM values now present in sessions where they were initially missing, significantly improving the fidelity of their traffic attribution reports.
Outcomes and Improved Attribution Reporting
The implemented solution delivered a complete and accurate view of the client's traffic sources, correcting previously missing UTM data and enabling reliable marketing attribution analysis in GA4. By leveraging the Last Non-Direct Click attribution model in BigQuery, the client now has a scalable, automated solution that aligns with their existing strategies and facilitates more informed decision-making.
How to Resolve UTM Attribution Discrepancies in GA4 with BigQuery
Conclusion
This case illustrates the power of customizing your approach using the raw data available in BigQuery to resolve discrepancies between Universal Analytics and GA4’s data models. The adoption of event-based tracking in GA4 requires tailored solutions to ensure continuity in traffic source attribution, where your business needs it. By implementing custom SQL logic to specify an attribution model, it’s possible to bridge data gaps and drive more accurate marketing insights.
Need help with GA4 attribution? Contact us for a consultation on optimizing your data strategy.
Read the full blog -> https://bit.ly/48qRoue