Tech & Data Diary - Entry #016: Calculating Propensity For Audience Segmentation in BigQuery
In digital marketing, understanding customer behavior is crucial for targeting the right audience with the right message. Propensity modeling is a powerful technique that helps marketers predict the likelihood of a user taking a specific action, such as making a purchase or clicking on an ad. By calculating propensity scores, you can create targeted audiences based on their likelihood to convert, leading to more efficient and effective marketing campaigns.
In this article, we'll explore how to calculate propensity using BigQuery SQL, focusing on how to build audiences from Google Analytics 4 (GA4) user IDs at different levels of propensity. We'll walk through an example that applies this concept to a digital marketing campaign, allowing you to segment your audience based on their likelihood to engage with your brand.
Understanding Propensity Modeling
Propensity modeling is a statistical technique used to predict the probability of a specific event occurring. In digital marketing, this often means predicting whether a user will convert, such as making a purchase or signing up for a newsletter. Propensity scores are typically calculated using a variety of user behavior data, including past interactions, demographic information, and other relevant factors.
By assigning a propensity score to each user, marketers can segment their audience into different groups, such as high, medium, and low propensity. This allows for more personalized marketing efforts, targeting high-propensity users with conversion-focused messages while nurturing lower-propensity users with awareness or consideration content.
Setting Up the Data in BigQuery
To calculate propensity in BigQuery, you'll need access to your GA4 data. GA4 data can be exported to BigQuery, where it can be queried using SQL. The data typically includes user interactions such as page views, events, and transactions, all tied to a unique user ID.
For this example, let's assume a simplified events table, `project.dataset.events`, with one row per event and the following columns (a streamlined version of the GA4 BigQuery export schema):

- user_id (STRING): the GA4 user identifier
- event_name (STRING): e.g. 'page_view', 'add_to_cart', 'begin_checkout', 'purchase'
- event_timestamp (INT64): event time in microseconds since the Unix epoch, as in the GA4 export

We'll calculate propensity scores based on user interactions and transaction history.
Step 1: Preparing the Data
The first step is to aggregate the data to create features that can be used to calculate propensity. These features might include the number of page views, the number of transactions, the time since the last transaction, and more.
Here’s an example SQL query to aggregate user interaction data:
WITH user_features AS (
  SELECT
    user_id,
    COUNTIF(event_name = 'page_view') AS page_view_count,
    COUNTIF(event_name = 'purchase') AS purchase_count,
    COUNTIF(event_name = 'add_to_cart') AS add_to_cart_count,
    COUNTIF(event_name = 'begin_checkout') AS begin_checkout_count,
    MAX(TIMESTAMP_MICROS(event_timestamp)) AS last_event_time
  FROM `project.dataset.events`
  GROUP BY user_id
)
SELECT
  user_id,
  page_view_count,
  purchase_count,
  add_to_cart_count,
  begin_checkout_count,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), last_event_time, SECOND) AS time_since_last_event
FROM user_features;
This query creates a set of features for each user, including the number of page views, purchases, add-to-cart events, and the time since their last event. These features will be used to calculate the propensity score.
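If you want to prototype this feature engineering outside BigQuery first, the same aggregation can be sketched in plain Python. This is an illustrative toy example, not production code: the event data, timestamps, and the fixed "now" reference are made up, and the column names simply mirror the SQL above.

```python
from collections import Counter
from datetime import datetime

# Toy event log mirroring the GA4-style columns used in the SQL above.
events = [
    ("u1", "page_view",   datetime(2024, 1, 1, 10, 0)),
    ("u1", "add_to_cart", datetime(2024, 1, 1, 10, 5)),
    ("u1", "purchase",    datetime(2024, 1, 1, 10, 10)),
    ("u2", "page_view",   datetime(2024, 1, 2, 9, 0)),
    ("u2", "page_view",   datetime(2024, 1, 2, 9, 30)),
]

# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 3)

def aggregate(events, now):
    """Mirror of the SQL: per-user event counts plus seconds since last event."""
    per_user = {}
    for user_id, name, ts in events:
        f = per_user.setdefault(user_id, {"counts": Counter(), "last": ts})
        f["counts"][name] += 1
        f["last"] = max(f["last"], ts)
    return {
        uid: {
            "page_view_count": f["counts"]["page_view"],
            "purchase_count": f["counts"]["purchase"],
            "add_to_cart_count": f["counts"]["add_to_cart"],
            "time_since_last_event": (now - f["last"]).total_seconds(),
        }
        for uid, f in per_user.items()
    }

features = aggregate(events, now)
```

In production you would run the SQL version in BigQuery; the Python sketch is only useful for validating the feature definitions on a small sample.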
Step 2: Calculating the Propensity Score
Once the data is aggregated, the next step is to calculate the propensity score. In this example, we'll use a logistic regression model to calculate the propensity score based on the features we generated. While the actual modeling process typically requires a machine learning framework like TensorFlow, you can implement a simplified version directly in SQL using weights derived from a pre-trained model.
Here’s an example query to calculate propensity scores:
WITH user_features AS (
  SELECT
    user_id,
    COUNTIF(event_name = 'page_view') AS page_view_count,
    COUNTIF(event_name = 'purchase') AS purchase_count,
    COUNTIF(event_name = 'add_to_cart') AS add_to_cart_count,
    TIMESTAMP_DIFF(
      CURRENT_TIMESTAMP(),
      MAX(TIMESTAMP_MICROS(event_timestamp)),
      SECOND
    ) AS time_since_last_event
  FROM `project.dataset.events`
  GROUP BY user_id
),
propensity_scores AS (
  SELECT
    user_id,
    1 / (1 + EXP(-(0.5 * page_view_count
                   + 0.7 * purchase_count
                   + 0.4 * add_to_cart_count
                   - 0.3 * time_since_last_event / 3600))) AS propensity_score
  FROM user_features
)
SELECT
  user_id,
  propensity_score,
  CASE
    WHEN propensity_score >= 0.8 THEN 'High'
    WHEN propensity_score >= 0.5 THEN 'Medium'
    ELSE 'Low'
  END AS propensity_level
FROM propensity_scores;
In this query, we apply the logistic (sigmoid) function to turn a weighted sum of features into a propensity score between 0 and 1. The coefficients (0.5, 0.7, 0.4, -0.3) are example weights that would normally come from a trained model; the division by 3,600 converts the recency feature from seconds to hours, so the negative weight penalizes inactivity per hour rather than per second. The score is then used to categorize users into high, medium, or low propensity groups.
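The same logistic function can be verified in plain Python. Keep in mind the weights here are the article's illustrative values, not the output of a trained model:

```python
import math

# Example weights from the article; in practice these come from a trained model.
W_PAGE_VIEW, W_PURCHASE, W_ADD_TO_CART, W_RECENCY = 0.5, 0.7, 0.4, -0.3

def propensity_score(page_views, purchases, add_to_carts, seconds_since_last_event):
    """Logistic (sigmoid) score matching the SQL expression above."""
    z = (W_PAGE_VIEW * page_views
         + W_PURCHASE * purchases
         + W_ADD_TO_CART * add_to_carts
         + W_RECENCY * seconds_since_last_event / 3600)  # recency in hours
    return 1 / (1 + math.exp(-z))

def propensity_level(score):
    """Same thresholds as the SQL CASE expression."""
    if score >= 0.8:
        return "High"
    if score >= 0.5:
        return "Medium"
    return "Low"
```

For example, a user with 10 page views, 2 purchases, 3 add-to-carts, and activity within the last hour lands in the High bucket, while a user with a single page view 100 hours ago falls to Low, since the recency penalty dominates.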
Step 3: Building Audiences Based on Propensity
With the propensity scores calculated, you can now segment your audience into different groups based on their propensity levels. These segments can be exported to your marketing platforms, such as Google Ads or Facebook Ads, for targeted campaigns.
Here’s how you might create segments:
-- Assumes the propensity_scores query above has been materialized as a table
SELECT user_id
FROM propensity_scores
WHERE propensity_score >= 0.8;
This query selects users with a high propensity score, which can then be used to create a high-propensity audience segment.
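As a quick sanity check before exporting, the thresholding and segment extraction can be sketched in Python. The user IDs and scores below are made-up examples; in production they would come from the BigQuery query above.

```python
# Hypothetical scored users; in production this comes from the BigQuery output.
scores = {"u1": 0.92, "u2": 0.61, "u3": 0.35, "u4": 0.85}

def segment(scores, low=0.5, high=0.8):
    """Split user IDs into High/Medium/Low buckets by propensity score."""
    buckets = {"High": [], "Medium": [], "Low": []}
    for user_id, score in scores.items():
        if score >= high:
            buckets["High"].append(user_id)
        elif score >= low:
            buckets["Medium"].append(user_id)
        else:
            buckets["Low"].append(user_id)
    return buckets

audiences = segment(scores)
# audiences["High"] holds the IDs you would export as the high-propensity segment
```

Making the thresholds parameters (rather than hard-coding 0.8 and 0.5) makes it easy to tune segment sizes as score distributions shift over time.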
Step 4: Applying Propensity Scores in a Marketing Campaign
Once you’ve segmented your audience, you can tailor your marketing campaigns to each propensity level. For example:

- High propensity: conversion-focused messaging such as remarketing, offers, and promotions designed to close the sale.
- Medium propensity: consideration content such as product comparisons, reviews, or retargeting with incentives.
- Low propensity: awareness campaigns that introduce the brand and nurture interest over time.
By applying propensity modeling, you can optimize your marketing efforts, ensuring that each user receives the most relevant messaging based on their likelihood to convert.
Conclusion
Calculating propensity using BigQuery SQL allows you to leverage your GA4 data to create highly targeted marketing campaigns. By segmenting your audience based on their propensity scores, you can tailor your messaging to maximize engagement and conversions. This approach not only improves campaign efficiency but also enhances the user experience by delivering more personalized content.
With the example provided, you can start implementing propensity modeling in your digital marketing campaigns, driving better results and more informed decision-making.