Google’s Brand Lift Missing Pieces
Juan Cristóbal Andrews Mujica
Director @ JetSMART Airlines | eCommerce | Marketing | Pricing | Data Science & Analytics | CRM
Google’s Lift Measurement is a tool that offers a convenient, easy-to-set-up, and almost real-time way to measure the effect our campaigns have on some of the most strategic marketing metrics, specifically those related to brand management.
Having said this, some critical pieces of information that we, as marketers, would typically want, and that most studies provide, are currently missing.
About Lift Measurement
Lift Measurement is a tool within the Google Ads platform that measures how your ads impact people’s perception of your brand. These measures are valuable to us marketers because they let us evaluate how a specific campaign performs in terms of well-known key marketing metrics such as Brand Awareness, Recall, and Consideration, rather than the usual digital marketing metrics like clicks, impressions, frequency, and reach.
Once enabled, the tool lets us measure up to three different metrics, chosen from the ones already mentioned plus Favorability and Purchase Intent. Additionally, as in traditional consumer tests, the tool requires that we provide alternative brands from the category, which typically include direct competitors.
In terms of cost, even though the tool is advertised as free, in practice Google requires a minimum investment in the campaign being tested. For YouTube Ads the minimum varies by location, while for video campaigns anywhere else within Google’s networks it is USD 15,000.
How it works
Once the campaign is set up, the metrics are selected, and the brands from the category (typically competitors) are defined, you’ll be given a preliminary minimum budget, which one can assume is a function of the required statistical power.
Eventually, when campaigns are running, and ads are being served, the tool will randomly generate a test and control group. Specifically, groups will be generated based on the following criteria:
- Test group: People who have seen our ads
- Control group: People who were eligible under our campaign segmentation but did not see our ads.
Surveys from both groups are gathered, and metrics such as Brand Lift are computed and shown in the dashboard as soon as a “detectable” result is found, which Google defines as a lift greater than 2.0%.
Results
Once enough survey responses are collected, the tool will present its results in terms of expected values and their corresponding (confidence) intervals in the form of the following metrics:
- Baseline PRR: Percent of people in the control group who selected our brand as their preference amongst all other provided brands.
- Exposed PRR: Percent of people in the test group who selected our brand as their preference amongst all other provided brands.
- Absolute Brand Lift: Difference in PRR between Test and Control groups.
- Relative Brand Lift: Increase of Baseline PRR due to the tested campaign (Absolute Brand Lift / Baseline PRR).
- Total Survey Responses: Number of total surveys (Test + Control).
* PRR: Positive Response Rate
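The lift metrics above follow directly from the two response rates. A minimal sketch with hypothetical PRR values (the real numbers come from the Lift report):

```python
# Hypothetical PRR values; substitute the ones from your own report.
baseline_prr = 0.30   # control group: eligible but not exposed
exposed_prr = 0.33    # test group: saw the ads

absolute_lift = exposed_prr - baseline_prr      # difference in PRR
relative_lift = absolute_lift / baseline_prr    # growth over the baseline

print(f"Absolute Brand Lift: {absolute_lift:.1%}")  # 3.0 percentage points
print(f"Relative Brand Lift: {relative_lift:.1%}")  # 10.0%
```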
Other metrics engineered from the above, such as Cost Per Lifted User, Exposed to Ads, and Not Exposed to Ads, are also included in the report.
Missing Information
To understand whether a small observed lift might be due to mere random chance, to generate new intervals according to our preferences, and, generally speaking, to better communicate the results within our company, we would need information that is currently not provided in a straightforward way.
Additionally, this information might shed some light on why we need to invest the minimum budget being asked, which in many cases is spent for the sole purpose of this study.
In particular:
- Level of significance: Even though numeric confidence intervals are included for every metric, the actual confidence level as a percentage is neither given nor commonly addressed in the documentation. So far, the only official information I have been able to locate states that it is "usually around 90%" [ref], but whether this is one- or two-tailed is unclear.
- p-value: This would allow us to make a quick hypothesis check at whatever confidence level we choose.
Lastly, even though "Total Survey Responses" is reported both in the summary and within the report itself, the actual sample sizes of the test and control groups are not displayed by default on the summary page, nor are they available anywhere else within the report, limiting our ability to conduct our own statistical tests on the Age, Gender, Campaign, and Video-related results.
Finding the missing information
To find the missing information, we could search over all possible combinations of confidence levels and sample sizes and check which combination best approximates the results provided by Google.
To avoid brute force, which is time- and resource-consuming, a custom algorithm was built that learns two parameters: the sample sizes and the PRR interval, defined by the difference between the expected upper and lower Absolute Brand Lift values.
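One way to sketch that search: under a normal approximation with equal group sizes (an assumption of mine, not something Google confirms), the half-width of the reported lift interval equals z · SE, so each candidate confidence level implies a standard error and, through it, a per-group sample size that can be checked against the reported total responses. All input numbers below are hypothetical placeholders:

```python
from statistics import NormalDist

# Hypothetical report values; the real ones come from the Lift dashboard.
p_control, p_exposed = 0.30, 0.33   # baseline and exposed PRR
half_width = 0.025                  # (upper - lower) / 2 of the absolute lift interval
total_responses = 3700              # "Total Survey Responses" in the report

best = None
for conf in (0.80, 0.85, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + conf / 2)     # two-tailed critical value
    se = half_width / z                          # SE implied by this confidence level
    # SE^2 = p1(1-p1)/n + p2(1-p2)/n  =>  solve for the per-group n.
    n = (p_control * (1 - p_control) + p_exposed * (1 - p_exposed)) / se ** 2
    gap = abs(2 * n - total_responses)           # implied total vs reported total
    if best is None or gap < best[0]:
        best = (gap, conf, round(n), se)

gap, conf_hat, n_hat, se_hat = best
print(f"Most likely confidence level: {conf_hat:.0%}, per-group n ~ {n_hat}")
```

With these placeholder inputs the candidate whose implied total best matches the reported total is 90%, which is consistent with the "usually around 90%" statement mentioned earlier.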
Learning Algorithm
A custom optimization algorithm was built that gives us a good approximation of:
- Level of significance.
- The sample sizes of both test and control groups.
- Standard error.
- p-value.
This information will allow us to conduct our statistical tests, a stepping stone in generating confidence intervals that suit our needs.
Algorithm in use
(Inputs provided by Google Brand Lift Results)
Confidence Level Calculation
We will try to obtain the confidence level used by Google in the report generated by the tool.
Note: We have two outputs (Predicted and Most Likely). This is because Google provides approximate (rounded) values, which affects this calculation. Later on, we will use only the “Most Likely” value, which is an approximation to commonly used intervals.
Sample Sizes, Standard Error and p-Values
Values are calculated using the “Most Likely Type I Error” obtained above. Note that these values will be the best approximation possible given those parameters.
As seen in the results above, we obtained a p-value of 0.054858, so, as we will later confirm, we can anticipate that results will not be significant at confidence levels of 95% or more, though it is a close call at 95%.
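With approximate group sizes in hand, the p-value can be recomputed with a standard two-proportion z-test. A sketch with hypothetical numbers (not the actual study's):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical recovered values; substitute the real ones if available.
n_control = n_exposed = 1780
p_control, p_exposed = 0.30, 0.33

# Pooled proportion under H0 (no lift), and its standard error.
p_pool = (p_control * n_control + p_exposed * n_exposed) / (n_control + n_exposed)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_exposed))

z = (p_exposed - p_control) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With these placeholder inputs the p-value lands just above 0.05, mirroring the borderline situation described above: not significant at 95%, but close.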
Found the missing pieces
We can now proceed to define values for our custom intervals.
Note: these values are just (good) approximations of the real ones. If you happen to have the real sample sizes or additional information, use those instead and compare the results.
Original Brand Lift Values
Now we can calculate the original report with all missing information included. This should be a good approximation to the actual values provided by Google.
Confidence Interval at 90%
Confidence Interval at 95%
As predicted above, at this point the interval includes the possibility of zero lift, which is consistent with the p-value found earlier.
Confidence Interval at 99%
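The pattern across these three intervals can be reproduced once the lift estimate and its standard error are known; the numbers below are hypothetical placeholders, not the study's actual values:

```python
from statistics import NormalDist

# Hypothetical absolute lift and standard error (use the recovered ones).
lift, se = 0.03, 0.0156

for conf in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + conf / 2)     # two-tailed critical value
    lo, hi = lift - z * se, lift + z * se
    zero_inside = lo <= 0 <= hi
    print(f"{conf:.0%} CI: [{lo:+.4f}, {hi:+.4f}]  includes zero: {zero_inside}")
```

With these placeholder values, the 90% interval excludes zero while the 95% and 99% intervals include it, matching the behavior described above.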
Final thoughts
As seen in the report, at a 95% confidence level we cannot reject the null hypothesis; therefore, we do not observe a significant lift in Positive Response Rate at that level. This contradicts our initial conclusion, drawn from the default output of the Lift Measurement tool, that there was a significant increase.
Opinions may vary as to which confidence interval is acceptable for a Brand Lift Study or any other Marketing Study, which will most likely depend on our goal.
Marketers will most likely care about the actual effect the campaign had on lift values rather than solely about whether the study is significant. We want to know whether we should keep investing in a specific campaign, since we will most likely build budget scenarios to reach a specific goal of lifted users. For this, we need a known level of certainty from which to build those budget scenarios, including Cost per Lifted User and New Lifted Users, amongst others.
In most studies, this situation is addressed by providing the p-value, which allows readers to apply their own tolerance when accepting or rejecting the hypothesis.
Feedback
What are your thoughts on this? What levels of significance do you use in such marketing studies? Is this something you consider relevant?