Growth leaders roundtable on Testing.
Manfredi Sassoli de Bianchi
VP Growth - Delivering profitability and growth for B2C Tech companies and Marketplaces: performance marketing, analytics, growth modelling, experimentation and international operations.
Last week, in our Growth Leaders' virtual aperitivo, we had a great (and informal) chat on the topic of Testing. The group was made up of myself and four exceptional growth and marketing experts: Oren Greenberg, Yara Paoli, Ian Howie and Gianluca Binelli. Katia Damer also joined us; she is the founder and CEO of her own company, and she has a PhD in social psychology, so she knows a thing or two about testing.
In February I read an extended feature in Harvard Business Review on testing and experimentation. The feature had a strong focus on digital ventures, including a piece on Booking.com as a case study (link here), which describes constant testing in a completely democratised way: one where anyone can run tests.
I have always believed that speed of learning is the single most important metric to optimise for, so running 1,000s of tests sounds like the optimal solution. Unfortunately it's not that simple (although I have come across a firm whose main KPI was the number of tests run).
LEARNING NOT TESTING
What we all agreed on was that testing designed purely to increase conversion rate or revenue is not particularly beneficial. This sounds counterintuitive, but there is a strong logic behind it.
Any hypothesis worth testing can turn out to be either true or false. For any working website, the likelihood is that the test outcome will be false, so in order to encourage testing one must be able to embrace failure. So how does one embrace failure?
The key here is to run experiments that allow you to learn: if a company is constantly learning about its customers, it will inevitably move forward in the medium and long term.
IT’S ABOUT THE QUESTION
Before running a test one must have a clear hypothesis. This should follow a period of observation (or analysis). The hypothesis should then state which variable will be changed, what impact on the outcome is expected, and why.
Coming up with the right questions and hypotheses will, over time, always give you a better understanding of your market; the outcome of any individual test is secondary.
CULTURE AND BOLDNESS
The hardest part of establishing a culture of testing is balancing the tension between embracing failure and accountability.
If everyone can test anything and failing is accepted, how does the team stay motivated to deliver wins? (Failure is OK as part of a process that brings long-term success.)
People need to be accountable for the tests they run: each test needs to move the organisation forward in terms of learning. If that happens, the test should be seen as a success, even if the hypothesis turns out to be false. On the other hand, a test that improves conversion rate but doesn't deliver learning should not be incentivised.
While studying innovation I came across the concept of celebrating failure. During our conversation it came up that many of the largest and most successful tech companies the group had worked in gave prizes for the best failure of the month.
Ian stressed how failing fast is absolutely accepted in the US, particularly in California, where the ability to fail fast is appreciated. It is much less so in the UK, and even less in Europe.
Accepting failure to accelerate learning has one distinct effect: it encourages bold ideas.
Bold tests not only give you big insights, they also tend to have a big impact on performance, good or bad. This impact means there is likely to be a big delta between the two variants, which in turn leads to statistically significant results quickly.
STATISTICS, TIME AND RIGOUR
A testing culture is also a culture of optimisation. Often teams invest a lot of time thinking about how to maximise outcome and little about how to maximise speed.
Before starting each test there should be a prioritisation process, scoring tests on predicted impact, cost, ease of implementation and time to learning (the first three are part of the ICE methodology, but T, for time, is often forgotten); a minimal scoring sketch follows below.
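To make the prioritisation concrete, here is a minimal sketch of an ICE-plus-Time scoring pass in Python. The 1-10 scales, the equal weighting and the example ideas are all illustrative assumptions, not a prescription; the only point is that T sits alongside the other three factors.

# A minimal ICE + Time prioritisation sketch. Scales, weights and
# example ideas are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    impact: int  # predicted impact, 1-10
    cost: int    # cost, 1-10 where 10 = very cheap (so higher is always better)
    ease: int    # ease of implementation, 1-10
    time: int    # speed of learning, 1-10 where 10 = reaches significance fast

def priority(idea: TestIdea) -> float:
    # Equal weights here; a real team would tune these.
    return (idea.impact + idea.cost + idea.ease + idea.time) / 4

backlog = [
    TestIdea("Bold new value proposition on homepage", impact=9, cost=6, ease=5, time=8),
    TestIdea("Button colour on a low-traffic page", impact=2, cost=9, ease=9, time=2),
]

for idea in sorted(backlog, key=priority, reverse=True):
    print(f"{idea.name}: {priority(idea):.1f}")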
Planning for T also requires agreeing on the confidence level needed to declare the end of a test. Is the team happy with 80%, 95% or 99% confidence? Either way, tests should also be prioritised based on how long they take to show results. Testing a small button on a low-traffic page will take a long time, as the traffic volume will be low and so will the delta in performance.
Once the confidence level is agreed, the team should also agree a time limit: if after X days or weeks no variant has shown superior performance, the test is considered null and the team moves on to the next test.
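As a rough illustration of how the agreed confidence level, traffic volume and the size of the delta together set T, here is a small Python sketch using the standard two-proportion sample-size approximation (standard library only). The conversion rates and daily traffic below are made-up assumptions.

# Rough sketch: how long a test needs, given baseline rate, expected
# delta, confidence and traffic. All figures below are assumptions.
from statistics import NormalDist

def sample_size_per_variant(p_base, p_new, confidence=0.95, power=0.80):
    # Two-sided z-test approximation for comparing two conversion rates.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    delta = p_new - p_base
    return int((z_alpha + z_beta) ** 2 * variance / delta ** 2) + 1

def days_to_result(n_per_variant, daily_visitors, variants=2):
    return n_per_variant * variants / daily_visitors

# A bold change: 3.0% -> 3.6% conversion, 5,000 visitors a day (assumed figures).
n = sample_size_per_variant(0.030, 0.036)
print(n, "users per variant,", round(days_to_result(n, 5000), 1), "days")

# A timid tweak: 3.0% -> 3.1% on the same traffic takes far longer.
n = sample_size_per_variant(0.030, 0.031)
print(n, "users per variant,", round(days_to_result(n, 5000), 1), "days")

On these assumed numbers the bold change resolves in under a week, while the timid tweak needs several months of the same traffic, which is exactly why small tests on low-traffic pages are so expensive in time and why a time limit matters.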
CHEATING
What if one requires quick wins?
Testing is expensive, and one may need to make a business case for it, which may require showing a win, or a few wins.
Here enters the concept of exploration vs validation. Exploration can be carried out without running live tests. Historical data analysis can create insight, which can then be checked directly with users through qualitative research, using very specific but open-ended questions, or even with a Wizard of Oz prototype. This gives the team the intel to design tests for validation, rather than for exploration.
Exploration tests will likely deliver a false result, while validation tests are likely to deliver a true one.
This approach may be relatively time intensive, but it’s a good way to start when the website/product doesn’t yet have the infrastructure for testing at scale and excessive dev time is required to run a test.
MULTIVARIATE TESTING AND COMPETITIVE ADVANTAGE
What about multivariate testing? Here the matter gets complicated.
There is little doubt that multivariate testing can increase the speed of learning; the challenge is the difficulty of implementation, particularly at scale. If our goal is learning rather than optimising, learning across three variables simultaneously will prove tricky.
Testing multiple variables can also lead to confounding, where a change in one variable impacts another. This can partly be solved through the design of orthogonal tests (link here for more info). Things can be further accelerated using an ML technique called the multi-armed bandit (for more info link here).
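For anyone who wants to see the bandit idea in code, below is a minimal Thompson-sampling sketch (one common multi-armed bandit technique), assuming a simple binary conversion metric. The simulated conversion rates are invented for illustration, and this is not a description of any particular company's implementation.

# Minimal Thompson-sampling bandit over three variants; rates are assumptions.
import random

# Unknown 'true' conversion rates the bandit is trying to learn.
true_rates = {"variant_a": 0.030, "variant_b": 0.036, "variant_c": 0.028}

# Beta(1, 1) priors per variant: alpha counts conversions + 1, beta counts non-conversions + 1.
posteriors = {name: {"alpha": 1, "beta": 1} for name in true_rates}

for _ in range(20_000):  # each iteration is one visitor
    # Sample a plausible conversion rate from each posterior and show the best draw.
    draws = {name: random.betavariate(p["alpha"], p["beta"]) for name, p in posteriors.items()}
    chosen = max(draws, key=draws.get)
    converted = random.random() < true_rates[chosen]
    posteriors[chosen]["alpha" if converted else "beta"] += 1

for name, p in posteriors.items():
    shown = p["alpha"] + p["beta"] - 2
    print(f"{name}: shown {shown} times, observed rate {(p['alpha'] - 1) / max(shown, 1):.3%}")

Over time the bandit routes most traffic to the best-performing variant, trading some statistical rigour for speed of learning.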
Yara had particularly deep insight into how experimentation is carried out at a couple of big tech companies famous for their testing capabilities: Booking.com and Netflix. Both firms have developed internal software to facilitate the process, democratise it, improve rigour and maximise knowledge sharing internationally.
Of course most firms can't invest millions to build an in-house testing solution; their traffic volumes don't justify it. This showcases the power of a data competitive advantage: higher traffic volume allows for faster and more learning, which delivers a higher conversion rate, driving more revenue that can be invested in ad spend to deliver even more traffic. (When a 5% uplift in conversion rate delivers £10m, it's easy to make a business case to invest £1m to develop an in-house testing platform.)
The diagram above shows the data flywheel: how the volume of data can deliver a competitive advantage (or moat) that compounds over time (not to be confused with a data network effect).
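As a back-of-envelope illustration of the business case in the parenthesis above, here is the arithmetic spelled out; the £200m revenue base is an assumption chosen purely to reproduce the £10m figure.

# Back-of-envelope business case; the revenue base is an assumed figure.
annual_revenue = 200_000_000  # assumed annual revenue of £200m
uplift = 0.05                 # 5% uplift in conversion rate, assumed to flow through to revenue
platform_cost = 1_000_000     # £1m to build an in-house testing platform

incremental_revenue = annual_revenue * uplift
print(f"Incremental revenue: £{incremental_revenue:,.0f}")               # £10,000,000
print(f"First-year payback: {incremental_revenue / platform_cost:.0f}x")  # 10x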
Designing a growth loop is extremely challenging; in fact I'd argue that for certain products it's impossible. A data flywheel, on the other hand, is a competitive advantage that is accessible to many digital firms that can develop the right learning culture.
Most firms can't run 1,000s of tests a year like Amazon or Booking.com; sometimes traffic only allows for 10 significant tests over 12 months. So how does one fill the gap? By thinking very, very carefully about what to test, based on user psychology. One great hypothesis can unlock huge value.
Sales & Marketing Growth Leader
There are some great points such as company culture and mindset around failure. I think you're spot on with the learning culture in the UK and Europe. Marketing leaders need to deliver insights, reach and growth, but not every test will be successful. So managing these expectations across the business is key.

The most challenging cultural aspect I have encountered is the misalignment with KPIs. Some staff are bonused on the number of tests, and if those tests don't have a commercial impact, then it's not benefiting the staff who have KPIs based on commercial growth. Therefore, if failure is to be celebrated then this must go hand in hand with the need for positive tests. Failure alone is a failure: are you meaningfully learning if tests continue to fail, and those insights are not being used to create a positive business impact?

When I read "increase conversion rate or revenue is not particularly beneficial", it made me think, how many people agree with this? Especially shareholders, execs and those KPI'd and bonused on commercial impact. Personally, if my team were to run a test that grew the business and the data confirmed that the test was the stimulus for that growth, then I would see this as a success. Why wouldn't commercial growth that can be attributed to effective testing be beneficial? Especially when you consider the ICE methodology that you reference: its first focus is on impact. This would be business and commercial impact rather than just consumer insights, I'm assuming.

Sounds like a fantastic forum.
VP Growth | ex-BCG Digital Ventures | Reforge Alumni
Gutted I missed this call. Interesting seeing "time" or "length" as an overlooked variable in ICE-ing and experiment planning. Adding thoughts to your conversation, I'd also factor in the type of experiment you're conducting, the quantity or % of users you're aiming to reach to support your hypothesis, and in some cases it may have to sync to external factors such as stakeholder sprints. I'd also recommend building a growth experiment wiki consisting of experiment logs and outcome logs (success, failure, inconclusive) and managing your experiments' frequency and rate m/o/m as much as the learnings. Looking forward to joining you guys at the next one.
CRO & Growth for DTC brands | ex-CXL
Gotta join you guys next time!!!