Why should you have a Centralized Experimentation Platform?


If you're already testing and experimenting, have you ever asked yourself why you should build or buy an experimentation platform? Why have one in the first place, when we're already testing anyway and doing fine? ;)

The answer to this question shouldn't be purely technological (platform features, infrastructure, etc.); it should be a business answer as well. Below are things to keep in mind as intuitive business arguments when answering it. Let's talk broadly about the business side today; a later article will cover the technological side and topics like the "build vs. buy" decision.

Once experimentation is fully embedded as a cultural change in the organization, and embraced as a crucial step in the innovation process that determines the success of the business across its products and services, having a centralized place that acts as a "Single Source of Truth" for everyone in the organization is tremendously beneficial to that culture.

Such a place stores things such as (but, of course, not limited to):

  • Which experiments ran in the past, are currently running, or are planned to run soon?
  • Which business ideas were (or are) being tested, by which functions and departments, and what impact did they have on the business?
  • Metadata for each experiment/test, such as:

- How long did it run?

- The business description, the hypothesis, and the idea behind it. E.g., if you're testing a new UX design that places your main CTA in a different position, attach the UX design to your experiment along with any useful information behind the hypothesis. That could be a front-end analysis of users' sessions, heat maps, UX research your team did, or anything else that led you to make the change and then test it.

E.g., if you're testing a cashback offer on your loyal customers, attach your business case, analyses, and campaign mechanics to the experiment. Why? Keep reading and you'll see the reason.

  • How much impact the experiment had on key business metrics, and on the business overall.
  • What decisions were made based on the results, and why?
  • Other things that differ between companies or businesses.
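
To make the list above a bit more concrete, here is a minimal sketch of what one entry in such a central registry could look like. The class name `ExperimentRecord`, its field names, and the sample values are all hypothetical and only illustrate the kinds of information listed above; a real platform would choose its own schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ExperimentRecord:
    """One entry in a hypothetical central experiment registry."""

    name: str
    owning_team: str
    hypothesis: str                      # business description and the idea behind the test
    start_date: date
    end_date: Optional[date] = None      # None while the test is planned or still running
    attachments: List[str] = field(default_factory=list)           # designs, business cases, analyses
    metric_impacts: Dict[str, float] = field(default_factory=dict)  # metric name -> relative lift
    decision: str = ""                   # what was decided
    decision_rationale: str = ""         # and why

    def duration_days(self) -> Optional[int]:
        """How long the experiment ran, or None if it has not ended yet."""
        if self.end_date is None:
            return None
        return (self.end_date - self.start_date).days


# Made-up example values, for illustration only.
record = ExperimentRecord(
    name="cta-position-test",
    owning_team="Growth",
    hypothesis="Moving the main CTA to a more visible position increases sign-ups.",
    start_date=date(2021, 3, 1),
    end_date=date(2021, 3, 15),
    attachments=["ux-design.pdf", "heatmap-analysis.pdf"],
    metric_impacts={"signup_rate": 0.021},
    decision="ship",
    decision_rationale="Positive lift on the primary metric.",
)
print(record.duration_days())
```

Keeping the hypothesis, attachments, and decision rationale next to the numbers is what lets someone from another team understand the experiment later without asking its owner.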

Cool, but still, why should I have it from a business point of view? In a nutshell, the answer is: it solidifies the experimentation culture more than you imagine.

Having a single source of truth on what you have done in the past in terms of experimentation, testing, and related information will help you prove the value of experimentation across the organization as a critical step for innovation, and solidify this culture. This holds across different dimensions; below are a few of the key, most noticeable ones.

1. How has experimentation been contributing to business performance and growth? How much of that can we attribute to experiments?

In business, getting value out of an experimentation culture doesn't mean every experiment has to be a big bang. It's the accumulation over time that matters. Sometimes small things matter: the inch-by-inch improvements from each experiment, added up over time, are what you need to look at to assess the impact of such a culture. Having a centralized place with all of this information will help you quantify that over time.
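
As a rough illustration of how inch-by-inch wins add up, the sketch below compounds a series of small relative lifts. The function name `cumulative_lift` and the numbers are made up, and the calculation assumes the lifts act multiplicatively on the same metric and do not interact, which is a simplification.

```python
import math


def cumulative_lift(lifts):
    """Compound relative lifts: (1 + l1) * (1 + l2) * ... - 1."""
    return math.prod(1.0 + lift for lift in lifts) - 1.0


# Illustrative, made-up numbers: twenty shipped experiments, each worth +0.5%.
small_wins = [0.005] * 20
total = cumulative_lift(small_wins)
print(f"Cumulative lift: {total:.2%}")
```

Under these assumptions, twenty wins of half a percent each compound to a lift of roughly ten percent, which no single one of those experiments would have shown on its own.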

2. Learning By Examples

Proving success with experimentation isn't just about reporting the numbers we achieved. It's a broader set of goals: showing, by example, a story of success with its learnings, and how we as an organization harnessed the value of these learnings over time. Saying that your 100+ experiments a year drove more than X MM in profit isn't fully meaningful unless you combine it with the key learnings from those experiments. Which learnings, across functions and departments, will help the team formulate better hypotheses, ship better products and services, or design better offers? How will these learnings help the team ship testable ideas in the future that grow the business and push innovation further, beyond just the numbers?

Keep in mind: we experiment to learn causally about the consequences of our actions, and to quantify them to inform our decisions.

As discussed, the value of this culture isn't necessarily monetary every time. It's the learnings that matter, and of course these learnings will pay off and turn into monetary value and ROI later. However, money isn't the first thing you gain from each experiment you run. That's why we say experimentation helps you fail fast and act quickly and innovatively. So failing fast doesn't mean losing money, because you're gaining something more valuable than money: THE CAUSAL LEARNINGS.

Showing examples of key, impactful experiments, ones with big or surprising results or that moved your metrics and business performance positively or negatively, will be a true learning experience for different teams, not just the one that owns the experiment. Showing which department, function, or team ran the most experiments to inform their decisions, or learned the most from experimentation, will help people realize that experimentation is the rescue boat that lets you sail (a.k.a. make informed decisions) in an uncertain, probabilistic world.

It will encourage your team to test further and to embed testing as an inherent part of decision-making, from simple, small business ideas to the bigger ones. For example, in 2019 I had a client in Europe who had a kind of "newsletter" that they published constantly across the organization, at all levels (no matter your title, without exclusion). It showed which tests had run in the past, the results, the learnings, which team owned each test, and so on, in a nice-looking, digestible report. The value was astonishing: employees commented on the results and provided feedback. I remember a new joiner in Marketing once made a simple comment on one of the experiments, something a whole team of analysts and software engineers hadn't even thought of, and it improved the learnings once they incorporated it.

For example, here is what Jon Noronha, director of product management at Optimizely (a key digital testing platform), says: “When you start to embrace this way of doing product development, it changes your mindset,” Noronha says. “We don’t just have a product road map and a couple of experiment ideas on the side. Soon we start to think of every idea we have for improving our products, whether you’re a designer or an engineer or a product manager. It just becomes another hypothesis. They’re all on equal footing, ready to be tested.”

Let's face it, we as humans are poor at assessing new ideas, and experimentation is the cultural key to letting everyone in the organization know that their ideas are considered and heard. For example:

In 2012 a Microsoft employee working on Bing had an idea about changing the way the search engine displayed ad headlines. Developing it wouldn’t require much effort (just a few days of an engineer’s time), but it was one of hundreds of ideas proposed, and the program managers deemed it a low priority. So it languished for more than six months, until an engineer, who saw that the cost of writing the code for it would be small, launched a simple online controlled experiment (an A/B test) to assess its impact. Within hours the new headline variation was producing abnormally high revenue. An analysis showed that the change had increased revenue by an astonishing 12% (which on an annual basis would come to more than $100 million in the United States alone) without hurting key user-experience metrics.

                          -------------------------------------------------------------------------

Greg Linden was explicitly forbidden to work on shopping cart recommendations at Amazon, and went ahead anyway to test his idea, which later became the foundation for every e-commerce website you see today. Amazon's managers thought it was a bad idea that would distract users from completing their purchase. He was told that Amazon was not ready to launch the feature and that he was forbidden to work on it any further. Yet the feature later drove billions of dollars, to which YOU contributed by using it.

In his own words: "I think building this culture is the key to innovation. Creativity must flow from everywhere. Whether you are a summer intern or the CTO, any good idea must be able to seek an objective test, preferably a test that exposes the idea to real customers. Everyone must be able to experiment, learn, and iterate. Position, obedience, and tradition should hold no power. For innovation to flourish, measurement must rule." That is Greg Linden, at Amazon back then, the true foundation behind recommendation engines.

To inspire innovation on your team, give employees the freedom to dream up new ideas, and implement them.

This doesn't mean you always need to go against others. Be thoughtful about your idea and do your homework before going into battle like in the examples above. Sometimes your idea honestly is horrible, but let the experiment put that to bed!

3. Upskilling and Embracing Experimentation Best Practice

Not everyone in your organization will follow the best practices exactly, whatever they are, or the playbooks you develop for them. Why? Simply because they don't work for everyone or for every use case, especially as more and more people start to experiment. Also, you're building a culture, but you can't expect everyone to be a data scientist or statistician who checks whether a test is sufficiently powered to detect an effect, whether the experimental design is suitable, or whether there is bias in the randomization/assignment.
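
One example of a check a platform could run automatically, so that users don't have to, is a sample ratio mismatch (SRM) test on the assignment counts. The sketch below is a minimal illustration, not any particular platform's implementation: the function name `srm_p_value` and the user counts are made up, and it assumes a two-arm test analyzed with a chi-square goodness-of-fit test with one degree of freedom.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test on the assignment split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Made-up counts: a planned 50/50 split that came out as 5,000 vs 5,400 users.
p_value = srm_p_value(5000, 5400)
if p_value < 0.001:
    print("Possible sample ratio mismatch: check the assignment before reading results.")
```

A very small p-value here suggests the split itself is off, in which case the metric results should not be trusted until the assignment problem is understood.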

A centralized experimentation platform will let you perform something like a "meta-analysis" on your previous experiments to understand the learnings better, and break them down by team so the learnings are passed across teams for future experiments. This raises accountability and gives you insight into what you need to automate to improve the experimentation process: how to make the process faster, easier, and more intuitive, while keeping it robust, with guardrails in place to guide users through design and measurement.

For example, after analyzing many past experiments and discovering inefficiencies and risks, LinkedIn built a feature, based on a statistical method and integrated into the process of running every experiment, that automatically recommends ramp decisions to users.

This can also help when training new users on the process. Imagine new joiners to your team or elsewhere in the organization: a centralized experimentation platform will be like a catalog of what worked and what didn't in the past, and of the ideas and decisions that followed the results. This is truly valuable for informing them as they develop more ideas and hypotheses, helping them avoid repeating mistakes and inspiring new (better and more innovative) ideas based on what has been done in the past.

E.g., if we ran offer X last year but it was affected by an extreme macro-environmental factor (say, Corona), that doesn't mean it's a bad idea; it may be worth trying again in 2021!

Don't expect employees to fully remember which offers, actions, or changes they implemented in the past. Having such a centralized place will help them make a better choice next time about what to test, why, and how. And if you want this culture, the place should be easily accessible to everyone, not restricted to a specific team or department. The Netflix blog posts listed in the references outline the same thinking.

DON'T RELY TOO MUCH ON USERS TO FOLLOW BEST PRACTICES; SAVE THEM TIME AND AUTOMATE THESE BEST PRACTICES FOR EFFICIENCY. If you want to apply experimentation at scale in the organization, this is what you need to do: let people focus on being innovative and generating new, better ideas and hypotheses, and let the platform take care of the technicalities.

4. Improving the Experimentation Process itself

A centralized place will equip you not only to improve business outcomes in terms of profit or monetary value, but also to use the learnings from many past experiments to improve the experimentation process itself. For instance, performing a meta-analysis on your past experiments lets new patterns and learnings emerge that can guide you to better ideas, process improvements, or even better thinking! It will also help you be more predictive, with a probabilistic view, such as predicting the impact of an idea or change based on patterns observed in the past: which types of experiments were successful in driving metric X, and which kinds of changes, actions, or offers are users most likely to engage with?
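
As one simple illustration of what a meta-analysis over past experiments can look like, the sketch below pools the estimated effects of several similar experiments using fixed-effect, inverse-variance weighting. This is a textbook technique chosen here for illustration, not a method described in this article's sources; the function name `pooled_effect` and the numbers are made up, and it assumes the experiments estimate the same underlying effect.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect meta-analysis: weight each estimate by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Made-up lifts on the same metric from three similar offers, with standard errors.
effects = [0.020, 0.035, 0.010]
std_errors = [0.010, 0.015, 0.008]
estimate, se = pooled_effect(effects, std_errors)
print(f"Pooled lift: {estimate:.4f} (SE {se:.4f})")
```

The pooled standard error is smaller than that of any single experiment, which is why a pattern that is invisible in individual tests can become clear once the experiments are stored and analyzed together.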

More importantly, it will help you better understand your metrics. Metrics in experimentation (holding other things constant) are very sensitive and depend on your field, the experiment, and the changes you're making, among other things. When designing your experiment and selecting your north star metric, you need to ask yourself one question: "Can this be meaningfully measured during the experiment?"

You can look through your past experiments and understand how various metrics perform across different types of experiments, customers, segments, actions, hypotheses, etc., to build intuition about which metric is suitable for what and how to leverage it. For example: can we use metric X for this experiment? If we can't collect data for it, can we use a proxy instead? Which types of experiments move which types of metrics in a statistically detectable way, and which don't? Studying your past experiments and how your metrics performed across them helps you identify which metrics are useful for long-term effects versus short-term effects.

Furthermore, you can uncover how your metrics are related or correlated with each other. For example, a customer visiting the store multiple times a day tends to increase their total sales but not necessarily their ticket size per transaction! Do you have specific metrics that are easily moved and give you an early signal of changes in other metrics? Analyzing your past experiments through a centralized place will help you uncover such relationships and patterns.
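
A first pass at this kind of analysis could be as simple as correlating the lifts that past experiments produced on two metrics. The sketch below computes a plain Pearson correlation; the helper name `pearson` and the lift values are made up for illustration, and a correlation across experiments is only a hint of a relationship, not proof of one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up lifts from five past experiments, one value per experiment.
visit_frequency_lift = [0.01, 0.03, 0.02, 0.05, 0.04]
total_sales_lift = [0.012, 0.028, 0.022, 0.049, 0.041]
r = pearson(visit_frequency_lift, total_sales_lift)
print(f"Correlation across experiments: {r:.2f}")
```

If a fast-moving metric correlates strongly with a slower one across many experiments, it becomes a candidate early signal or proxy, which is worth validating before relying on it.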


Today, the majority of experimentation platforms design tests using a "predetermined experiment duration" approach. One of the inputs to that approach is the variance, i.e., how the metric moves in historical data, on the assumption that history offers reasonable knowledge about the future. However, tests on customers (humans) take place in an evolving world where this assumption may not hold, which is one reason the Bayesian approach is gaining popularity in the field. Tracking all of this through a centralized platform will help you assess which approach gives a more reliable estimate of your future variance.
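
To show where historical variance enters a predetermined-duration design, here is a minimal sketch based on the standard sample size formula for a two-sided z-test comparing two means. The function names, parameter names, and numbers are assumptions for illustration; a real platform may use a different test or a more refined calculation.

```python
import math
from statistics import NormalDist


def users_per_arm(std_dev, min_detectable_effect, alpha=0.05, power=0.8):
    """Users needed per arm: n = 2 * (z_(1-alpha/2) + z_power)^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 * std_dev ** 2 / min_detectable_effect ** 2
    return math.ceil(n)


def duration_days(std_dev, min_detectable_effect, daily_users_per_arm):
    """Days needed to collect the required users at a steady daily traffic rate."""
    return math.ceil(users_per_arm(std_dev, min_detectable_effect) / daily_users_per_arm)


# Made-up inputs: historical standard deviation 10, smallest effect of interest 1,
# and 200 new users per arm per day.
print(users_per_arm(10.0, 1.0), duration_days(10.0, 1.0, 200))
```

Because the required sample grows with the square of the standard deviation, a historical variance estimate that is off by a factor of two changes the planned duration by a factor of four, which is why it matters whether the past is a good guide to the future.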

For example, last year a team of researchers at Microsoft developed a new framework to improve innovation productivity and the way testing is conducted, by analyzing thousands of experiments that ran on the Microsoft experimentation platform.

Another example is “Winner’s Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments” by Minyong R. Lee and Milan Shen, presented at the KDD 2018 data mining conference. They address the fact that when a group of experiments is conducted, usually the ones with significantly successful results are chosen to be launched into the product. In their own words: "We are interested in learning the aggregated impact of the launched features. In this paper, we investigate a statistical selection bias in this process and propose a correction method of getting an unbiased estimator. Moreover, we give an implementation example at Airbnb's ERF platform (Experiment Reporting Framework) and discuss the best practices to account for this bias."
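
The selection bias itself is easy to reproduce in a small simulation. The sketch below is not the paper's correction method; it only illustrates the problem under made-up assumptions: true effects and measurement noise are both normally distributed, and an experiment is "launched" when its observed effect is significantly positive. The function name and parameters are hypothetical.

```python
import random


def simulate_winners_curse(n_experiments=10_000, true_sd=0.01, noise_se=0.01,
                           z_crit=1.96, seed=0):
    """Return (number of winners, mean true effect, mean observed effect) for winners."""
    rng = random.Random(seed)
    winners = 0
    true_sum = 0.0
    observed_sum = 0.0
    for _ in range(n_experiments):
        true_effect = rng.gauss(0.0, true_sd)
        observed = true_effect + rng.gauss(0.0, noise_se)  # noisy measurement
        if observed / noise_se > z_crit:                   # launched only if significant
            winners += 1
            true_sum += true_effect
            observed_sum += observed
    if winners == 0:
        raise ValueError("no experiment passed the launch threshold")
    return winners, true_sum / winners, observed_sum / winners


winners, mean_true, mean_observed = simulate_winners_curse()
print(f"{winners} launched; mean observed {mean_observed:.4f} vs mean true {mean_true:.4f}")
```

Among the launched experiments, the average observed effect overstates the average true effect, so simply summing the reported wins would overestimate the total impact. Spotting and correcting this requires having all experiments, winners and losers, in one place.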


The bottom line:

Working in the experimentation field is not just about the tool or specific platform you're building, using, or selling. More importantly, it's about cultural change. So when you're demoing the platform for the first time or selling it, for example, think of voicing the tone above. Yes, we're selling an experimentation platform, but more importantly, we're helping you build a culture of innovation with this tool. That is what we care about (in my humble opinion), and that is the impact we're looking for.

This echoes the thoughtful words of the Netflix team in the blog posts listed in the references.

I remember a client once said to us, and I quote, "You guys changed the way we think about our business and customers. Thank you!" This impact is worth millions of dollars to me :)

So, to succeed in building this culture and harness the power of rapid innovation: first, eliminate the "Caesar" image and encourage employees to come up with ideas. Trust me, they will, if they see that they will be heard and their ideas will be tested. Second, give them the right tool to solidify that. This is why you need a platform or tool in the first place: so they can stand on others' shoulders when developing these ideas!

Also, what we're discussing here is simply the answer to the question "Why do I need a platform?", a question with two dimensions: a business one and a technological one. It is a bit different from the question "Why should we test in the first place, with or without a platform?"

References (including those hyperlinked above):

  1. Building a Culture of Experimentation
  2. From Infrastructure to Culture: A/B Testing ... - LinkedIn
  3. Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society 
  4. Reimagining Experimentation Analysis at Netflix
  5. Improving Experimentation Efficiency at Netflix with Meta Analysis and Optimal Stopping
  6. Under the Hood of Uber's Experimentation Platform
  7. Understanding Experimentation Platforms by Adil Aijaz, Trevor Stuart, Henry Jewkes
  8. LinkedIn’s experimentation platform
  9. Experimentation @ Intuit — Part 1 Culture
  10. Litmus: GOJEK’s Own Experimentation Platform
  11. Building Grab’s Experimentation Platform
  12. Making the LinkedIn experimentation engine 20x faster
  13. 8 Things to Do Before You Run a Business Experiment
  14. Yu Guo: Scaling experimentation at Airbnb: Platform, Process, and People
  15. Trustworthy Online Controlled Experiments
  16. Scaling Airbnb's Experimentation Platform
  17. The Surprising Power of Online Experiments
  18. Experimentation works
  19. A/B Testing with Fat Tails
  20. Microsoft ExP Experimentation Platform
  21. How continuous product testing tripled Microsoft Bing’s U.S. market share




More articles by Tarek ElGohary

  • Prediction & Causal Reasoning

    In the business world, making the wrong decision is costly and to increase innovation, doesn't only mean you need to…

  • A conversational way to talk about Business Intelligence

    Business is run with the ultimate goal of doing something productive to serve someone’s needs and thus earn a living…
