The Importance of Hypothesis Generation vs. Hypothesis Testing: Case Study from Product School
Hypothesis generation is not the same as hypothesis testing, although both are necessary and both can be rather challenging.


*This post contains links to educational resources for which I earn royalties if you actually purchase them.

I follow Product School on LinkedIn, and I strongly recommend you do the same if you are into data science, innovation, product development, and project management. They offer many exciting free resources. My favorite is their regular interviews with people who work at Big Tech companies, where they find out how these people think and ask for professional and career advice. It’s fun to attend these LinkedIn Live productions because the other attendees are similarly oriented, and we discuss the topic in the chat while interacting with the subject matter expert being interviewed. Try it out!

Talk on Hypothesis Testing

Unfortunately, I missed a talk I really wanted to see: “A Guide to Hypothesis Testing,” in which they interviewed former Amazon Senior Project Manager (PM) Kumara S. Raghavendra. Luckily, they recorded it, so I watched it today. I was curious how a PM would frame the idea of hypothesis testing, since PMs generally don’t do the statistics themselves. What do they think about when choosing hypotheses to test?

Something Very Confusing Happened

If you watch the video, you’ll see that the interview is not that long, about 15 minutes. In that time, the PM describes some interesting observations he has about cities. He and the interviewer also have a thoughtful discussion about the pandemic’s impact on city life. One of the most interesting things he talked about was how cities exist for different “reasons” or “functions.” He gave these examples:

  1. Centers of trade and commerce – New York City
  2. Political capitals – Ottawa for Canada, and Washington, DC for the United States
  3. Religious focus – The Vatican

Cities can have different focuses depending upon their strategies. New York City is a center of trade and commerce, so it attracts people differently than a capital such as Ottawa or Washington, DC, or a religious city like the Vatican.

The PM and interviewer focused on the first kind: centers of trade and commerce. The PM sounded like a sociologist when he presented an idea called the “agglomeration effect.” Hubs like Silicon Valley attract many different groups, such as venture capitalists, startups, and workers. Together, these groups create a commerce ecosystem that is not possible in places without such an effect.

But as this discussion goes on, something curious happens: the words “hypothesis” and “testing” are never mentioned. They keep talking about the future of restaurants in cities. Meanwhile, people in the chat ask, “Are we at the right talk? When do we get to hypothesis testing?”

Then the talk ended.

What I Think Happened

So why did they say this was about “hypothesis testing”, then go on to talk about the future of cities without mentioning anything about hypotheses or testing? Most people in the chat seemed to think there was a mistake – like, the event had the wrong label. But the speaker was identified correctly. So what happened?

Here’s what I think happened. I think people – especially PMs – don’t really see the difference between hypothesis “generation” and hypothesis “testing”. Their discussion was entirely focused on hypothesis generation about what might happen to cities in the future. Although I feel the discussion was at an advanced intellectual level (as the PM seemed like a very thoughtful sociologist), it was still just talking about potential hypotheses one could test.

Why It Matters

The problem with conflating the concepts of “hypothesis generation” and “hypothesis testing” is that hypothesis generation is relatively easy. It’s basically coming up with testable ideas. One idea from the discussion was that restaurants are more interesting and diverse in large commerce hubs than in small cities. One could test the hypothesis:

Among cities that are large centers for trade and commerce, lack of restaurant diversity has a net negative impact on a city’s economy.

But testing that hypothesis requires a whole study! First, you have to operationalize all of the concepts:

  1. Subpopulation: Define “large centers for trade and commerce” with metrics and other specifics – what is “large”? What’s a “center”? What is “trade and commerce”?
  2. Exposure (proposed “cause”): Define how you measure “restaurant diversity”. Define what constitutes the “lack” of it.
  3. Outcome: Define how to measure “city’s economy”. Define what would constitute a “net negative impact”.
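To make the operationalization step concrete, here is a minimal Python sketch based on a hypothetical city-level dataset. Every threshold, field name (such as `trade_job_share` and `gdp_growth_pct`), and helper function below is an illustrative assumption, not a standard definition. Measuring “restaurant diversity” with the Shannon diversity index over cuisine categories is just one possible choice, swapped in here for illustration.

```python
import math
from dataclasses import dataclass

# ASSUMPTION: all cutoffs below are illustrative placeholders, not published
# definitions. Changing any of them changes who ends up in the study.
MIN_POPULATION = 1_000_000      # operationalizes "large"
MIN_TRADE_JOB_SHARE = 0.30      # operationalizes "center for trade and commerce"
MAX_DIVERSITY_FOR_LACK = 1.5    # operationalizes "lack" of restaurant diversity


@dataclass
class City:
    """One row of a hypothetical city-level dataset."""
    name: str
    population: int
    trade_job_share: float                 # assumed share of jobs in trade/commerce
    restaurants_by_cuisine: dict[str, int]  # restaurant counts per cuisine category
    gdp_growth_pct: float                  # assumed outcome measure for "city's economy"


def in_subpopulation(city: City) -> bool:
    """Subpopulation: large centers for trade and commerce."""
    return (city.population >= MIN_POPULATION
            and city.trade_job_share >= MIN_TRADE_JOB_SHARE)


def restaurant_diversity(city: City) -> float:
    """Exposure metric: Shannon diversity index over cuisine categories."""
    total = sum(city.restaurants_by_cuisine.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total)
                for n in city.restaurants_by_cuisine.values() if n > 0)


def lacks_diversity(city: City) -> bool:
    """Exposure: diversity index falls below the chosen cutoff."""
    return restaurant_diversity(city) < MAX_DIVERSITY_FOR_LACK


if __name__ == "__main__":
    # Made-up toy records, not real cities.
    cities = [
        City("City A", 2_500_000, 0.40,
             {"italian": 10, "thai": 10, "mexican": 10, "indian": 10}, 2.1),
        City("City B", 1_200_000, 0.35, {"burgers": 45, "pizza": 5}, 0.4),
        City("Town C", 50_000, 0.50, {"diner": 3}, 1.0),
    ]
    for c in cities:
        if in_subpopulation(c):
            print(c.name, round(restaurant_diversity(c), 3), lacks_diversity(c))
```

Writing the definitions as code forces each vague phrase (“large,” “center,” “lack”) to become a number that someone has to justify. That is the whole point of operationalizing.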

Then, you have to make sure you have the data to cook up all the variables you need for your model. Eventually, you fit the model and come up with the answer. It’s either “Yes, it does have a net negative impact, and here’s how,” or “No, it does not have a net negative impact, and here’s my evidence.” It’s “yes” or “no.” So you can see how much work it takes to answer even one hypothesis!
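Once each city in the subpopulation has an exposure and an outcome value, the “yes or no” step can be sketched in Python as a one-sided permutation test. The question it asks is whether the mean outcome (assumed here to be GDP growth) is lower among commerce hubs that lack restaurant diversity than among those that don’t. The permutation test is a stand-in chosen because it needs only the standard library. A real study would more likely fit a regression model that adjusts for confounders. The function names, the 0.05 threshold, and the toy numbers are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(exposed, unexposed, n_permutations=10_000, seed=0):
    """One-sided permutation test: is the mean outcome lower when exposed?

    Returns (observed difference exposed - unexposed, p-value).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(exposed) - mean(unexposed)
    pooled = list(exposed) + list(unexposed)
    n_exposed = len(exposed)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_exposed]) - mean(pooled[n_exposed:])
        if diff <= observed:
            count += 1
    # +1 correction so the p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


def answer_hypothesis(exposed, unexposed, alpha=0.05):
    """Turn the test into the "yes"/"no" answer described in the text."""
    observed, p_value = permutation_test(exposed, unexposed)
    verdict = "yes" if (observed < 0 and p_value < alpha) else "no"
    return verdict, observed, p_value


if __name__ == "__main__":
    # Made-up GDP growth (%) for hypothetical hubs; not real data.
    lacking = [0.5, 1.0, 0.8, 0.2, 0.6]
    diverse = [2.1, 2.5, 1.9, 2.8, 2.2]
    print(answer_hypothesis(lacking, diverse))
```

Strictly speaking, a “no” from a test like this means the data did not show a net negative impact. It is not proof that there is none. With these toy numbers, every “lacking” city has lower growth than every “diverse” one, so the sketch returns “yes.”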

So Hypothesis Generation Is Easy?

Well, sort of. Making any sort of hypothesis is easy; it’s like singing. If you can talk, you can sing, but it’s hard to sing well. In the same way, making good hypotheses is hard. And because doing a good job of answering even one hypothesis takes so much effort, you want to prioritize among your hypotheses before embarking on any hypothesis testing.

For example, my colleagues believed that adopting a ketogenic way of eating could improve lipedema symptoms among lipedema patients. I asked them: Why do you think that? Explain the mechanism to me. Asking that question led to this paper, which you can read.

Now that we have done such a thorough job of hypothesis generation, we can move on to hypothesis testing about the ketogenic way of eating and lipedema. Lipedema was identified as a metabolic disorder a long time ago. Yet this is the first time in the history of lipedema that someone has puzzled out this whole mechanism and written it down clearly enough to test it. Now we have a ton of yes/no experimentation to do, so we are just at the beginning.

Figure: A diagram from our lipedema paper describing what we think is the mechanism linking the ketogenic way of eating to the alleviation of lipedema symptoms. We laid out this hypothesis so we could test various parts of it.

I feel that confusing hypothesis generation with hypothesis testing is a very serious issue. I think it has held back our understanding of lipedema, for example. That is why I strongly encourage anyone who wants to design studies with big data to take my LinkedIn Learning course series, “Designing Big Data Healthcare Studies,” parts one and two. In it, I go over how to do what I did above: define a testable hypothesis, operationalize it, and then apply big data to it. Even though the title says “healthcare studies,” I used exactly the same approach to define the hypothesis above about cities. Taking this course will show you how hard it actually is to test a hypothesis. That way, if you are a PM like the presenter, you can at least appreciate the need to be very detailed in your definitions. You will also see why you need to prioritize hypotheses, since it takes so many resources to answer just one of them.

Want to get better at collecting data to answer your hypotheses? Try Monika’s FREE online course on how to design a data collection protocol.

