Zillow and the problem of 'average accuracy'

A live, cautionary tale for anyone who thinks that pharma will be fixed with the simple application of better data, simulation/ML/AI and ‘disruption’… I was so struck by this Twitter thread by Mark Tenenholtz, where he covers the disastrous Zillow project.

What is clear is that Zillow had better data on its market than pharma does. The process of buying and selling is less complex than pharmacology and biology (although as I am between houses right now, that’s hard to imagine!). They did it right, in terms of spending years testing their model against actual market data.

But they still got it wrong. As Mark writes, ‘average accuracy metrics’ hide big decision errors and privileged information. “The regression model was totally fine. Their decision analysis was not.”

Now, please, think about how your eNPVs corrupt your decision process… And, perhaps worse, the source of your PTS/ PRS algorithms, which are like the necrotic core of those eNPVs…

Read on:

Zillow’s home buying business lost them $500,000,000, 25% of their stock value, and 25% of their workforce.

How did this happen to a company with so much data on housing prices?

Bad model evaluation.

Here’s the fatal error they made that you must avoid when deploying models:

Anyone who has ever wanted to buy or sell a home knows how arduous a process it is.

It’s a difficult process with tons of back-and-forth, and usually takes months.

So what if someone could buy from impatient sellers and sell to impatient buyers?

Enter, Zillow:
Zillow is really good at pricing homes.

I mean really good.

Their Zestimate score reportedly has an average accuracy of 96%, and closer to 99% on homes up for sale.

With all this data available to them, they could carefully back-test through all sorts of market conditions.
However, they didn’t just thrust themselves into the market.

Over the course of ~3 years, they simulated their strategy.

Inspired by successful simulations, they began to purchase tens of thousands of homes.

If their simulation was so successful, though, how’d they fail?
The first part of their failure was a massive information disadvantage.

I know what you’re thinking:

“But Mark, you just said they have a huge information advantage and a super accurate price estimate for homes!”
Sure, on average, they’re going to be very accurate.

But this is the problem with average accuracy metrics — they mask big errors.

It’s inevitable that even the Zestimate score, with up to 99% accuracy, will miss big on some homes.
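To make the point about averages concrete, here is a minimal sketch in Python. It is not Zillow's model: the error distribution (95% of homes priced within a tight 2% spread, 5% with a hidden issue and a 20% spread) and the function name `accuracy_report` are assumptions chosen purely for illustration.

```python
import random

def accuracy_report(n_homes=10_000, seed=0):
    """Return (mean accuracy, share of big misses, worst miss) for a toy pricing model."""
    rng = random.Random(seed)
    abs_errors = []
    for _ in range(n_homes):
        # Assumption: most homes are priced tightly, a few hide something the model cannot see.
        sd = 0.02 if rng.random() < 0.95 else 0.20
        abs_errors.append(abs(rng.gauss(0, sd)))
    mean_accuracy = 1 - sum(abs_errors) / n_homes
    # A "big miss" here means the estimate is off by more than 10% of the true value.
    big_miss_share = sum(e > 0.10 for e in abs_errors) / n_homes
    worst_miss = max(abs_errors)
    return mean_accuracy, big_miss_share, worst_miss

mean_accuracy, big_miss_share, worst_miss = accuracy_report()
print(f"mean accuracy:  {mean_accuracy:.1%}")
print(f"misses > 10%:   {big_miss_share:.1%} of homes")
print(f"worst miss:     {worst_miss:.1%}")
```

Under these made-up numbers the headline accuracy stays high while a few percent of homes are mispriced by more than 10%. The average says nothing about which homes those are, or who knows about it.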
How does this happen in the housing market?

Well, the home owner and their real estate agent inevitably have more information on the home than Zillow.

What happens, for instance, if the house has a strong odor or big plumbing issues?

In the long run, this hurts Zillow a lot.
The second part of their disadvantage was an adversarial market.

Remember how I mentioned average accuracy metrics don’t capture the big misses?

Well, the big misses likely come in situations when the homeowner has a key piece of info that Zillow is missing.
So, if Zillow put in a bid that wasn’t high enough, the homeowner would reject it.

But if Zillow put in a bid that was way too high, the homeowner would definitely accept it.

Basically, Zillow was getting the worst case scenario on almost all of their purchases.
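The selection effect described here can be sketched in a few lines of Python. Everything in it is hypothetical: the bidder's estimate is unbiased with 5% noise, the seller knows the true value, and the seller accepts any bid within 3% of it. None of these figures come from Zillow.

```python
import random

def simulate_offers(n_homes=20_000, noise_sd=0.05, seller_discount=0.03, seed=0):
    """Compare pricing error on every bid with the error on accepted bids only."""
    rng = random.Random(seed)
    all_errors, accepted_errors = [], []
    for _ in range(n_homes):
        true_value = rng.uniform(200_000, 600_000)  # known to the seller, not the bidder
        bid = true_value * (1 + rng.gauss(0, noise_sd))  # unbiased but noisy estimate
        error = (bid - true_value) / true_value
        all_errors.append(error)
        # Hypothetical seller rule: accept any bid within seller_discount of true value.
        if bid >= true_value * (1 - seller_discount):
            accepted_errors.append(error)
    mean_all = sum(all_errors) / len(all_errors)
    mean_accepted = sum(accepted_errors) / len(accepted_errors)
    accept_rate = len(accepted_errors) / n_homes
    return mean_all, mean_accepted, accept_rate

mean_all, mean_accepted, accept_rate = simulate_offers()
print(f"mean error, all bids:      {mean_all:+.2%}")
print(f"mean error, accepted bids: {mean_accepted:+.2%}")
print(f"share of bids accepted:    {accept_rate:.0%}")
```

The regression is "fine" in this sketch: across all bids the error averages out to roughly zero. The purchases are not, because the seller filters out the low bids and the buyer is left holding mostly the overbids. A backtest that scores every estimate, rather than only the deals that would actually close, never sees this.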
Finally the death blow — all of their simulations took place during a market where housing prices were significantly rising.

This meant that if they screwed up a bid, they were probably still going to survive since their portfolio was constantly growing in value.
However, once the market cooled off, they were exposed.

A successful house-flipping operation can still succeed in a cool market.

But in Zillow’s case, it just uncovered the deficiencies that were otherwise masked by a rising market.
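A back-of-the-envelope calculation shows how a rising market can hide a systematic overbid. The figures below (a 2.3% average overpayment, 6% appreciation over the holding period, 2% selling costs) are illustrative assumptions, not Zillow's actual numbers, and `flip_margin` is a hypothetical helper.

```python
def flip_margin(overpay, appreciation, selling_cost=0.02):
    """Return the margin on a flip, as a fraction of the purchase price.

    All inputs are fractions of the home's true value at purchase time.
    """
    purchase = 1 + overpay                            # what the buyer actually paid
    resale = (1 + appreciation) * (1 - selling_cost)  # net proceeds after the hold
    return resale / purchase - 1

rising = flip_margin(overpay=0.023, appreciation=0.06)
flat = flip_margin(overpay=0.023, appreciation=0.0)
print(f"rising market margin: {rising:+.1%}")
print(f"flat market margin:   {flat:+.1%}")
```

With these assumptions the same bidding behaviour makes money while prices rise and loses money once they stop, which is why a simulation run only in a rising market can look like a success.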
This is why model evaluation is so difficult, and yet so incredibly important to get right.

As a field, we’re still in the early phases of understanding how to account for adversarial conditions.

I hope this thread drives home just how important they are to consider!
I hope you learned something!

Follow me @marktenenholtz for more high-signal ML content.

Let’s build more robust ML models together.

Okay folks, we need to talk.

If you think the problem here is that they needed a "human" factor, or that they did poor regression analysis, you're wrong.

The problem is their decision analysis.

THIS is what failed in backtesting, NOT the regression.
