Episode #119: Developing the holy "grail" model at Lyft, user journeys, and hidden analytics with Sean Taylor

Al Chen

Solutions Architect at Coda

Published: September 18, 2023

This post originally appeared on the KeyCuts blog.

Future Dear Analyst episodes will get more sporadic since, well, life gets in the way. Unfortunately, curiosity (in most cases) doesn't pay the bills. Nevertheless, when I come across an idea or person that I think is worth sharing or learning more about, I'll try my best to post. In this episode, I interview the Chief Scientist of a data startup who did his PhD at NYU Stern and was on track to become a professor. Then he got an internship at Facebook and everything changed. The speed of learning at a tech company outpaced what he was used to in academia. Over the years, Sean Taylor has worked with and spoken to hundreds of data analysts and statisticians. We'll dive into his data science work at Lyft, his notion of "hidden analytics," and why he's obsessed with user journeys in modern applications.

Modeling the Lyft marketplace and creating the GRAIL model

Sean worked at Facebook for 5 years as a research scientist on general data problems. Eventually he joined the revenue operations science team at Lyft. His team's goal was to help grow the marketplace of riders and drivers on the platform. One of the most important aspects of the marketplace is the forecast. As Lyft runs promotions and enters new cities, how do you ensure there are enough drivers for the riders and vice versa?

The team ultimately decided that a simple cohort methodology would be best to help set the forecast for both drivers and riders. Every rider, for instance, would belong to a cohort based on when they first signed up for Lyft, when they booked their first ride, etc. There's a "liquidation curve" for each cohort that eventually hugs the x-axis. There is much more detail about the cohort methodology in this blog post by the Lyft Engineering team from 2019.

Despite being such a simple model, the model worked surprisingly well. Goals of this model taken from the blog post mentioned in the previous paragraph:

Forecast the behavior of each observed cohort and use it to project how many rides are taken or driver hours are provided within a specific cohort.
Forecast the behavior of the cohorts that are yet to be seen.
Aggregate all the projected rides and driver hours to make forecasts for both the demand and supply side of our business.
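To make the three goals above concrete, here is a minimal Python sketch of a cohort-style forecast. It is not Lyft's actual model: the geometric decay curve, the parameter names (`size`, `rides_per_user`, `retention`), and all the numbers are assumptions made up for illustration. The only idea taken from the post is that each cohort gets its own projected curve, unseen cohorts get assumed parameters, and the curves are summed.

```python
from typing import Dict, List


def project_cohort(size: int, rides_per_user: float, retention: float, periods: int) -> List[float]:
    """Project rides per period for one cohort.

    Assumes activity decays geometrically with cohort age, so the
    curve flattens toward zero as the cohort gets older.
    """
    return [size * rides_per_user * retention ** age for age in range(periods)]


def forecast_total(cohorts: Dict[int, dict], horizon: int) -> List[float]:
    """Sum the projected rides of every cohort over the forecast horizon.

    Keys of `cohorts` are the period in which the cohort starts; a cohort
    only contributes rides from its start period onward.
    """
    total = [0.0] * horizon
    for start, params in cohorts.items():
        curve = project_cohort(
            params["size"], params["rides_per_user"], params["retention"], horizon - start
        )
        for age, rides in enumerate(curve):
            total[start + age] += rides
    return total


# Hypothetical inputs: one observed cohort and one cohort that has not signed up yet.
cohorts = {
    0: {"size": 1000, "rides_per_user": 2.0, "retention": 0.5},  # observed cohort
    1: {"size": 500, "rides_per_user": 2.0, "retention": 0.5},   # assumed future cohort
}

forecast = forecast_total(cohorts, horizon=3)
print(forecast)  # [2000.0, 2000.0, 1000.0]
```

The same structure would apply to driver hours on the supply side. A fixed retention parameter per cohort is also where the sketch shows the limitation discussed next: nothing in it reacts to prices or to changes in the marketplace.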

Sean talked about how there were flaws in the model. One is that a marketplace is very fluid and evolves over time. Another is that when a rider is exposed to high prices, this may lead to churn, which was also not included in the model. Sean's team tried building a better model called GRAIL, but Sean left Lyft before the model was completed.

Speaking of Lyft's data team, I had mentioned Amundsen, an open source data discovery platform Lyft released in 2019 (blog post). It's great to see the data team at Lyft giving back to the ecosystem to help data analysts and data scientists do their job better!

Discovering a bug that cost the company $15M per year

One of the best feelings as a data analyst is using data to uncover the root cause or underlying trends in a given business situation. One might say this is like Moneyball, where the Oakland A's realize that on-base percentage (OBP) is the best predictor of player performance.

Sean believes there is a lot that data analysts do that is not necessarily taught in school or on the job. You're expected to understand the business and how everyday business operations are translated into the numbers on the dashboard.

When you're working on a project because you are curious about it rather than being forced to come up with an analysis, you are able to come up with the bigger wins that really move the needle. Sean calls this type of work "hidden analytics," or as I like to say, there is much more behind the numbers.

Sean's colleague at Lyft came across an anomaly in the data and just started pulling on the thread some more. His colleague ultimately found a bug in the marketplace in how Lyft was disbursing driver incentives. Sean talks about how his colleague's curiosity led them to discover this bug in the first place, and squashing the bug saved Lyft $15M per year.

Why the systems for collecting user journey data are broken

Modern websites and applications collect a ton of data, but the actual user journey is harder to quantify. A customer signs up for a tool or service, goes through an onboarding process, and might engage with the tool at various times in the future. Modeling and visualizing this data on a spreadsheet or in a SQL database can be difficult. With these tools, you are aggregating data and parts of the user journey might be improperly reduced down to a single number when there is much more nuance to a user's journey on a website.

Users are in different states when using a website or app. Sessionizing data has become the default way to capture the path a user takes but there are still many micro-sessions in just one experience like registering your account on a website.

Sean discusses this concept in the context of a rider taking or not taking a ride booked on Lyft. The customer requests the ride, and perhaps declines the first ride and books the second ride. The basic conversion rate would be 50%, but that statistic doesn't answer why the customer didn't book the first ride. Perhaps the customer couldn't find the right address with the first ride, and just gave up. Perhaps the driver was too far away.
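The gap between the single number and the journey can be sketched in a few lines of Python. The event log below is hypothetical: the tuple layout, the event names, and especially the `reason` field are assumptions for illustration. In practice the reason for a decline is often exactly what the tracking system fails to capture.

```python
from collections import Counter

# Hypothetical event log: (rider_id, event, reason)
events = [
    ("rider_1", "request", None),
    ("rider_1", "decline", "driver_too_far"),
    ("rider_1", "request", None),
    ("rider_1", "book", None),
]


def conversion_rate(events):
    """Booked rides divided by requested rides: the aggregate view."""
    requests = sum(1 for _, event, _ in events if event == "request")
    books = sum(1 for _, event, _ in events if event == "book")
    return books / requests if requests else 0.0


def journeys(events):
    """Ordered list of events per rider: the journey view."""
    paths = {}
    for rider, event, _ in events:
        paths.setdefault(rider, []).append(event)
    return paths


def decline_reasons(events):
    """Count recorded reasons for declined rides, if any were logged."""
    return Counter(reason for _, event, reason in events if event == "decline")


print(conversion_rate(events))  # 0.5
print(journeys(events))         # {'rider_1': ['request', 'decline', 'request', 'book']}
print(decline_reasons(events))  # Counter({'driver_too_far': 1})
```

The aggregate says 50%, while the ordered path shows a rider who did end up booking after one failed attempt, which is a different story from half of all riders walking away.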

Balancing usability and expressivity in data tools

Browse any Hacker News article and you'll inevitably see devs talking about why you should just build your own tool on-prem with code. The main reason is that you can fully customize the app if you know how to code. I've discussed at length on this podcast and through content I've created for my company how the need for low-code and no-code tools redefines who a "builder" is in a company.

Sean's current company (Motif Analytics) is trying to strike that balance between giving data analysts and data scientists the ability to express their data question without diving right into the code. In terms of user journey data, Sean says most people use Amplitude, Mixpanel, or other similar tools. While these tools allow you to execute common data tasks, there are certain things these tools block you from doing. Python notebooks, for instance, are very expressive. But you kind of need to be an expert to use them to their full potential.

Sean talks about how he drew inspiration from Ruby on Rails in terms of how the creators had strong opinions about how to do web development. I also first learned about web development through a Ruby on Rails book and it's interesting to see how many of the patterns from Rails are still seen in frameworks using PHP or JavaScript.

As we discussed the platform Sean and his team are building, we got into the weeds about a little-known SQL clause called MATCH_RECOGNIZE. There apparently isn't much documentation about it, and the people behind SQL apparently rushed this pattern-matching feature into the language because competitors were coming out with similar functionality. Nothing like real-world drama impacting the open source world!
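For readers who haven't seen row pattern matching, the core idea is describing a sequence of events with a regex-like pattern. The Python sketch below is only an analogy and is not MATCH_RECOGNIZE syntax: it encodes each event as a single letter and runs an ordinary regular expression over the result. The event names, the letter codes, and the `RETRY_THEN_BOOK` pattern are all made up for illustration.

```python
import re

# Hypothetical single-letter codes for each event type.
EVENT_CODES = {"request": "R", "decline": "D", "book": "B"}

# One or more request-then-decline pairs, followed by a request that gets booked.
RETRY_THEN_BOOK = re.compile(r"(?:RD)+RB")


def encode(path):
    """Turn an ordered list of event names into a string of codes."""
    return "".join(EVENT_CODES[event] for event in path)


def find_matches(path):
    """Return (start, end) event positions of every match in the path."""
    return [(m.start(), m.end()) for m in RETRY_THEN_BOOK.finditer(encode(path))]


retry_path = ["request", "decline", "request", "book"]
direct_path = ["request", "book"]

print(find_matches(retry_path))   # [(0, 4)]
print(find_matches(direct_path))  # []
```

Because each event maps to exactly one character, match positions in the encoded string line up with positions in the original event list.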

Start with the questions instead of the tools

We ended the conversation with a bit of career talk. Sean talks about intrinsic motivation being the number one driving force in his career. While tools come and go, he said domain expertise is something that can give budding analysts a leg up when searching for their next role. Technical skills, unfortunately, are slowly becoming a commodity. What never goes out of style? Asking the right questions.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Shankar Somayajula

Architect - Advanced Analytics at Oracle

1 year ago

Al Chen Thanks for the episode and the interview with Sean Taylor. Very informative and interesting... I'm a big fan of Motif Analytics. Fyi, the "obscure" functionality referred to in SQL is not MATCH() as in above article/writeup but MATCH_RECOGNIZE() ... podcast got it right... which is part of the SQL:2016 standard. Game changing but niche functionality. Oracle has had it for years (almost a decade considering on-prem) and Snowflake has included recently (couple of years). BigQuery doesn't have it. The rushed and hardly used commentary probably applies to recent adoptees.
