
Lyft: Summer 2019

Alex Liebscher

Research Scientist | BetterUp

Published: October 21, 2019

At the beginning of summer, I decided to become a driver for Lyft. The question I'm asked most is, “Do you have any crazy stories?” Hopefully, like me, you find data collection and colorful graphs pretty crazy interesting; otherwise, my answer might let you down.

I’d like to share some of my findings as a driver. From the start, I’ve been curious about the raw realities of the occupation. I realized early on that working in the gig economy is ultimately an optimization problem. This applies mostly to the employers, who are constantly optimizing everything about their business, but also to the workers. I wanted to explore this optimization problem and see how I could make the most of the job. As I got into driving, I needed to learn how to be an efficient driver (i.e., make more money in less time). Work smarter, not harder, as some say.

I’m restricting this post to an exploration of the data I've collected; some modeling that I've been working on will come next. Naturally, I came up with a few key questions that, if I could answer them, would help make me a more efficient and successful driver:

When should I drive?
Where are good places to drive?
What qualities of a ride make it more enjoyable or profitable?

In this article, I'll explore each of these questions and how they apply to the overarching optimization problem. The goal is to produce results that can support data-driven decisions. Of course, the results presented here have little generalizability, since the data represent only one driver in a small part of San Diego, in southern California, over a tiny timespan of a few months. If you're looking for more comprehensive data or results, I recommend getting hired by Lyft or Uber and being entrusted with analyzing their internal datasets.

Data Collection

A quantitative analysis of being a driver would be nothing without data; here’s a quick explanation of how I got the numbers.

A screenshot of the Google Spreadsheet used to record driving information

While out driving, I had three apps open: Spotify (I always take pride in my music selection), Lyft Driver, and Google Spreadsheets. In Google Spreadsheets, before and after each ride, I recorded my odometer reading, my gas tank level (in gallons), and the time. After each driving Session, I would go back and fill in the other details in a different spreadsheet, including earnings, tips, whether each drive was Shared, and some other variables. Above is a small clip showing a few variables and a few rides. Green rows are Drives and yellow rows are trips back home. I'm currently working on ways to make this process more automated and efficient, and hopefully I'll have a more seamless workflow to share soon.
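Since each row of the log is a snapshot taken before or after a ride, per-segment Duration, Distance, and gas usage fall out of differencing consecutive rows. Here is a minimal sketch, assuming each row holds a timestamp, an odometer reading in miles, and a gas level in gallons; the `Reading` type, its field names, and the sample values are hypothetical, and refueling stops are ignored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    """One logged row, taken right before or right after a ride (hypothetical layout)."""
    time: datetime
    odometer_miles: float
    gas_gallons: float


def segments(readings):
    """Difference consecutive readings into (hours, miles, gallons_used) tuples.

    Each adjacent pair of rows bounds one segment: either a Drive or the
    Search period between two Drives.
    """
    out = []
    for start, end in zip(readings, readings[1:]):
        hours = (end.time - start.time).total_seconds() / 3600
        miles = end.odometer_miles - start.odometer_miles
        gallons_used = start.gas_gallons - end.gas_gallons  # tank level drops as gas is burned
        out.append((hours, miles, gallons_used))
    return out


# Illustrative readings (made up, not from the author's log).
log = [
    Reading(datetime(2019, 7, 6, 18, 0), 10_000.0, 11.0),
    Reading(datetime(2019, 7, 6, 18, 30), 10_012.5, 10.5),
    Reading(datetime(2019, 7, 6, 18, 40), 10_014.0, 10.4),
]
for hours, miles, gallons in segments(log):
    print(f"{hours:.2f} h, {miles:.1f} mi, {gallons:.2f} gal")
```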

I performed all of my analysis in R, making good use of a couple of popular packages like boot and tidyverse. In case you were wondering, I prefer theme_minimal() when using ggplot. All confidence intervals presented were computed with a simple percentile bootstrap.
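The percentile bootstrap is simple enough to sketch without any packages. The following is a stdlib-Python illustration of the idea, not the author's R code (which used `boot`); the wage values are made up, and the 10,000 resamples and fixed seed are arbitrary choices.

```python
import random
import statistics


def percentile_bootstrap_ci(data, stat=statistics.mean, n_boot=10_000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `data`.

    Resample the data with replacement n_boot times, recompute the statistic
    on each resample, and return the alpha/2 and 1 - alpha/2 empirical
    quantiles of those replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up hourly wages, for illustration only.
wages = [18.5, 22.0, 25.3, 19.8, 30.1, 15.2, 27.4, 21.9]
print(percentile_bootstrap_ci(wages))
```

The percentile method needs no distributional assumptions, which suits small, skewed samples like per-ride wages, though it can undercover when the sample is very small.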

All in all, I’ve been a Lyft driver for 104 days and have given 200 rides to 322 people over 2,140 miles and 90 hours. That’s about 13 rides, or about 6 hours, per week. That’s pretty part-time, but that’s the joy of gig-economy jobs! On average, each ride makes about $10. Not including gas or any other costs, I have made about $23.13 per hour.
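The per-week and per-ride figures follow directly from the totals above. Here is a quick arithmetic check; the per-ride value multiplies the $23.13 hourly figure (Earnings plus Tips, before costs) back out over the 90 hours.

```python
days, rides, hours = 104, 200, 90
wage_per_hour = 23.13  # Earnings + Tips per hour, before gas or other costs

weeks = days / 7
print(f"rides/week: {rides / weeks:.1f}")                  # 13.5
print(f"hours/week: {hours / weeks:.1f}")                  # 6.1
print(f"$ per ride: {wage_per_hour * hours / rides:.2f}")  # 10.41
```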

Fair Payment

Lyft pays drivers via a per-minute rate and a per-mile rate. For example, if I drove a passenger 10 miles in 20 minutes (30 mph), I'd earn 10 times the mile rate plus 20 times the minute rate. I'm hesitant to publish these rates, since I know they're not accessible without logging in as a driver. To respect that, I'll just say that my analysis confirms that Lyft pays what it tells drivers, down to the tenth of a cent. To check this, I ran a robust linear regression of Earnings on two predictors: Duration and Distance. Sure enough, the model's coefficients match what Lyft claims; great! I preferred a robust regression because a handful of rides included bonuses (e.g., for heavy traffic or surge pricing), which slightly skewed the coefficients of a standard regression.
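To illustrate why a robust fit helps here, the sketch below generates synthetic rides from known pay rates, adds a bonus to a few of them, and compares ordinary least squares with a Huber M-estimator fitted by iteratively reweighted least squares. The article doesn't say which robust estimator was used; Huber is swapped in as one common choice. The rates are hypothetical (not Lyft's), and the model has no intercept, following the pay formula T = d*L + t*H discussed later in this post.

```python
import random
import statistics

# Hypothetical pay rates, chosen for illustration only (NOT Lyft's real rates).
TRUE_PER_MILE, TRUE_PER_MINUTE = 0.90, 0.20


def weighted_fit(miles, minutes, earnings, weights):
    """Weighted least squares for earnings = L*miles + H*minutes (no intercept).

    Solves the 2x2 weighted normal equations with Cramer's rule.
    """
    sdd = sum(w * d * d for w, d in zip(weights, miles))
    sdt = sum(w * d * t for w, d, t in zip(weights, miles, minutes))
    stt = sum(w * t * t for w, t in zip(weights, minutes))
    sdy = sum(w * d * y for w, d, y in zip(weights, miles, earnings))
    sty = sum(w * t * y for w, t, y in zip(weights, minutes, earnings))
    det = sdd * stt - sdt * sdt
    return (stt * sdy - sdt * sty) / det, (sdd * sty - sdt * sdy) / det


def huber_fit(miles, minutes, earnings, k=1.345, iters=50):
    """Huber M-estimator fitted by iteratively reweighted least squares."""
    weights = [1.0] * len(earnings)
    per_mile, per_minute = weighted_fit(miles, minutes, earnings, weights)  # OLS start
    for _ in range(iters):
        resid = [y - per_mile * d - per_minute * t
                 for d, t, y in zip(miles, minutes, earnings)]
        # Robust scale: median absolute residual, rescaled to estimate a normal SD.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        cutoff = k * scale
        # Full weight for small residuals; large ones are down-weighted to cutoff/|r|.
        weights = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        per_mile, per_minute = weighted_fit(miles, minutes, earnings, weights)
    return per_mile, per_minute


# Synthetic rides: exact pay plus a few cents of noise; every 10th ride also
# gets a hypothetical $8 bonus, which pulls the OLS estimates off the true rates.
rng = random.Random(42)
miles = [rng.uniform(1, 15) for _ in range(60)]
minutes = [3 + d * rng.uniform(1.5, 4) for d in miles]
earnings = [TRUE_PER_MILE * d + TRUE_PER_MINUTE * t + rng.gauss(0, 0.05)
            for d, t in zip(miles, minutes)]
for i in range(0, len(earnings), 10):
    earnings[i] += 8.0

ols = weighted_fit(miles, minutes, earnings, [1.0] * len(earnings))
robust = huber_fit(miles, minutes, earnings)
print("OLS   ($/mile, $/minute):", ols)
print("Huber ($/mile, $/minute):", robust)
```

In R, `MASS::rlm` performs a comparable Huber fit; the hand-rolled version above just makes the reweighting step visible.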

When should I drive?

Of all the variables associated with rideshare driving, only two that really matter are under the driver's control: the day and the time they drive. In this section, I'll share some results on the days and times I've found best for giving rides.

Before jumping into the analysis, let me first define a few key variables.

First, a Drive is a single period that begins when I receive a request to pick up a passenger and ends when I drop that passenger off. This includes the time I spend getting to the passenger. Simply put, though, I get paid according to the Duration and Distance for which the passenger is in my car. The passenger can leave a Tip in-app, and I keep 100% of Tip money. There are also the Search periods between Drives; for these, I record Durations, Distances traveled, and Gas Usage as well. I don't get any money for Search periods. A Session is a block of time during which my Lyft app is on and accepting rides. On average, I make about 5 Drives per Session.

Now, there are many ways to measure how good a ride was; since I am driving to make money, I am most interested in what I take home at the end of the day. One analyst might look solely at Earnings plus Tips for each Drive or Session and draw conclusions from that measure. Others might transform that number into a wage (Earnings plus Tips divided by the Duration of the drive). I figured it was important to create a wage value that incorporates not only the Duration of the drive but also the waiting time before and after that ride (if applicable). On top of that, I found it important to account for gas expenses. Thus, I created a new variable for each ride: Adjusted Wage. This is the sum of Earnings and Tips, minus Gas Expenses, all divided by the total duration of the Session in hours. This measure will be referenced from here on out.
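As a formula, Adjusted Wage is just net take-home divided by time. Below is a minimal sketch; the function name and arguments are hypothetical, the gas price is a made-up example value, and the caller supplies whatever total duration (including unpaid waiting) the wage should be spread over.

```python
def adjusted_wage(earnings, tips, gallons_used, gas_price_per_gallon, hours):
    """Adjusted Wage = (Earnings + Tips - Gas Expenses) / hours.

    `hours` is the total time the ride's pay is spread over; per the
    definition above, it includes unpaid waiting time, not just the Drive.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    gas_expense = gallons_used * gas_price_per_gallon
    return (earnings + tips - gas_expense) / hours


# Made-up ride: $12 fare, $3 tip, 0.4 gal at a hypothetical $4.00/gal,
# spread over 30 minutes of driving and waiting.
print(round(adjusted_wage(12.0, 3.0, 0.4, 4.00, 0.5), 2))  # 26.8
```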

To complement this measure of ride success, I recently began rating each ride from 1 to 5 stars on three variables: comfort, route, and conversation. Hopefully, I will expound upon these other measures in a later post.

Great, now to the numbers.

Graphs of the Adjusted Wage given both ride start time and ride day of the week

On average, do mornings, afternoons, evenings, or nights earn the best wage? There’s no single answer, but there seem to be general trends. An immediate, naive answer may come from the Generalized Additive Model with a Gaussian Process smooth (10 knots) plotted above. Later in the day seems to be better, but the numbers are very noisy. At a glance, about 6PM onwards appears to be a decent time to drive. In a later post, I'll present various models that offer more flexibility for fitting the data and more power for predicting interesting results.

As with time of day, no single answer explains earnings for each day of the week. However, there is a significant difference between the mean Adjusted Wages of the days of the week (F(6, 192) = 2.85, p = 0.01). This suggests that earnings correlate with the day one chooses to drive in a way that's hard to attribute to randomness alone. Unsurprisingly, the weekend appears to offer good opportunities to drive.
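An F statistic with (6, 192) degrees of freedom is what a one-way ANOVA over seven day-of-week groups produces. Here is a stdlib sketch of how that F statistic is computed, assuming the test is a standard one-way ANOVA (the article doesn't name the R function used). The toy groups are made up.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F compares the spread of group means around the grand mean
    (between-group mean square) with the spread of observations around
    their own group mean (within-group mean square).
    """
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic along with its p-value.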

Adjusted Wage given both day of the week and start time in a tile-like graph

Together, day and time provide a helpful proxy for earnings. The plot above shows average Adjusted Wages by the day and time each ride takes place. I tried to sample the space randomly, but that was difficult to maintain, so there could be some sampling bias. Nonetheless, we see some patterns of high Adjusted Wage, such as Saturday evenings and Sunday mornings.
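Each tile in such a plot is just the mean Adjusted Wage of the rides falling in one (day, time block) cell. Here is a minimal sketch of that aggregation, assuming rides are stored as (weekday, start hour, Adjusted Wage) tuples and binned into 3-hour blocks; both the layout and the block width are hypothetical, and the sample rides are made up.

```python
from collections import defaultdict


def mean_by_day_and_block(rides, block_hours=3):
    """Average Adjusted Wage per (weekday, block start hour) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for weekday, hour, wage in rides:
        cell = (weekday, hour // block_hours * block_hours)  # e.g. 19h -> 18h block
        sums[cell] += wage
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up rides for illustration.
rides = [("Sat", 19, 28.0), ("Sat", 20, 24.0), ("Sun", 9, 26.0), ("Tue", 14, 12.0)]
print(mean_by_day_and_block(rides))
```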

I was also curious about socially constructed time periods commonly associated with certain activities, and how those might correlate with income. For example, is the nightlife scene really as great as the articles online make it seem? To find out, I created a deterministic set of rules to label each drive with one of a few labels: MorningCommute, AfternoonCommute, EveningCommute, Tourism, Nightlife, and Other. These labels are somewhat subjective, but as intuitive as I could make them. MorningCommute is given to rides in which the passenger is headed to work between 5AM and 10AM on a weekday. Tourism is for rides on a Saturday or Sunday in which the passengers are headed from a hotel to food, leisure, or something social, or the reverse. Similar rules apply to the other labels. I found no significant difference between the Adjusted Wages of the three commute periods (F(2, 49) = 0.59, p = 0.56). I partly attribute this result to a lack of data: I've made only 6 AfternoonCommute trips (and 20 EveningCommute and 26 MorningCommute). Moving on, though, Nightlife rides appear to be significantly more profitable than all other rides (t(28.6) = 2.4, p = 0.02). As we'll see, I tend to have less (unpaid) waiting time between rides during these Nightlife periods, which may improve the Adjusted Wage.
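A rule set like this can be written as an ordered chain of checks, where the first matching rule wins. In the sketch below, the MorningCommute and Tourism rules follow the description above. The AfternoonCommute, EveningCommute, and Nightlife thresholds are hypothetical placeholders, since the article only says that similar rules apply. The place categories passed as `origin` and `goal` are also assumed names.

```python
WEEKEND = {"Sat", "Sun"}
HOTEL_OUTINGS = {"food", "leisure", "social"}


def label_ride(weekday, hour, origin, goal):
    """Assign one time-period label to a ride with fixed, ordered rules.

    `origin` and `goal` are place categories such as "home", "work",
    "hotel", "food", "leisure", or "social" (assumed category names).
    """
    weekend = weekday in WEEKEND
    # Rules described in the article.
    if not weekend and 5 <= hour < 10 and goal == "work":
        return "MorningCommute"
    if weekend and ((origin == "hotel" and goal in HOTEL_OUTINGS)
                    or (goal == "hotel" and origin in HOTEL_OUTINGS)):
        return "Tourism"
    # Hypothetical rules: the article only says "similar rules apply".
    if not weekend and 11 <= hour < 15 and "work" in (origin, goal):
        return "AfternoonCommute"
    if not weekend and 15 <= hour < 20 and origin == "work":
        return "EveningCommute"
    if (hour >= 21 or hour < 3) and "social" in (origin, goal):
        return "Nightlife"
    return "Other"


print(label_ride("Mon", 7, "home", "work"))     # MorningCommute
print(label_ride("Fri", 23, "home", "social"))  # Nightlife
```

Because the checks run in order, a weekend hotel-to-bar ride at 11PM is labeled Tourism rather than Nightlife. Putting the rules in a fixed order like this makes the labels reproducible.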

Wait time before getting a ride given both day of the week and time of the day

Additionally, I've looked at how long I wait between rides. These times are uncompensated, so it makes sense to want to avoid them. Qualitatively speaking, evenings seem good, whereas late mornings and afternoons have longer waiting periods. Note, though, that neither the day of the week nor the ride time significantly predicts whether I will have to wait for a ride (F(6, 219) = 1.2, p = 0.3 and F(1, 224) = 0.08, p = 0.78, respectively). Moreover, for all Search periods in which I waited at least a minute, the day of the week and ride time still do not predict how long the wait will be (F(6, 173) = 1.7, p = 0.12 and F(1, 178) = 1.4, p = 0.24). With all this said, I personally enjoy driving later in the week in the early evenings. I find that conversations are better, I have a chance to see my passenger (i.e., make eye contact in the mirror during conversation), and it appears that I might earn a little more during those periods.

Where are good places to drive?

Along with when they drive, drivers have some control over where they drive. It’s not complete control: it could very well be impossible for me to drive in New York City, even if I knew that would produce the highest earnings. I can decide where in San Diego County I’d like to start driving; however, rides can sometimes drag a driver around town with no apparent pattern or control.

Nonetheless, I attempt to explore how the location I am in when I get a new ride correlates with earnings. This process was slightly subjective, and it's something I would rethink if I were to start over. When I get pinged with a new ride, I mentally note the "neighborhood" I am in (note: not necessarily where the passenger is picked up, but where I get the notification). The neighborhoods I note vary somewhat in scale and precision. For example, places where I tended to get a lot of rides (e.g., Downtown SD) were broken down on a finer scale (e.g., Gaslamp, East Village, Bankers Hill) compared to neighborhoods I didn't visit often (e.g., La Mesa, San Marcos).

The top three locations I found rides in were (number of rides in parentheses): the UTC area (24), East Village (12), and Pacific Beach (10). Of 51 unique locations, 19 had only one ride originating from them. These sparse data make it difficult to draw any conclusive results, but it's still amusing to look at them.

Adjusted Wage given ride starting location

Here are the locations I've gotten five or more rides from, with their average Adjusted Wage and 95% CIs. Picking up folks from the SAN Airport may be one's best bet, but this isn't a statistically significant conclusion.

Personally, although picking up folks from the Airport might pay well, I'm never sure where I might have to take people (potentially 30+ minutes away) and the Airport can be stressful to navigate. Thus, I'd prefer to navigate to other places and only take Airport rides when I'm already there and it's easiest.

With the new data collection workflow I'm creating, I record my geographical coordinates when pinged for a ride and when the passenger is dropped off. With these data, more in-depth geospatial analyses will be possible. For example, with enough data I'd be able to interpolate the probability of getting a ride given any location. As it stands now, the Location data I have is mostly for fun and a little general insight, but again, nothing conclusive.

What qualities of a ride make it more enjoyable or profitable?

Besides when and where I drive, I’ve been curious about other potential influences on earnings. The only other things I think a driver has control over are their own behavior (both interpersonal and driving), the amenities they offer, the state of their car, and maybe their music choice. These are difficult to record, but below I try to pick apart one of them, along with some other ride variables.

Previous articles on what might improve Tips consistently mention having a meaningful conversation with the passenger(s). This makes intuitive sense: drivers are customer service workers, and one might say that part of our job is to make the customer feel comfortable and engaged.

Tip amounts for Shared rides and rides with Conversation

However, to my surprise, tipping isn’t as universal as I would have guessed. Only about 30% of my passengers tip, and those who do tip an average of $3.20. In the plot above, we see how Tipping varies with Shared rides and Conversation. I speculated that Shared rides would have different tips than regular Lyfts.

For example, the people who choose Shared rides may be more introverted or more frugal, leading to less tipping. In my experience, those who Tip tend to be outwardly friendly. I always try to match the mood of the passenger and don't force conversation. That said, I try to test out conversation at the beginning of the ride to see if it's appropriate. To answer three questions (do Conversation, Shared, or their interaction affect the Tip amount?), I first subset the data to rides with Tips greater than $0 (what's plotted here). Ideally, this rules out personality quirks (i.e., introversion), since I assume the passenger intended to Tip and it was only a matter of how much. I fit a Generalized Linear Model with a Gamma-distributed response, regressing Tips on Shared, Conversation, and their interaction. Additionally, I include Distance, Duration, number of Passengers, Goal, and Origin as control covariates; these are all things I suspect may affect the Tip amount. Neither Conversation, Shared, nor their interaction significantly explains how much these passengers tip (Chi-Sq(1) = 0.87, p = 0.35; Chi-Sq(1) = 0.008, p = 0.93; Chi-Sq(1) = 0.003, p = 0.95).

These last results don't shed much light on the original questions. However, does Conversation explain anything about a passenger's decision to Tip? Using the entire dataset, adding Conversation to a logistic regression with the same covariates as before does significantly improve the prediction of whether a passenger Tips (Chi-Sq(1) = 18.6, p < 0.001). When I have a Conversation, the odds of the passenger tipping are substantially higher. Do passengers who opt for a Shared ride Tip with a different probability? In fact, Shared rides are significantly less likely to Tip (Chi-Sq(1) = 6.2, p = 0.01). The interaction between the two predictors is not significant (Chi-Sq(1) = 0.02, p = 0.9).

Based upon these results, I can say that having a meaningful Conversation with my passengers is probably going to improve my odds of getting a Tip, although this wouldn't be as pronounced if it's a Shared ride. However, this may easily be explained by a correlation between passengers' extroversion and their generosity, especially knowing that Conversation doesn't elicit larger tips among those who were assumed to already Tip for their ride.
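The Chi-Sq(1) statistics above can be turned back into p-values with the standard library alone. This sketch assumes they are likelihood-ratio (deviance) statistics from adding a single term to a nested GLM, which is typical for this kind of comparison but isn't stated in the article. A chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail is erfc(sqrt(x / 2)).

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio (deviance) statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    If X = Z**2 with Z standard normal, P(X > x) = P(|Z| > sqrt(x))
    = erfc(sqrt(x) / sqrt(2)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


for stat in (18.6, 6.2, 0.02):
    print(f"Chi-Sq(1) = {stat}: p = {chi2_df1_pvalue(stat):.3g}")
```

The recomputed values (about 1.6e-5, 0.013, and 0.89) agree with the rounded p-values reported above.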

Another question I’m frequently asked is, “Do longer drives make more than shorter ones?” We know there are two pay rates: one for distance ($/mile), which we call L, and another for duration ($/hour), which we call H. The total T a driver makes for a trip, not including Tips and bonuses, is T = d*L + t*H, where d is the distance traveled (in miles) and t is the duration (in hours). The wage a driver earns is then W = T / t = d*L / t + t*H / t = d*L / t + H. Note that d / t is simply the trip's average speed s (distance in miles per duration in hours), so a driver's wage is a function of speed: W = s*L + H. To answer the original question, then, a driver's wage doesn't technically depend on distance at all, only on average speed, so the question is a bit uninformative. To maximize the wage, we'd maximize speed. One might criticize the Lyft platform for this, because it inherently, if subconsciously, encourages fast driving, which can lead to speeding and unsafe driving behavior.
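A quick numeric check of the derivation is to compute the per-trip wage directly as T / t and compare it with s*L + H. Two trips at the same average speed should earn the same wage regardless of distance. The rates below are hypothetical illustration values, not Lyft's.

```python
def trip_wage(miles, hours, per_mile, per_hour):
    """Hourly wage for one trip: W = T / t, where T = d*L + t*H."""
    total = miles * per_mile + hours * per_hour
    return total / hours


def wage_from_speed(speed_mph, per_mile, per_hour):
    """The same wage rewritten as W = s*L + H."""
    return speed_mph * per_mile + per_hour


# Hypothetical rates for illustration only: L = $0.90/mile, H = $12/hour.
L, H = 0.90, 12.0
short_trip = trip_wage(miles=3, hours=0.1, per_mile=L, per_hour=H)   # 30 mph
long_trip = trip_wage(miles=30, hours=1.0, per_mile=L, per_hour=H)   # 30 mph
print(short_trip, long_trip, wage_from_speed(30, L, H))  # equal up to float rounding
```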

However, speed is somewhat dependent on distance: longer trips are more likely to use freeways, which allow higher speeds than city streets. To account for this, we could model the relationship between distance d and speed s as s = a + b*log(d) + e, where a is an intercept, b is a coefficient, and e is random noise. Fitting this model to my data gives estimates for a and b. Plugging the fitted speed into W = s*L + H gives Wage as a function of distance, W(d) = (a + b*log(d))*L + H, plotted below:

A function of Wage given distance of ride

So to finally answer this question: longer rides are better, but this is mostly a result of being able to go faster. We see logarithmic increases in Wage for rides up to about 10 or 15 miles, and then the function begins to flatten out.
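The two-step model above can be sketched in a few lines: fit s = a + b*log(d) by ordinary least squares on log-distance, then plug the fitted speed into W = s*L + H. The natural log is assumed here (the article doesn't state the base). The data are made up and noise-free, and the rates are hypothetical. The concavity of log is what makes the wage gains shrink for longer rides.

```python
import math


def fit_speed_model(distances, speeds):
    """Ordinary least squares for s = a + b*log(d); returns (a, b)."""
    x = [math.log(d) for d in distances]
    x_bar = sum(x) / len(x)
    s_bar = sum(speeds) / len(speeds)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxs = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, speeds))
    b = sxs / sxx
    a = s_bar - b * x_bar
    return a, b


def wage_given_distance(d, a, b, per_mile, per_hour):
    """W(d) = (a + b*log(d))*L + H: the fitted speed plugged into W = s*L + H."""
    return (a + b * math.log(d)) * per_mile + per_hour


# Made-up, noise-free data generated from s = 12 + 6*log(d) (mph).
distances = [1, 2, 4, 8, 16]
speeds = [12 + 6 * math.log(d) for d in distances]
a, b = fit_speed_model(distances, speeds)
L, H = 0.90, 12.0  # hypothetical $/mile and $/hour rates
for d in (1, 5, 10, 15, 20):
    print(d, round(wage_given_distance(d, a, b, L, H), 2))
```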

Finally, after giving only a dozen or so rides, I had a feeling that the first and last rides of a driving Session probably earned less than the rides in between. I hypothesized this because of a pattern of driving out of my residential area to more urban areas, or driving home after dropping off a passenger far away. Moreover, the Last ride might earn less on average because I tend to end my driving Session after a poor ride (e.g., when I'm uncomfortable or losing patience). So I labeled each ride as First, Middle, or Last based on its position in the Session. As it turns out, there is a significant difference between the mean Adjusted Wages of the three positions (F(2, 196) = 21.2, p < 0.001).
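The position label depends only on a ride's index within its Session. Here is a minimal sketch; how a Session containing a single ride was labeled isn't stated in the article, so counting it as First is an assumption.

```python
def label_positions(n_rides):
    """Label the rides of one Session, in order, as First, Middle, or Last."""
    if n_rides < 1:
        return []
    if n_rides == 1:
        return ["First"]  # assumption: a lone ride is counted as First
    return ["First"] + ["Middle"] * (n_rides - 2) + ["Last"]


print(label_positions(5))  # ['First', 'Middle', 'Middle', 'Middle', 'Last']
```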

Lyft has an option that lets drivers receive only ride requests that start and end near a route they choose. For example, if I'd like a ride on my way home, I'd turn on this option and hope to find a passenger along the way. Unfortunately, this feature rarely sends me a ride (the criteria for how closely a ride must stick to your route must be stringent). Instead, I usually end up driving all the way home while it searches for a ride.

Conclusion

In this post, I've tried to summarize some of my findings as a Lyft driver. In the first section, the data, despite much noise, suggest that driving later in the week and in the evenings is most profitable. In the section after, sparse and somewhat uninformative location data provide interesting visualizations that might later be used to formulate hypotheses; still, some locations do appear to offer more earning opportunity than others. In the final section, I looked at how people Tip and found that having good conversations with passengers is positively associated with tipping; saw that longer rides earn more, but mostly because one can drive faster; and found that the first and last rides of a Session tend to earn less.

A few last numbers to round out the article. My Lyft wage counting only Earnings and Tips over only the duration of rides (time that I'm actually driving people): $36.28 (95% CI: $34.60 to $38.11). The same measure without tips: $32.30 (95% CI: $30.92 to $33.85). My wage for Earnings plus Tips over the total Duration, including wait times: $23.13 (95% CI: $21.68 to $24.57). That measure after also subtracting gas expenses: $19.16 (95% CI: $17.86 to $20.52). That measure after also subtracting maintenance expenses and vehicle depreciation (about $1,000 per 5,000 miles driven beyond the current condition of my car): $14.72 (95% CI: $13.42 to $16.04). So, all things considered, Lyft earns me about $13.50 to $16.00 an hour. That's definitely above minimum wage, and the CIs include the living wage for a single adult in my area ($15.61/hr). However, I personally can't see myself driving full-time, let alone supporting myself on only this job. An important note, though, is that because of who I am (male, young, etc.), I might be making more than other drivers.

I plan to continue driving during my next year at school, although potentially less. Hopefully, at some point I'll write the sequel to this article and discuss how I've modeled some of the variables mentioned here and their influence on earnings.

My passengers have covered a wide variety of occupations and have had the most fascinating lives. I've received insightful life advice, and when requested, given out my own. A handful of eye-opening conversations have forced me to reconsider where I'm headed after I graduate. I'll let my LinkedIn community know of any big changes that happen in the future.

Lastly, I'd like to say how grateful I am that I have a clean, reliable vehicle and the opportunity to drive. My parents graciously helped me buy this car, and they currently handle my insurance. I know not everyone is in these circumstances. If you're thinking about driving, feel free to reach out to me and I'd be happy to answer any questions. If you're interested in talking about the gig economy, my analysis, or anything else data-related, leave a comment.