登录查看更多内容

DoorDash Data Analysis

Stuart Walker

Fraud Prevention Analyst @ M&G PLC | Data Analyst | Data Scientist | Python | SQL | Machine Learning | Data Analytics | Excel | Tableau | Power BI | R

发布日期: 2023年1月8日

In the world we now live in we have a multitude of options to stay at home and get hot delicious food and groceries delivered directly to your door.?These tend to vary country to country and in my location I have the option to order from multiple businesses including but not limited to Just Eat, Uber Eats, Scoffable, Deliveroo and I could go on and on.?I was presented with an opportunity to analyse a dataset using Excel from the American company DoorDash which I found exciting to see what insights I could find.

*Please Note: This project is actually modified from an iFood job interview case study given by the Brazilian equivalent of DoorDash, iFood.?The data is 98% real, but slightly modified for educational purposes.?

Using the dataset I was provided with, I was tasked to:

Sort the data
Filter the data
Understand the data
Find business insights
Propose data-driven opportunities to optimise sales campaigns
Overall, the company wants to improve its marketing & wants to see who has purchased following a marketing campaign.

I would do this by using various Excel features including the following:

New Column + Fill
Data Filtering
Data Sorting
General Aggregations (min, max, sum, count, average etc)
General Formulas
Graph Creation
Pivot Tables
Advanced Formulas (IF, SUMIF, COUNTIF, texts, dates, etc)
VLOOKUP

DATA

As previously mentioned the data comes from an iFood Dataset which can be found here on Kaggle.?The dataset includes customer information for annual income, the total amount spent, acceptance of offer per campaign, customer age, total number of days as a customer, date joined, and lots more from DEC 2014 to NOV 2016.?There is data pertaining to currency within the dataset but it does not confirm which.?Therefore for the purposes of this project I have used USD$.

ANALYSIS

After uploading the data to Excel, the first action I made was to give each of the customer’s present on the dataset a unique id which would show as CustomerID on Column A.

Next I used various aggregate functions such as COUNT, MAX, MIN, AVERAGE, and SUM to gain a brief overview and insight into the data.?I also added a new column, PctIncome which calculates the percentage the customers spent on using the service annually from their yearly income.?After the column was in place I was curious to see who was spending the most from their income using the service.?After sorting this new column, I noticed the amount spent and related percentage for customer 21 was astronomically high compared to all other customer’s within the dataset.?Now this could be down to a data entry error and could be entirely innocent.?However with my experience working in a fraud background I am unable to ignore the fact that this could be fraud related and that customer 21 could be using stolen debit, credit or gift cards to make multiple purchases.?Either way this anomaly certainly needs to be investigated further.

For the next part of my analysis I wanted to check the correlation between how much each customer spent in relation to their annual income, for this I made a scatter plot.?Once I created the scatter plot I added a trendline and the R2 value to the plot.?According to the R2 value generated from the trendline (75%) I am very confident that there is a strong correlation between customer income and total spent.?Also during my analysis I spotted two outliers on the graph (highlighted by the red arrows).?One of the outliers is customer 21 who was mentioned in length above.?The other outlier on the scatter plot shows that the income was $113,371.00 and the amount spent was $274.00.?Using these data points I was able to ascertain that the outlier belonged to customer 1873.?My findings showed that customer 1873 is a relatively new customer (just over two months).?I was also able to confirm that they are 75 years old and the majority of their spend was on sweets.?There is a fairly high probability that customer 1873 only used the service to buy sweets for their grandchildren. The other possibility is that their spend was low due to the short time spent using the service.

I was curious to see what the age demographic is within the dataset as I felt this could create some useful insights.?To do this I created a new column named AgeGroup and wrote a nested IF statement to assign each customer into an age group.?I then created a pie chart to visualise the age distribution which shows that most of the customers fall within the 35-50 age range which was followed fairly closely by the 51-65 age range.?I could see that the least amount of customers was within the 24-35 age range which was surprising as I would have thought this range would have had a lot more if not the most.??

领英推荐

Eliminate Downtime on DoorDash with Voosh.ai

Voosh 9 个月前

The Power of Data Analytics in the F&B Industry:…

Mohammad Anas 6 个月前

Swiggy Data Analysis in Excel

Akshay Kumar 2 年前

After checking the age demographic of the dataset I wanted to then check the respectful income and amount spent by age group. Both of the above bar charts clearly show that the 35-50 age range not only earn the most at $45,756,663.00 but they also spent the most at $459,264.00 followed very closely by the 51-65 age range who earn $40,892,975.00 and the spent $450,634.00.?They are then followed by the 66+ group who earn $18,292,647.00 and spent $218,659.00 and finally trailing behind is the 24-34 age group who earn the least at $8,884,434.00 and also spent the least at $112,339.00.?The bar charts and numbers clearly reinforce my analysis of the above scatter plot and R2 value that there is a strong correlation between income and total spend.

To continue my analysis of money spent I decided to create a histogram to show the amount spent by the customers.?To provide more insight I decided to organise the amount spent by each customer into six separate bins which did provide some good insight.?It showed that over half of the customers spent between $4.00 & $418.5 (the lowest value bin) and 19 customers spent between $2076.5 & 2491.00 (the highest value bin).?Even though over half of the customers spent between $4.00 & $418.5 the average amount spent is higher at $562.76.

After delving deep into the age, income and spend data I then decided to have a look at when customers joined the service.?We can see from the above bar chart that most customer joined in January with 223 and the least amount joined in December with 137.?More people may have joined at the beginning of the year for a few reasons, the first of which is after a very busy Christmas/Holiday period they may have decided to take it easy and decide to use a service which can deliver food directly to your door rather than having to go shopping.?Another reason could be because a lot of people received gift cards over the Christmas/Holiday period and signed up to make use of them.?On the flip side we can see the least amount of sign ups in December with 137 and the number of sign ups starts to decrease from October onwards.?This could be down to the fact that customers are starting to tighten their belts in the run up to the Christmas/Holiday period.?Overall though we can see that the number of sign ups from January to December remained fairly consistent with an average number of sign ups being 183 per month.

One of my main tasks was to see which customers purchased (or said yes) to a specific marketing campaign (specifically campaign 6 but more on that shortly) and that’s where the above bar chart comes in.?From the above bar chart it is clearly obvious that campaign 6 was the most successful at 333 and campaign 2 has the worst uptake with only 30 saying yes.

Now as part of my task I needed to analyse Campaign 6 more closely to capture more insights from that specific campaign.?To do this I decided to create a pivot table to capture the information pertaining to Campaign 6.?The above snippet shows the ages represented (red arrow) and those who said yes (blue arrow).?The total amount spent by each age (yellow arrow) and the average amount spent per age (orange arrow).?The data was able to tell me that customers aged 50 spent the most at $14,558.00 and had the most customers at 16.?However the highest average spend came from customer aged 32 with $2048.00 (there were only 2 customers for this age).

I was also tasked to check the amount of customers who have children and accepted campaign 6.?To do this I made a few changes to my pivot table which now shows how many customers have children and how it effects their spend.?We can see the amount of children represented within the dataset (red arrow), the total amount spent (yellow arrow), the amount of customers with 1, 2 or 0 children (blue arrow) and the average amount spent (orange arrow).?With all the data present we can clearly see that the majority of customer who accepted campaign 6 have no children and they also spent the most.?This could be that campaign 6 is marketed more towards single people or couples with no children.?The other probability is that customers with no children have more disposiable income due to not having children to look after.

MY INSIGHTS

There is a very strong/positive correlation between income and total spent with an R2 value of 75%.
The age of the customers range from 24 to 80.?The most amount of customers fall into the 35-50 range with 958 and the least amount of customers is in the 24-34 range with only 187.
The 35-50 age range has the highest total income with $45,756,663.00 and spent the most at $459,264.00.?The 24-34 age has the lowest combined income at $8,884,434.00 and they also spent the least at $112,339.00.
Over half the customer base has spent between $4.00 & $418.5, however the average spend is higher at $562.76.
The most amount of customers joined in January with 223 and the least amount of customers joined in December with 137.?However the average amount joined per month is 183.
The most successful campaign was campaign 6 with 333 customers saying yes, the least successful campaign was campaign 2 with only 30 customers saying yes.
Customers without children tend to spend more than customers with children.

MY RECOMMENDATIONS

Investigate why there are less customers in the 24-34 age group.
Create a marketing campaign aimed directly at the 24-34 age group with possible incentives to encourage sign ups.?For example, money off first order, loyalty scheme where money off vouchers can be attained etc.
Investigate why parents don’t use the service as much as customers without children.?Then create an incentive or market directly towards the service being child friendly (add more products aimed at children) so more parents use the service.
Campaign 6 was a success so to continue running marketing campaigns similar to that campaign.?Investigate why Campaign 2 was so unsuccessful and make changes to make sure that campaign is not used again.
Possibly release new marketing campaigns when sign ups have been lower throughout the year (spring and winter months).

Thank you for taking the time and reading my project. I welcome any feedback and if you enjoyed this project please feel free to connect as I will be posting more projects very soon.

Ryan LaRocco

Data Analyst | SQL & Tableau Specialist | Transforming Data into Actionable Insights.

3 个月

Nicely done! The pie chart was my favorite!

1 次回应

Elisabeth Membel

Business Data Analyst | Administrative Assistant @ Life Safety Consultants

3 个月

Enjoyed reading this- I think Excel is underestimated; you really showcased its capabilities! Your comments about the outliers potentially pointing to fraud in some cases caught my attention. Looking forward to reading your Fraud Detection Using XGBoost post!

1 次回应

Michael Retta

Data Analyst specializing in | SQL I Tableau I Python | Transforming data into strategic insights for business growth ? Committed to continuous learning

1 年

Wow! Your analysis blows my mind! It's so detailed and very appealing to the eyes. I'm impressed by your 3D visuals. Did you use Python to create those? How easy was writing articles on GitHub compared with LinkedIn? This is something I need to learn. Good job, Stuart!

1 次回应

Arifeen Saeed

building Fahm | MSDS @ UT | Huth Lab

1 年

Love your video at the start! Very well written, and cool graphs, thanks for sharing.

1 次回应

Sarah Mangrum

2 年

Great article, Stuart! I loved the 3-D effect on your visuals and your cover video really catches attention.

1 次回应

查看更多评论

要查看或添加评论，请登录

Stuart Walker的更多文章

Fraud Detection using XGBoost: A Machine Learning Approach

2024年10月25日

Fraud Detection using XGBoost: A Machine Learning Approach

The core of the project lies in handling the significant class imbalance typical in fraud detection datasets and…

10 条评论
R HR Attrition Data Analysis

2023年6月19日

R HR Attrition Data Analysis

Human Resources (HR) attrition, also known as employee turnover, is an enduring concern for organisations across…

18 条评论
Python Engineering/Iron Ore Processing Data Analysis

2023年5月21日

Python Engineering/Iron Ore Processing Data Analysis

Iron ore processing is a crucial aspect of the metallurgical industry, primarily concerned with the extraction and…

21 条评论
NBA 2021/2022 Player Stats Data Analysis

2023年4月9日

NBA 2021/2022 Player Stats Data Analysis

The role of analytics in sports is crucial, as it offers valuable insights that empower teams to make informed…

8 条评论
Healthcare SQL Data Analysis

2023年1月30日

Healthcare SQL Data Analysis

In my previous SQL project I looked at financial data from the World Bank. I was able to analyse the data quickly using…

6 条评论
World Bank SQL Data Analysis

2023年1月22日

World Bank SQL Data Analysis

In the world of data analysis you may get datasets that have a few hundred or maybe even a few thousand rows of data…

14 条评论
Massachusetts School Data Analysis

2023年1月13日

Massachusetts School Data Analysis

Imagine being hired to analyse a wealth of education data for a specific state in USA, well in this Tableau project…

8 条评论

See all articles

DoorDash Data Analysis

Stuart Walker

Fraud Prevention Analyst @ M&G PLC | Data Analyst | Data Scientist | Python | SQL | Machine Learning | Data Analytics | Excel | Tableau | Power BI | R

领英推荐

Stuart Walker的更多文章

社区洞察

其他会员也浏览了

Journey up the Grocery Recommendation Value Spectrum

Swiggy API Datasets | Scrape Swiggy Data API

Grocery Delivery Market Insights | 7NOW Grocery Data Scraping

Crisp Forecasting Reduces Food Waste & Drives Customer Profit: Creating a Positive Double Bottom Line

Fresh Fish, Big Data, and Business Strategy

FreshDirect Grocery Delivery Data Scraping Effectively for Your Grocery Business

DoorDash Analysis with Excel

Marketers’ time to deal with rich data deposits

The Content Supermarket

Holiday shopping guide for analytic foodies.

领英推荐

Stuart Walker的更多文章

Fraud Detection using XGBoost: A Machine Learning Approach

R HR Attrition Data Analysis

Python Engineering/Iron Ore Processing Data Analysis

NBA 2021/2022 Player Stats Data Analysis

Healthcare SQL Data Analysis

World Bank SQL Data Analysis

Massachusetts School Data Analysis

社区洞察

其他会员也浏览了

Journey up the Grocery Recommendation Value Spectrum

Swiggy API Datasets | Scrape Swiggy Data API

Grocery Delivery Market Insights | 7NOW Grocery Data Scraping

Crisp Forecasting Reduces Food Waste & Drives Customer Profit: Creating a Positive Double Bottom Line

Fresh Fish, Big Data, and Business Strategy

FreshDirect Grocery Delivery Data Scraping Effectively for Your Grocery Business

DoorDash Analysis with Excel

Marketers’ time to deal with rich data deposits

The Content Supermarket

Holiday shopping guide for analytic foodies.