Bikeshare Analysis - Cyclist (SQL-Excel)

Bikeshare Analysis - Cyclist (SQL-Excel)

Introduction

Welcome to the Cyclistic bike-share analysis case study! In this case study, I will perform many real-world tasks of a junior data analyst. I will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Scenario

As a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, Data analyst team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, I will design a new marketing strategy to convert casual riders into annual members.

Characters and teams

Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

Lily Moreno: The director of marketing and manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. I joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals

Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Ask

Business task

Annual members are far more profitable than casual riders, according to the company's financial analysts. Cyclistic believes that increasing the number of annual members will be essential to achieving future growth, despite the fact that pricing flexibility aids in increasing client acquisition. Our management thinks there is a very good possibility to convert casual riders into members rather than developing a marketing strategy that targets only new clients. She points out that casual riders already know about the Cyclistic programe and have decided to use it for their mobility requirements.

Three questions will guide the future marketing program:

1. How do annual members and casual riders use Cyclistic bikes differently?

2. Why would casual riders buy Cyclistic annual memberships?

3. How can Cyclistic use digital media to influence casual riders to become members?

2. PREPARE

For this case study, I have used 12 months of Cyclistic's 2022 bike ride data from the Divvy database?here. The data has made available by Motivate International Inc. under?this?license.

IS THE DATA ROCCC?

  • Reliable - YES, Accurate & unbiased
  • Original - YES, Original and public dataset
  • Comprehensive - YES, complete information
  • Current - YES, Regularly updated
  • Cited - YES

How each datasets are organized?

I used excel for initial look and preparations.

Each excel file contains more than 100000 rows and 13 following columns,

  • ride_id
  • rideable_type
  • started_at
  • ended_at
  • start_station_name
  • start_station_id
  • end_station_name
  • end_station_id
  • start_lat
  • start_lng
  • end_lat
  • end_lng
  • member_casual,

I noticed numbers of rows having null values in start_station_name and end_station_name,

Process

Initially I used excel for data cleaning,

With excel I used remove duplicate function, there was no duplicate values in any file.

Then , I added two more columns containing Month and day of the week using timestamp given in started_at column in every excel files.

  • =TEXT(value(started_at) ,"MMMM") - for Finding Month from time stamp
  • =TEXT(value(started_at) ,"DDDD") - for Finding Day from time stamp.

Then, I used sort and filter to sort the data with respect to ride's starting time.

Further I decided to remove two columns of end_station_id and start_station_id.

From this stage I decided to use MSSQL - server for further process and analysis of dataset.

After importing all files to the SQL server I used UNION ALL function to join 12 csv files and created new table called yearly_bikesharedata.

Resulting table has 5783326 rows.

Query for joining all tables

Now, I wanted to find out how many rows have null values in column start_station_name and end_station_name.


No alt text provided for this image

Result showed that, 1324788 rows have null values in these two columns.

Which is approximately 21% of all rows.

For Cleaning purpose, I will now write another query to remove all these rows.

No alt text provided for this image

Resulting table has , 4458538 rows.

Further looking at the data I found that using timestamps of start and end i can add a new columns containing a trip_duration in minutes, which can be useful for analyzing data .

I used CONVERT function for extracting time from given two timestamps and then used CAST function read data type as time.

DATEDIFF function is used to calculate trip_duration

Further, I removed rows from trip_duration column which has value less than 0 and more than 1440(24hours), because such timings are not possible.

No alt text provided for this image

ANALYZE

For, analysis I used table Yearly_Bikeshare_tripduration1.

Now let's figure out some stats comparing casual and member riders:

First, I wrote query to find out number of users who are member and casual.

No alt text provided for this image

Results showed that there are 59.79% OF members and 40.21% casual riders.

No alt text provided for this image


Then, I wrote query to find out average trip duration for member riders and casual riders.

No alt text provided for this image

Result showed 23.99 minutes for casual riders and 12.45 minutes for member riders.

Further i wrote query to find out which bike is most used and which bike is least.

No alt text provided for this image

Result showed that classic is most popular among both casual and member riders following by electric bike, No members have used docked bike in previous year.


since our main target audience for marketing is classic users, I took a look into which top 10 stations are mostly used by casual riders for picking up their rides.


No alt text provided for this image

here is the result

No alt text provided for this image

Further for analysis I decided to find most Busiest months, Day of the week and Hours of the day eventually.

No alt text provided for this image
No alt text provided for this image

Result showed that during summer both casual and members takes the most rides,

Saturday is the most popular day for casual riders, while Tuesday is the least popular. Thursday is the day on which members are most active, while Sunday is the least active.

The least popular month for casual riders is January, while the most popular month is July. August is the most and January is the least popular month for member cyclists.

The majority of riders are casual and members who ride Classic bikes. Members hardly ever use the Docked bikes.


The majority of book journeys for both groups happen in the order of hours 17, 18, and 16. (i.e. 5pm, 6pm, and 4pm).

Recommendation

The marketing campaign should concentrate on the summer months and the hours of 4 pm to 6 pm since we want to try to convert casual riders into member riders. On Saturdays, casual bikers are also more active, so the advertising should concentrate on them. It's possible that offering a summer membership might draw casual riders who would not otherwise have spent money on day passes.

Reminding casual users that they can unlock their bikes from one station and return them to any other station in the system at any time should be a part of the marketing effort in order to persuade them to buy yearly memberships rather than day passes.

Luhar Harshit

Asst. Teacher(G S E B)

1 年

Amazing

回复

The analysis phases are very tidy and easy to understand ! Great job !

回复
Mehak Khurana

Data Scientist @ProcMart

1 年

I love the way you present your work! Good job with the analysis ??

Jonathan Smith

Director of Research | Author of The Road Map to Data Analytics

1 年

That's great, congratulations!

Seema Patel, PhD

Data Analyst | Bioinformatician | Medical Writer | Microbiologist | Molecular Biologist I Thought leader

1 年

Well-crafted project. I see, you using the Google analytics course’s suggested steps.

要查看或添加评论,请登录

Jay Mistry的更多文章

  • Adventure Works: Power Bi

    Adventure Works: Power Bi

    For this project, I visualized the sales and returns data from a bike shop called Adventure Works using Power BI. I…

    4 条评论
  • Global Superstore- Analysis.

    Global Superstore- Analysis.

    Here is my first Power Bi dashboard, I have learnt basic and some advance features of power bi with the help of Deeksha…

    7 条评论

社区洞察

其他会员也浏览了