Google Data Analytics - Case Study 1: Cyclistic, a bike-sharing company.
Matheus Siedler

Google Data Analytics - Case Study 1: Cyclistic, a bike-sharing company.

The case study presented is part of the Google Analytics Professional Certification program. It aims to showcase my analytical approach and proficiency in utilizing targeted tools to tackle business inquiries. In real life, I would make different decisions, especially with guidance from more experienced managers, colleagues, and stakeholders.


About the challenge

As a junior data analyst at Cyclistic, a bike-sharing company in Chicago, I am currently working with the marketing team to analyze the differences in bicycle usage between two groups - casual cyclists and annual members. Our goal is to develop a marketing strategy that can help convert casual users into annual members. To achieve this, we need to gain compelling data insights and create professional data visualizations that can support our recommendations. Ultimately, we will present our findings to the company executives.

Characters

  • Matheus Siedler is a Junior Data Analyst.
  • Cyclistic is a bike-sharing program with over 5,800 bikes and 600 stations, offering a variety of bicycles including recumbent models, manual tricycles, and cargo bikes, making it more inclusive for people with disabilities and cyclists who can't use standard two-wheel bikes, with about 8% of users opting for assistive bikes, and approximately 30% using the bikes for daily commuting.
  • Lily Moreno serves as the marketing director, responsible for developing campaigns and promotional initiatives for the bike-sharing program through various channels such as email and social media.
  • The Cyclistic marketing analysis team comprises data analysts who gather, analyze, and report data to guide marketing strategy. You joined this team six months ago, learning about Cyclistic's mission and business objectives and how, as a junior data analyst, you can contribute to achieving them.
  • The Cyclistic executive team, known for their attention to detail, will decide whether to approve the recommended marketing program.

Main Questions - ASK

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

My Task

?? Analyze data trends to discover differences in Cyclistic bike usage between annual members and casual riders.

Data

The dataset utilized is sourced from the years 2023 to 2024 and is publicly available from Motivate International Inc., accessible through the 谷歌 Data Analytics Certificate.

It comprises a total aggregated dataset with 13 columns and 6,389,601 rows, containing attributes such as rider identification number, bike type, starting and ending latitude and longitude, rider status (casual or member), and start and end station identification.

ROCCC Test Evaluation:

  • Reliable: Affirmative, as the dataset originates from the City of Chicago's Divvy program, managed by Lyft Bikes and Scooters, LLC, a subsidiary of a publicly traded company.
  • Original: Absolutely, all data is firsthand and collected directly.
  • Comprehensive: Indeed, it offers sufficient depth to address the pertinent business inquiry.
  • Current: Yes, the data is from 2023 to 2024, ensuring its contemporaneity and relevance.
  • Cited: the dataset is from a reputable first-party source, the Divvy program, which oversees the bikeshare program in Chicago, ensuring credibility.


ETL (Extract, Transform, and Load) - PREPARE and PROCESS

Data download from 2023 to 2024.

  • Firstly, the data was stored in a local folder within my project directory.

Local folder containing the data
Data files

Pentaho Spoon ETL for ingesting data into a PostgreSQL database.

??Why did I use Pentaho Data Integration?

  • Within my set of tools and together with the skills learned in the Google program (Google Big Query, R Language, SQL, Excel, etc.), I chose Pentaho because it is an incredible tool that is capable of reading, processing, manipulating, transforming and load data in a very optimized way.

  • Data transformation and formatting.
  • Creation of calculated columns with Spoon's internal logic and JavaScript.

Pentaho Transformation Pipeline

  • New columns created with this ETL above:

New columns

  • The "s_" prefix stands for "start" because all the columns that have it are derived from the "started_at" column.
  • Date calculations:

Calculator Step

  • Calculations in JavaScript:

Modified JavaScript Value Step

  • JS Code:

// The Haversine formula

var start_lat_in_rad = start_lat * Math.PI / 180;
var start_lng_in_rad = start_lng * Math.PI / 180;
var end_lat_in_rad = end_lat * Math.PI / 180;
var end_lng_in_rad = end_lng * Math.PI / 180;

var lat_delta = end_lat_in_rad - start_lat_in_rad;
var lng_delta = end_lng_in_rad - start_lng_in_rad;

var a = Math.pow(Math.sin(lat_delta / 2), 2) + Math.cos(start_lat_in_rad) * Math.cos(end_lat_in_rad) * Math.pow(Math.sin(lng_delta / 2), 2);
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));

var R = 6371; // Radius of the Earth in kilometers
var ride_length_in_km = R * c; // Distance between the two geographic points

// ---------------------------------------------------------------------------------

if (s_hour >= 0 && s_hour < 12) {
	var s_time_of_day = "Morning";
} else if (s_hour >= 12 && s_hour < 18) {
	var s_time_of_day = "Afternoon";
} else if (s_hour >= 18 && s_hour < 22) {
	var s_time_of_day = "Evening";
} else {
	var s_time_of_day = "Late Night";
}

// ---------------------------------------------------------------------------------

if ((s_month >= 1 && s_month <= 2) || s_month == 12) {
	var s_season = "Winter";
} else if (s_month >= 3 && s_month <= 5) {
	var s_season = "Spring";
} else if (s_month >= 6 && s_month <= 8) {
	var s_season = "Summer";
} else {
	var s_season = "Autumn";
}

// ---------------------------------------------------------------------------------

if (started_at == null) {
	var s_date = null;
} else {
	var dateString = started_at.toString();
	var s_date = dateString.split(" ")[0];
}

// ---------------------------------------------------------------------------------

var chicagoBoundingBox = {
    north: 42.023,
    south: 41.644,
    east: -87.524,
    west: -87.940
};

function isInChicago(latitude, longitude) {
    return latitude >= chicagoBoundingBox.south &&
           latitude <= chicagoBoundingBox.north &&
           longitude >= chicagoBoundingBox.west &&
           longitude <= chicagoBoundingBox.east;
};

var latitudeStart = start_lat;
var longitudeStart = start_lng;

var latitudeEnd = end_lat;
var longitudeEnd = end_lng;

if (isInChicago(latitudeStart, longitudeStart) && isInChicago(latitudeEnd, longitudeEnd)) {
    var ride_in_chicago = true;
} else {
    var ride_in_chicago = false;
}        

Importing data into Power BI via PostgreSQL Connection.

  • Data processing and simple logic within the tool.
  • Main Filters: pages include filters for Chicago-exclusive rides longer than 0 kilometers. Additionally, visuals, excluding cards, primarily filter out Null station names.
  • Original columns:

Columns in Power BI

  • Calculated Columns from the database:

Columns in Power BI

  • Calculated Columns from Power BI:

Columns in Power BI

  • Measures set:

Power BI Measures

  • Some calculations in DAX (Data Analysis Expressions):

Total Unique Rides in Chicago = DISTINCTCOUNT(Cyclistic[ride_id])

Total Stations = DISTINCTCOUNT(Cyclistic[start_station_id])

Average Ride Length in Minutes = AVERAGEX(Cyclistic, DIVIDE(Cyclistic[ride_length_in_seconds], 60))        

  • Power BI Semantic Model:

Semantic Model

Power BI Report - ANALYZE

Project Design:

  • All the design and stylization of the project to obtain an elegant look that conveys captivating storytelling was done in Figma .

Figma Design File

Power BI Report - Pages and Interactions:

  • Home

Home

  • Temporal Analysis

Temporal Analysis

  • Maps

Maps

  • Reports

Reports

  • Infos

Infos

  • Filters:

Main filters
Filter Panel

  • Access the published report by clicking on the link below - SHARE:

Analysis and Insights

  1. How do annual members and casual riders use Cyclistic bikes differently?

  • The Power BI report has several features and possibilities for generating analysis and in order not to extend this article, let's go deeper and focus on the 2023 data.

?? Global View:

Home - Global View
Temporal Analysis - Global View
Maps - Global View

??Member View:

Home - Member View
Temporal Analysis - Member View
Maps - Member View

??Casual View:

Home - Casual View
Temporal Analysis - Casual View
Maps - Casual View

??Insights:

??Total Rides:

  • Global: 5.38 M rides.
  • Member: 3.5 M rides - 65.09%.
  • Casual: 1.88 M rides - 34.91%.

??Bike Type:

  • Global: Electric Bike (52.15%) > Classic Bike (46.72%) > Docked Bike (1.14%).
  • Member: Electric Bike (50.53%) > Classic Bike (49.47%).
  • Casual: Electric Bike (55.17%) > Classic Bike (41.58%) > Docked Bike (3.26%).

??Average Ride Length:

  • Global: Average Ride Length is 14.7 minutes | Average Ride Length is 2.2 Km.
  • Member: Average Ride Length is 12.01 minutes | Average Ride Length is 2.16 Km.
  • Casual: Average Ride Length is 19.71 minutes | Average Ride Length is 2.27 Km.

??TOP Stations and Rides - From the top 10 I will highlight the first 5:

  • Global:

Ranks

  • Member:

Ranks

  • Casual:

Ranks

??Distribution of rides by time:

  • Member: Morning and afternoon are busiest with peaks at 8 am and 5 pm. The night is also busy here.
  • The period with the most rides is the afternoon.

Distribution of rides by time

  • Casual: Afternoon and night are busier with prices above average between 12 pm and 7 pm. The morning period has a lot of rides but in a longer and more distributed interval.
  • The period with the most rides is the afternoon.

Distribution of rides by time

??Distribution of rides by days:

  • Member: The whole week here is very busy, but the highlight is the middle of the week (Tuesday, Wednesday, Thursday) with the highest volumes of rides.

Distribution of rides by days

  • Casual: Here the whole week has a good movement, but the highlight is the weekend (Saturday and Sunday) with the highest volumes of rides.

Distribution of rides by days

??Distribution of rides by months and seasons:

  • Member and Casual: What differs here is the number of rides, because in terms of volume and trends in monthly distribution, both types of customers have very similar patterns.

Distribution of rides by months

  • With this in mind, the distribution of travel by season is closely linked to the monthly distribution, with Summer being the biggest season in terms of ride volume.

Distribution of rides by seasons

??Distribution of Average Ride Length:

  • Member and Casual: In general, casual customers' rides are longer in time and distance. The weekend and the months near or in the middle of the year represent the peaks of Average Ride Length.

Distribution of Average Ride Length by days
Distribution of Average Ride Length by months

  1. Why would casual riders buy Cyclistic annual memberships?

?? During the weekdays, member riders take more rides compared to weekends when casual riders take more rides.

  • According to Cyclistic, as their users tend to use bicycles for leisure purposes, casual customers are an ideal fit for their target audience.
  • The company can focus on satisfying this group's needs and preferences, which can lead to higher satisfaction levels and a greater likelihood of subscribing to the annual plan.
  • This can be achieved by providing specific benefits and treatments tailored to the leisure cycling community.

?? Another important point is that around 30% of users use bicycles to commute to work daily, according to the company.

  • With this in mind, in the comparisons of travel distribution by time and period, we have similar patterns with large movements in the beginning (8 am to 9 am) and end (5 pm to 6 pm) intervals of the working day.
  • Therefore, another audience that could benefit from our services is weekday commuters who use bicycles to get to and from work.
  • So, casual customers, who have a very good distribution of trips during the weekdays, may receive offers, personalized services, and differentiated treatments to increase satisfaction, making them more likely to subscribe to the annual plan.


Decision-making - ACT

  1. How can Cyclistic use digital media to influence casual riders to become members?

Among the various insights that could be generated, taking the two above as a reference, we could use digital media in the following way:

  • It would be useful to conduct research, such as surveys, polls, and quizzes, to gain a better understanding of why our customers signed up for the annual plan. By doing so, we can better understand this group and make it easier to convince more casual customers to become members.
  • Restructure and strengthen Cyclistic's digital brand presence across various platforms such as Google, Instagram, Facebook, Twitter, and YouTube.
  • Create eye-catching and personalized digital marketing campaigns targeting casual customers to efficiently convert them.
  • To make more precise and valuable data-driven decisions, it is recommended to delve further into analysis and insights.

?? These steps will help the company determine if it can convert casual riders into members.


#Google #PowerBi #Pentaho #DataAnalytics #GoogleDataAnalyticsCertificate #JuniorDataAnalytics #DataDrivenDecisions #BikeShare #CapstoneProject #Cyclistic #Coursera #DataVisualization

要查看或添加评论,请登录

社区洞察

其他会员也浏览了