Google Data Analytics - Case Study 1: Cyclistic, a bike-sharing company.
The case study presented is part of the Google Analytics Professional Certification program. It aims to showcase my analytical approach and proficiency in utilizing targeted tools to tackle business inquiries. In real life, I would make different decisions, especially with guidance from more experienced managers, colleagues, and stakeholders.
About the challenge
As a junior data analyst at Cyclistic, a bike-sharing company in Chicago, I am currently working with the marketing team to analyze the differences in bicycle usage between two groups - casual cyclists and annual members. Our goal is to develop a marketing strategy that can help convert casual users into annual members. To achieve this, we need to gain compelling data insights and create professional data visualizations that can support our recommendations. Ultimately, we will present our findings to the company executives.
Characters
Main Questions - ASK
My Task
?? Analyze data trends to discover differences in Cyclistic bike usage between annual members and casual riders.
Data
The dataset utilized is sourced from the years 2023 to 2024 and is publicly available from Motivate International Inc., accessible through the 谷歌 Data Analytics Certificate.
It comprises a total aggregated dataset with 13 columns and 6,389,601 rows, containing attributes such as rider identification number, bike type, starting and ending latitude and longitude, rider status (casual or member), and start and end station identification.
ROCCC Test Evaluation:
ETL (Extract, Transform, and Load) - PREPARE and PROCESS
Data download from 2023 to 2024.
Pentaho Spoon ETL for ingesting data into a PostgreSQL database.
??Why did I use Pentaho Data Integration?
// The Haversine formula
var start_lat_in_rad = start_lat * Math.PI / 180;
var start_lng_in_rad = start_lng * Math.PI / 180;
var end_lat_in_rad = end_lat * Math.PI / 180;
var end_lng_in_rad = end_lng * Math.PI / 180;
var lat_delta = end_lat_in_rad - start_lat_in_rad;
var lng_delta = end_lng_in_rad - start_lng_in_rad;
var a = Math.pow(Math.sin(lat_delta / 2), 2) + Math.cos(start_lat_in_rad) * Math.cos(end_lat_in_rad) * Math.pow(Math.sin(lng_delta / 2), 2);
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
var R = 6371; // Radius of the Earth in kilometers
var ride_length_in_km = R * c; // Distance between the two geographic points
// ---------------------------------------------------------------------------------
if (s_hour >= 0 && s_hour < 12) {
var s_time_of_day = "Morning";
} else if (s_hour >= 12 && s_hour < 18) {
var s_time_of_day = "Afternoon";
} else if (s_hour >= 18 && s_hour < 22) {
var s_time_of_day = "Evening";
} else {
var s_time_of_day = "Late Night";
}
// ---------------------------------------------------------------------------------
if ((s_month >= 1 && s_month <= 2) || s_month == 12) {
var s_season = "Winter";
} else if (s_month >= 3 && s_month <= 5) {
var s_season = "Spring";
} else if (s_month >= 6 && s_month <= 8) {
var s_season = "Summer";
} else {
var s_season = "Autumn";
}
// ---------------------------------------------------------------------------------
if (started_at == null) {
var s_date = null;
} else {
var dateString = started_at.toString();
var s_date = dateString.split(" ")[0];
}
// ---------------------------------------------------------------------------------
var chicagoBoundingBox = {
north: 42.023,
south: 41.644,
east: -87.524,
west: -87.940
};
function isInChicago(latitude, longitude) {
return latitude >= chicagoBoundingBox.south &&
latitude <= chicagoBoundingBox.north &&
longitude >= chicagoBoundingBox.west &&
longitude <= chicagoBoundingBox.east;
};
var latitudeStart = start_lat;
var longitudeStart = start_lng;
var latitudeEnd = end_lat;
var longitudeEnd = end_lng;
if (isInChicago(latitudeStart, longitudeStart) && isInChicago(latitudeEnd, longitudeEnd)) {
var ride_in_chicago = true;
} else {
var ride_in_chicago = false;
}
Importing data into Power BI via PostgreSQL Connection.
Total Unique Rides in Chicago = DISTINCTCOUNT(Cyclistic[ride_id])
Total Stations = DISTINCTCOUNT(Cyclistic[start_station_id])
Average Ride Length in Minutes = AVERAGEX(Cyclistic, DIVIDE(Cyclistic[ride_length_in_seconds], 60))
Power BI Report - ANALYZE
Project Design:
Power BI Report - Pages and Interactions:
领英推荐
Analysis and Insights
?? Global View:
??Member View:
??Casual View:
??Insights:
??Total Rides:
??Bike Type:
??Average Ride Length:
??TOP Stations and Rides - From the top 10 I will highlight the first 5:
??Distribution of rides by time:
??Distribution of rides by days:
??Distribution of rides by months and seasons:
??Distribution of Average Ride Length:
?? During the weekdays, member riders take more rides compared to weekends when casual riders take more rides.
?? Another important point is that around 30% of users use bicycles to commute to work daily, according to the company.
Decision-making - ACT
Among the various insights that could be generated, taking the two above as a reference, we could use digital media in the following way:
?? These steps will help the company determine if it can convert casual riders into members.
#Google #PowerBi #Pentaho #DataAnalytics #GoogleDataAnalyticsCertificate #JuniorDataAnalytics #DataDrivenDecisions #BikeShare #CapstoneProject #Cyclistic #Coursera #DataVisualization