Google Data Analyst Capstone Project: How does a bike-share navigate speedy success?
Julian Lim
I work in a Financial Institution | Customer Success Professional with skills in Data Analytics | Autodidact | Lifelong learner
Hi everyone, these are the things I encountered while analysing data for my Google Data Analyst Practitioner Certificate. The Capstone project is where you put together all of the skills you learnt during the course. It's the first time you get to analyse data on your own. So here are some insights!
Source Data:
I had to analyse a year's worth of bike ride data. There are 12 CSV files, one for each month. I chose June 2022 to May 2023. The data was quite sizeable.
To give a realistic sense of scale: once combined (using R Desktop), the file was over 1.3 gigabytes in size and held over 5 million rows. Each row represents one bicycle ride.
What I did:
I looked at the data in Excel to get a feel for it. I then tried to combine it into a single data source using Microsoft Power Query (for those of you unfamiliar with it, it's a feature in Microsoft Excel and Power BI). However, the long loading times and the number of times my laptop hung meant that I needed something a little more reliable, so I decided to use R Desktop to manage my data.
Using R Desktop
After reading through a few articles written by coursemates who had completed their projects, I decided to try it on my own: combine the data into one CSV file, then clean it up and filter out any abnormalities.
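For readers who would like to see the idea in code: I did this step in R, so the following is not my actual script, just a minimal Python sketch of the same combine-and-filter approach using only the standard library. The column names started_at and ended_at, the timestamp format, and the "positive ride duration" rule are assumptions for illustration; check them against your own files.

```python
import csv
from datetime import datetime
from pathlib import Path


def has_positive_duration(ride, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if the ride ends after it starts.

    Assumes hypothetical columns 'started_at' and 'ended_at'.
    Rows with missing or unparseable timestamps are rejected.
    """
    try:
        start = datetime.strptime(ride["started_at"], fmt)
        end = datetime.strptime(ride["ended_at"], fmt)
    except (KeyError, ValueError):
        return False
    return end > start


def combine_monthly_csvs(input_dir, output_path, keep=None, pattern="*.csv"):
    """Append every matching CSV in input_dir into one output file.

    All files must share the same header, which is written once.
    If 'keep' is given, only rows for which keep(row_as_dict) is True
    are written. Returns the number of data rows written.
    """
    output_path = Path(output_path)
    # List the sources first so the output file is never read as an input.
    sources = [p for p in sorted(Path(input_dir).glob(pattern))
               if p.resolve() != output_path.resolve()]
    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sources:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                # Rows are streamed one at a time, so memory use stays small.
                for row in reader:
                    if keep is not None and not keep(dict(zip(header, row))):
                        continue
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as combine_monthly_csvs("monthly_data", "combined.csv", keep=has_positive_duration) would write one file with a single header (the folder and file names here are placeholders). Because rows are copied one at a time rather than loaded all at once, a combined file of more than a gigabyte should not need that much memory.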
Here are the major steps I took in R Desktop:
I was largely finished with R Desktop at this point, so I decided to move on to MS Power Query in Excel, using Pivot Tables for visualisation and data viewing. I chose Excel and Power Query because we are all familiar with Excel and, even though a worksheet is limited to about a million rows, Pivot Tables are still very useful.
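To put a number on that limit: an Excel worksheet holds at most 1,048,576 rows, header included. The small Python sketch below (not part of my original workflow; the function names are made up for illustration) counts the data rows in a CSV file and checks whether they would fit on one worksheet.

```python
import csv

# Maximum rows on a single Excel worksheet, header row included.
EXCEL_MAX_ROWS = 1_048_576


def count_data_rows(csv_path):
    """Count the rows in a CSV file, excluding the header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header if there is one
        return sum(1 for _ in reader)


def fits_on_one_worksheet(data_rows):
    """True if the data rows plus one header row fit on a worksheet."""
    return data_rows + 1 <= EXCEL_MAX_ROWS


# A year of rides at roughly 5 million rows is far beyond one worksheet.
print(fits_on_one_worksheet(5_000_000))  # False
```

This is why a dataset of this size has to be summarised or queried rather than pasted straight onto a sheet.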
Using Excel and Power Query
Now that the CSV file is combined (after 15 minutes of loading on my old laptop), we can proceed to get the data. I'm showing you the button that is used to get the data from our combined CSV file.
And now, we'll use Insert Pivot Table to add in our data and choose what we want to see.
Once you have reached the Pivot Table stage, you can create Pivot Charts from them to explore and present your findings.
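If it helps to see what a Pivot Table is doing underneath, here is a rough Python sketch of the simplest case: counting rides for each combination of two fields. This is an illustration, not what Excel runs internally, and any column names you pass in (for example a rider-type or bike-type column) are assumptions about your own file.

```python
import csv
from collections import Counter


def pivot_counts(csv_path, row_field, column_field):
    """Count rows for each (row_field, column_field) pair of values.

    Mirrors a Pivot Table with one field in Rows, one in Columns,
    and a count in Values. Field names are supplied by the caller.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for ride in csv.DictReader(f):
            counts[(ride[row_field], ride[column_field])] += 1
    return counts
```

For example, pivot_counts("combined.csv", "rideable_type", "member_casual") would return a count for each pair of values found in those two (hypothetical) columns, which is the same grid a Pivot Table would show.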
I also 'discussed' my process with ChatGPT and read other published write-ups of the project.
Good luck!