Journey to Data Scientist

I’ve decided to become a DATA SCIENTIST. I am extremely excited to start this journey of learning and exploration. I have been using Tableau Desktop for over 2 years to explore data, and I have come across other data exploration tools, such as RStudio, that have piqued my interest.

I’ve taken a few courses on Lynda.com to learn more about what data science is and what it isn’t. As I now understand it, data science means using the scientific method to explore data and asking interesting questions that lead to insights. Knowledge of mathematics and statistics is helpful in this field. I have always loved statistics and have a deep interest in learning how to apply that knowledge to data exploration.

Why am I so fascinated with data science? I have always liked getting a fresh set of data and trying to see what it’s telling me, especially unstructured data that appears complex. I thrive on being able to pull helpful and interesting insights from data. I also believe that data science will hold a special spot as a career in the future. With the amount of data collected growing exponentially, data scientists will be in high demand to make sense of it all. The vast majority (about 80%) of data collected is unstructured; for me, that makes it even more fun as a future data scientist! I currently work full-time as part of an insights development team while raising my two daughters. My personal goal is to set aside time in my schedule to learn everything I can about becoming a data scientist!

The best way to learn data science and showcase your skills is by doing some actual projects – we learn best by doing. 

So, how do we choose a project to work on? Where do we start? One way to approach it is to first look at some career websites and find a few jobs in data science that you aspire to have in the future. Write down the skills, qualifications, day-to-day expectations, and overall job description from the jobs that interest you. This will give you the project “requirements” that you can work with to formulate a project.

The role I found is Analytics Data Scientist at Honeywell. The full job description can be found here. I pulled out some specific job requirements that we can use to structure our project.

  • Experience with modeling software – R
  • Experience bringing prototypes to production on Hadoop
  • Experience with visualization software – Tableau
  • Understanding of data science trends and future state technologies
  • Provide data modeling, mining, pattern analysis, data visualization and machine learning solutions to address customer needs
  • Communicate project output in terms of customer value, business objectives, and product opportunity
  • Attend industry conferences to stay current on industry trends, challenges, and potential market opportunities

The last item on the list won’t really be used in the project but it’s something that you can do in your free time to learn more about data science and to expand your professional network.

What we need now is a data set to work with. As you carry out your project, it is important to record it in a way that lets people read it and follow your thought process. It should highlight your data science skills (programming, data cleansing, visualization, etc.).

The website I used to find my data set is Data.world, where I downloaded the “global super store 2016 xlsx” file to my desktop. We will focus on the first tab, “Orders”. The data has 51,290 rows and 24 columns. The description of the data is as follows:

“Global Super Store is a data set which has around 50,000 values. Its a customer-centric data set , which has the data of all the orders that have been placed through different vendors and markets , starting from the year 2011 till 2015.”

We will use the following process-flow to guide us in the project:

  1. Understand the Problem/ Ask Interesting Questions
  2. Data Gathering & Preparation
  3. Data Analysis
  4. Data Visualization & Communication

The first step is about asking the right questions. I recently wrote a post on this, 20 Questions to Ask Prior to Starting Data Analysis. Since this is a made-up project, we need to create a scenario that helps us relate it to real life. Let’s say we are hired as a consulting partner to help a client, “Global Superstore”. Below are the problems they are facing and need our help with:

  • In order to increase our sales, we want to hire more salespeople. In which location should we invest to build up our sales team?
  • We want to show our top 10 customers our appreciation by sending them a gift card that has $500 in store credit. Which customers should we send it to?
  • We want to decrease the number of products that we offer, which products should we discontinue?
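All three questions come down to the same basic operation: add up a measure for each group, then rank the groups. Here is a minimal Python sketch of that idea. The rows are made up, the column names ("Customer Name", "Region", "Product Name", "Sales", "Profit") are assumptions about the Orders tab, and treating “top customers” as highest total sales and “products to discontinue” as lowest total profit is only one possible interpretation to confirm with the client.

```python
from collections import defaultdict

# Made-up rows for illustration only; the real Orders tab has 51,290 rows.
# The column names are assumptions and may differ in the actual file.
ORDERS = [
    {"Customer Name": "Jane Doe", "Region": "West", "Product Name": "Stapler",
     "Sales": 120.0, "Profit": 30.0},
    {"Customer Name": "John Roe", "Region": "East", "Product Name": "Desk",
     "Sales": 450.0, "Profit": -20.0},
    {"Customer Name": "Jane Doe", "Region": "West", "Product Name": "Desk",
     "Sales": 400.0, "Profit": -15.0},
    {"Customer Name": "Ann Poe", "Region": "East", "Product Name": "Binder",
     "Sales": 60.0, "Profit": 12.0},
]


def totals_by(rows, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def ranked(totals, n, descending=True):
    """Return the first n (name, total) pairs after sorting by total."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=descending)
    return ordered[:n]


# Question 1: total sales per region, one input to the hiring decision.
sales_by_region = ranked(totals_by(ORDERS, "Region", "Sales"), n=3)

# Question 2: top customers by total sales (use n=10 on the real data).
top_customers = ranked(totals_by(ORDERS, "Customer Name", "Sales"), n=2)

# Question 3: lowest-profit products as candidates to review.
weakest_products = ranked(
    totals_by(ORDERS, "Product Name", "Profit"), n=1, descending=False
)

print(sales_by_region)
print(top_customers)
print(weakest_products)
```

The same grouping can be done in Tableau or R; the sketch is only meant to make the logic behind each question explicit before building the dashboard.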

Now we need to think about who our audience is. In this case, let’s pretend we need to show our analysis results to the organization’s senior management. This means the results need to be communicated in a way that suits their preferences. We ask our main point of contact at the firm, who says that senior management loves having drill-down capabilities and being able to filter the data. Given that preference, we decide to use Tableau as the medium for the presentation.

The second step is to gather the data and prepare it for analysis. We already have the data set, but at this point we can check whether we need any additional data to move forward. Since everything the client asked about is already in the data source, we don’t need to gather more. Next, let’s check whether any data cleansing or preparation is needed.

By quickly skimming the data in Excel, we can see that there are a few blanks in the data. One way to clean this up is to:

  1. Select the entire data set
  2. Press Ctrl+F to open Find and Replace and switch to the Replace tab. Leave the “Find what” box empty, enter Not Available or Null as the replacement, and click Replace All.
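If you would rather script this step, here is a minimal Python sketch of the same find-and-replace idea. It assumes the Orders tab has been exported to CSV; the sample rows and column names below are made up for illustration and are not taken from the actual file.

```python
import csv
import io

# Made-up sample standing in for the Orders tab exported as CSV.
# The column names are assumptions for illustration only.
SAMPLE_CSV = """Order ID,Customer Name,Postal Code,Sales
A-1,Jane Doe,,120.50
A-2,John Roe,10001,
A-3,,94105,75.00
"""

PLACEHOLDER = "Not Available"


def is_blank(value):
    """True for missing, empty, or whitespace-only cells."""
    return value is None or not value.strip()


def count_blanks(rows):
    """Count blank cells across all rows."""
    return sum(1 for row in rows for value in row.values() if is_blank(value))


def fill_blanks(rows, placeholder=PLACEHOLDER):
    """Return copies of the rows with every blank cell replaced."""
    return [
        {column: placeholder if is_blank(value) else value
         for column, value in row.items()}
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
blanks_before = count_blanks(rows)
cleaned_rows = fill_blanks(rows)
blanks_after = count_blanks(cleaned_rows)

print(f"Blank cells before: {blanks_before}, after: {blanks_after}")
```

One thing to keep in mind: putting a text placeholder into a numeric column such as sales turns it into a mixed text-and-number column, so it may be worth leaving numeric blanks empty and filling only the text fields.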

Other than the blanks, the data looks clean and ready to use. Now we can save the file and begin our analysis.

Objective: Using the data set provided, along with the project requirements, start your data analysis with some of the tools available to you (for example, R and Tableau Public are free to use). The goal is to answer the client’s questions with a Tableau presentation that has filtering and drill-down capabilities.

Join me on this project, and let’s learn data science together!

Note: Make sure you record your thought process, the steps you take, the output you produce, and so on. In short, show your work. This will help you solidify what you are learning and give you something to put into your data science project portfolio. You can post the results of your analysis in the comments below or on our Facebook page: Story by Data.

