Importance of Data Exploration & Business Context
You will find plenty of people showing off their data skills with fancy words and techniques.
What I feel most of them skip is the basics behind all of it: data exploration and understanding the context behind each data set.
So what is data exploration?
Data exploration is the first step in any data analysis project, even one so advanced that it makes me question my life choices. In this step you get to know the data: how things relate to each other, what the attributes mean, and what they represent.
How to do data exploration?
This mainly depends on the amount of data you have. Personally, I always start with the most basic tool, MS Excel. I kid you not, you would be surprised how powerful MS Excel is when the data is not huge. MS Excel is like the best player in your neighborhood: whenever there is a local match, that is the player you call.
Other than that, you can use SQL if the data lives on your servers or databases, or Python for an even more granular look at the data. This kind of exploration is known as EDA (Exploratory Data Analysis), a term you will hear a lot around Python. Another fancy word I have added to your dictionary.
What steps do I follow for data exploration in MS Excel?
They are pretty simple, and I hope they will help you.
When I get the data for the first time, I try to understand the context behind every attribute (column) in the data set. If you do not know what each column represents, there is no point in going further: if you cannot tell which columns you need to solve your business case, what is the point of knowing all the fancy techniques?
It is like stopping at every shop to buy boots because you do not know which shop sells them. If you know the business case and the columns it involves, you can select only those columns and complete the task, rather than working with the whole data set when only 2 columns matter. It may not look like a hurdle when the data is small, but with hundreds of columns and thousands of rows you are doomed.
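If you ever move from Excel to Python, picking only the columns you need could look roughly like the sketch below. It uses only the standard library, and the sample data, the column names, and the `select_columns` helper are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for this example.
RAW_CSV = """order_id,customer,region,amount,notes
1,Asha,North,120.5,first order
2,Bilal,South,,gift
3,Chen,North,80.0,
"""


def select_columns(rows, wanted):
    """Keep only the columns listed in `wanted` from each row dict."""
    return [{col: row[col] for col in wanted} for row in rows]


# Read every row as a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Assumption: the business case only needs region and amount.
relevant = select_columns(rows, ["region", "amount"])
print(relevant[0])  # {'region': 'North', 'amount': '120.5'}
```

The point is the same as in Excel: decide which columns matter first, then carry only those forward.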
Once I know the data, I map out the relations between the attributes in my head so that I understand how the data flows.
Then I use the magical keyboard shortcuts to find the shape of the data, how many null values I have, and the mean and median of the numerical columns involved in my business case. Also, do not forget to add borders and make your column headers bold, because everyone likes beauty.
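The same checks (shape, null counts, mean and median) can be sketched in Python too. This is a minimal standard-library version under my own assumptions: the sample data is invented, empty cells are treated as nulls, and `profile` is a hypothetical helper, not a function from any library.

```python
import csv
import io
from statistics import mean, median

# Hypothetical sample data; empty cells stand in for null values.
RAW_CSV = """order_id,region,amount
1,North,120.5
2,South,
3,North,80.0
4,,100.0
"""


def profile(rows, numeric_cols):
    """Return the shape, per-column null counts, and mean/median of numeric columns."""
    columns = list(rows[0].keys())
    shape = (len(rows), len(columns))  # (rows, columns), header excluded
    nulls = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    stats = {}
    for c in numeric_cols:
        # Skip empty cells so they do not distort the averages.
        values = [float(r[c]) for r in rows if r[c] != ""]
        stats[c] = {"mean": mean(values), "median": median(values)}
    return shape, nulls, stats


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
shape, nulls, stats = profile(rows, ["amount"])

print(shape)                      # (4, 3)
print(nulls)                      # {'order_id': 0, 'region': 1, 'amount': 1}
print(stats["amount"]["median"])  # 100.0
```

For real work most people would reach for a library such as pandas, but the idea stays the same: count rows and columns, count the blanks, and summarize the numbers.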
After this exercise you will have a general idea of the data and of the columns you need to solve the business case. You also know the shape of the data, so you can pick your tools accordingly.
Hope this helps. It is a pretty raw idea and explanation, but everyone likes raw steaks more than well done, right? Oh, you like well done? No problem, we have space for unnatural beings in the world.
Basharat, a learning analyst, signing off.