Exploratory data analysis
Introduction
1) What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA) is the process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.
2) Why do we use Exploratory data analysis?
We use EDA to analyze and investigate a dataset and to summarize the main characteristics of its features, often with the help of visualization techniques.
3) How do we use EDA?
Steps in Data Exploration and Preprocessing:
- Identification of variables and data types.
- Analyzing the basic metrics.
- Non-Graphical Univariate Analysis.
- Graphical Univariate Analysis.
- Bivariate Analysis.
- Variable transformations.
- Missing value treatment.
- Outlier treatment.
Let's walk through the analysis. I have used Python for it.
We will first import all the necessary libraries and load our dataset for further work.
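A minimal sketch of this step is shown here. The real file is not part of this article, so the snippet reads a few made-up rows from an in-memory string instead; the columns Year and Kms_Driven and all the values are assumptions for illustration, and only Selling_Price and Present_Price are names taken from the article.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file. With the actual dataset you
# would pass its path to pd.read_csv instead of this in-memory text.
csv_text = """Year,Selling_Price,Present_Price,Kms_Driven
2014,3.35,5.59,27000
2013,4.75,9.54,43000
2017,,9.85,6900
2011,2.85,4.15,
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the data
print(df.shape)    # number of rows and columns
print(df.dtypes)   # data type of every column
```

Checking the shape and the dtypes right after loading covers the first step in the list above, the identification of variables and data types.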
Missing Values:-
If the dataset contains missing values, we need to handle them before we perform any statistical analysis.
There are mainly three types of missing values.
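- MCAR (Missing completely at random): Whether a value is missing does not depend on any values in the data, observed or missing.
- MAR (Missing at random): Whether a value is missing may depend on other, observed features, but not on the missing value itself.
- MNAR (Missing not at random): Whether a value is missing depends on the missing value itself, so there is a specific reason why it is absent.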
Let’s see which columns have missing values in the dataset.
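One common way to check this with pandas is to count the null entries per column. The sketch below uses a small made-up DataFrame (the values and the Kms_Driven column are assumptions, not the article's data):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: each column has one missing value.
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, np.nan, 2.85],
    "Present_Price": [5.59, np.nan, 9.85, 4.15],
    "Kms_Driven": [27000, 43000, 6900, np.nan],
})

missing_counts = df.isnull().sum()        # missing values per column
missing_share = df.isnull().mean() * 100  # same, as a percentage of rows

print(missing_counts)
print(missing_share)
```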
Here we can see that every column has missing values.
Drop the missing Values:-
Let’s handle missing values in the Selling_Price column.
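A minimal sketch of dropping those rows, assuming pandas and a small made-up DataFrame (values and the Kms_Driven column are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, np.nan, 2.85],
    "Present_Price": [5.59, np.nan, 9.85, 4.15],
    "Kms_Driven": [27000, 43000, 6900, np.nan],
})

# Drop only the rows where Selling_Price is missing; missing values in
# other columns are left untouched.
df_clean = df.dropna(subset=["Selling_Price"])

print(df_clean["Selling_Price"].isnull().sum())
print(df_clean.shape)
```

Passing subset keeps rows that are missing values only in other columns, so no more data is discarded than necessary.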
Now we can see that we have successfully dropped the missing values from the Selling_Price column.
Statistical Insight:-
This step gives summary statistics for the numeric columns, such as the mean, standard deviation, median, minimum and maximum values.
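In pandas these statistics are available through describe(). The snippet below is a sketch on made-up numbers, not the article's dataset:

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Selling_Price": [2.0, 4.0, 6.0, 8.0],
    "Present_Price": [3.0, 5.0, 9.0, 11.0],
})

# count, mean, std, min, quartiles and max for every numeric column
summary = df.describe()
print(summary)

# The median is the "50%" row of describe(), or can be computed directly.
print(df.median())
```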
Numeric-Numeric Analysis:-
Analyzing two numeric variables from a dataset together is known as numeric-numeric analysis. We can do it in three different ways.
- Scatter Plot
- Pair Plot
- Correlation Matrix
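Of the three options listed, the correlation matrix is the quickest to compute. A minimal sketch with pandas, assuming made-up values and a hypothetical third column Kms_Driven:

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr)
```

Each entry lies between -1 and 1; values near 1 or -1 indicate a strong linear relationship, and values near 0 a weak one.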
Scatter Plot:-
Let’s take two columns, ‘Selling_Price’ and ‘Present_Price’, from our dataset and see what we can infer from a scatter plot of one against the other.
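A sketch of such a plot with matplotlib, using made-up values in place of the real columns:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
})

fig, ax = plt.subplots()
ax.scatter(df["Present_Price"], df["Selling_Price"])
ax.set_xlabel("Present_Price")
ax.set_ylabel("Selling_Price")
ax.set_title("Selling_Price vs Present_Price")
# In a notebook the figure renders automatically; in a script, call plt.show().
```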
Pair Plot:-
Now, let’s plot Pair Plots for the three columns we used in plotting Scatter plots. We’ll use the seaborn library for plotting Pair Plots.
Histogram:-
We draw a histogram for the Selling_Price column. With pandas the code is straightforward via plot.hist.
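A sketch of that call on made-up values; the bin count of 5 is an arbitrary choice for this toy data:

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60, 6.10, 3.90, 5.25],
})

# plot.hist uses matplotlib under the hood and returns its Axes object.
ax = df["Selling_Price"].plot.hist(bins=5)
ax.set_xlabel("Selling_Price")
# In a notebook the figure renders automatically; in a script, call
# matplotlib.pyplot.show().
```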
Bar Chart:-
Line Plot:-
Conclusion
This is how we do Exploratory Data Analysis. EDA helps us to look beyond the raw data: the more we explore the data, the more insights we draw from it. As data analysts, we spend a large share of our time (a figure of about 80% is often quoted) understanding data and solving various business problems through EDA.