Exploratory Data Analysis (EDA)
Visualizing the data

What is EDA?

EDA is one of the crucial steps in data science. It yields insights and statistical measures that matter for business continuity, stakeholders, and data scientists. During EDA you check whether the data has integrity and whether its values make sense. Have people reported data on different scales? Are there missing values? Do some columns have outliers? Are any distributions multimodal? What is the distribution of the values, and how do features correlate with one another?
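As a rough illustration of those checks, here is a minimal sketch using pandas. The toy data and the column names (`lead_time`, `adr`, `is_canceled`) are assumptions made for this example, not taken from the analysis in this article, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import pandas as pd

# Toy data standing in for a real dataset; the column names are hypothetical.
df = pd.DataFrame({
    "lead_time": [10, 45, 3, 120, 30, 7, 400, 15],
    "adr": [80.0, 95.5, None, 110.0, 88.0, 79.0, 102.0, 91.0],
    "is_canceled": [0, 1, 0, 1, 0, 0, 1, 0],
})

# Are there missing values? Count them per column.
missing = df.isna().sum()

# What is the distribution of values? Count, mean, std, and quartiles.
summary = df.describe()

# Do some columns have outliers? Flag values outside 1.5 * IQR of the quartiles.
q1, q3 = df["lead_time"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["lead_time"] < q1 - 1.5 * iqr) | (df["lead_time"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# How do features correlate with one another? Pairwise Pearson correlation.
corr = df.corr()

print(missing, summary, outliers, corr, sep="\n\n")
```

On real data the same calls apply unchanged; only the column names and the outlier threshold would need adjusting.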

Why is EDA so important?

EDA helps us define and refine the selection of the important feature variables that will be used in our model. It is a process in which one learns about the data, forms insights, and identifies important columns (features) that can be used to tell a story or, later, to formulate an ML problem.

Procedure for performing EDA:

EDA involves four steps:

  1. Data Collection
  2. Data Cleaning
  3. Data Preprocessing
  4. Data Visualization

1. Data Collection

Data collection is the process of gathering information in an established, systematic way that enables one to test hypotheses and evaluate outcomes easily.

2. Data Cleaning

Data cleaning is the process of ensuring that your data is correct and usable by identifying errors and missing values, then correcting or deleting them.
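A minimal sketch of both options, correcting and deleting, might look like the following. The data, the column names, and the specific rules (treating missing `children` as 0, dropping rows with a negative `adr`) are assumptions chosen for illustration; the right rules always depend on the dataset.

```python
import pandas as pd

# Hypothetical raw records with typical problems: inconsistent text,
# a duplicate row, a missing value, and an impossible negative rate.
raw = pd.DataFrame({
    "hotel": ["City Hotel", "city hotel ", "Resort Hotel", "Resort Hotel", "City Hotel"],
    "adults": [2, 2, 2, 1, 1],
    "children": [0.0, 0.0, None, 1.0, 1.0],
    "adr": [90.0, 90.0, 75.0, -5.0, 120.0],
})

clean = raw.copy()

# Correct: normalize whitespace and capitalization in text labels.
clean["hotel"] = clean["hotel"].str.strip().str.title()

# Delete: drop rows that are exact duplicates after normalization.
clean = clean.drop_duplicates()

# Correct: fill the missing count with 0 (an assumption) and store it as an integer.
clean["children"] = clean["children"].fillna(0).astype(int)

# Delete: remove rows whose rate is negative, which cannot be valid here.
clean = clean[clean["adr"] >= 0].reset_index(drop=True)

print(clean)
```

Working on a copy keeps the raw data intact, so every cleaning decision can be reviewed or reversed later.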

3. Data Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. It includes normalization and standardization, transformation, feature extraction and selection, etc. The product of data preprocessing is the final training dataset.
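To make normalization and standardization concrete, here is a small sketch on a single numeric column. The values and the name `lead_time` are made up for the example. Normalization is taken here to mean min-max scaling to the range [0, 1], and standardization to mean z-scores computed with the population standard deviation; other definitions are also in use.

```python
import pandas as pd

# Hypothetical numeric feature.
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="lead_time")

# Min-max normalization: rescale so the smallest value is 0 and the largest is 1.
normalized = (values - values.min()) / (values.max() - values.min())

# Z-score standardization: shift to mean 0 and scale to unit standard deviation.
# ddof=0 selects the population standard deviation.
standardized = (values - values.mean()) / values.std(ddof=0)

print(normalized.tolist())
print(standardized.round(3).tolist())
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range, which is one reason outliers are usually examined before this step.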

4. Data Visualization

Data visualization is the graphical representation of information and data. It uses statistical graphics, plots, information graphics and other tools to communicate information clearly and efficiently.
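As an example of two common statistical graphics, the sketch below draws a histogram of a numeric column and a bar chart of a categorical one with matplotlib. The data, the column names, and the output file name `eda_overview.png` are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numeric and one categorical column.
df = pd.DataFrame({
    "lead_time": [10, 45, 3, 120, 30, 7, 200, 15, 60, 90],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel",
              "City Hotel", "Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows how the numeric values are distributed.
ax_hist.hist(df["lead_time"], bins=5)
ax_hist.set_title("Distribution of lead_time")
ax_hist.set_xlabel("lead_time")
ax_hist.set_ylabel("count")

# Bar chart: shows how many rows fall into each category.
counts = df["hotel"].value_counts()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_title("Rows per hotel type")
ax_bar.set_ylabel("count")

fig.tight_layout()
fig.savefig("eda_overview.png")  # hypothetical output path
```

In a notebook the `matplotlib.use("Agg")` line and the `savefig` call can be dropped, since the figure is displayed inline.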

For this article I took the "Hotel Bookings" dataset and performed EDA on it. The code is available on GitHub: pshivapadmaja1/EDA (github.com).

