An Introduction To Data Preprocessing With Python
Shivek Maharaj
Data Analyst | Automation Architect | Business success doesn’t follow a blueprint, It follows me | AI Engineer
In previous articles, we discussed both supervised and unsupervised machine learning methods.
Before we can train these algorithms, we need prepared data: to use data as input to an ML algorithm, we must first structure it in a specific way.
Over the next few articles, we will discuss how to prepare data for machine learning algorithms.
Data Preprocessing
We deal with a lot of data in our daily lives, but most of it is in raw form. Before supplying it to machine learning algorithms, we must transform it into a usable form, and this is where data preprocessing comes in. Put simply, data must be preprocessed before it is passed to a machine learning algorithm.
Data Preprocessing Phases
To preprocess data for machine learning algorithms, we can use the following framework.
STEP 1: IMPORT THE NECESSARY PACKAGES
This is the first phase of preprocessing, that is, of turning the data into a specific format. In Python, it looks like this:
import numpy as np
from sklearn import preprocessing
The code above imports two packages to support data preprocessing: NumPy, for working with numerical arrays, and the preprocessing module from scikit-learn (sklearn), which provides the preprocessing functions.
STEP 2: PROCURING AND DEFINING THE DATA
After importing the packages, we need some sample data to preprocess. We will define the following example data:
data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]])
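Before applying any technique, it can help to sanity-check the array. The following is a minimal sketch that repeats the Step 2 data and inspects its shape and per-column range, using only standard NumPy array methods. Treating rows as samples and columns as features is an assumption made here for illustration.

```python
import numpy as np

# The same sample data as in Step 2 (assumed: rows = samples, columns = features)
data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]])

print(data.shape)         # (3, 3): three samples, three features
print(data.min(axis=0))   # per-column minimum: [19 12 23]
print(data.max(axis=0))   # per-column maximum: [50 40 44]
print(data.mean(axis=0))  # per-column mean, as floats
```

Columns with very different ranges are one common reason to rescale data before training.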
STEP 3: APPLYING THE PREPROCESSING TECHNIQUE
The next few articles will introduce various preprocessing techniques and show you how to apply them to data.
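As a brief preview of what this step can look like, here is a minimal sketch that applies two common techniques from `sklearn.preprocessing` to the sample data: standardization with `preprocessing.scale` and min-max scaling with `preprocessing.MinMaxScaler`. These two were chosen as illustrative examples and are an assumption on our part. The exact techniques covered later may differ.

```python
import numpy as np
from sklearn import preprocessing

# Sample data from Step 2 (integers; scikit-learn converts them to floats)
data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]])

# Standardization: shift each column to mean 0 and scale it to unit variance
standardized = preprocessing.scale(data)
print(standardized.mean(axis=0))  # approximately 0 for every column
print(standardized.std(axis=0))   # approximately 1 for every column

# Min-max scaling: rescale each column linearly into the range [0, 1]
min_max = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaled = min_max.fit_transform(data)
print(scaled)  # each column's minimum maps to 0 and its maximum to 1
```

Both transformations work column by column, so each feature is adjusted independently of the others.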
To be specific, a few of the data preprocessing techniques we will look at include the following: