An Introduction To Data Preprocessing With Python

Both supervised and unsupervised machine learning methods have already been discussed.

Before these algorithms can be trained, the data must be prepared: to serve as input to an ML algorithm, it has to be structured in a specific way.

Over the next few articles, we will discuss how to prepare data for machine learning algorithms.

Data Preprocessing

We deal with a lot of data in our daily lives, but most of it is in raw form. Before supplying it to machine learning algorithms, we must transform it into a usable form. This transformation step is called data preprocessing.

Data Preprocessing Phases

To preprocess data for machine learning algorithms, we can follow the steps below.

STEP 1: IMPORT THE NECESSARY PACKAGES

The first phase of preprocessing, that is, of turning the data into a specific format, is importing the packages we need. In Python, this looks as follows:

import numpy as np
from sklearn import preprocessing        

As the code above shows, we use two packages for data preprocessing:

  • NumPy: NumPy is a general-purpose array-processing package designed to manipulate large multi-dimensional arrays efficiently, without sacrificing too much speed for small multi-dimensional arrays.
  • sklearn.preprocessing: This package offers a wide variety of common utility functions and transformer classes for transforming raw feature vectors into a form better suited to machine learning algorithms.

STEP 2: PROCURING AND DEFINING THE DATA

Having imported the packages, we next define some sample data to preprocess. We will use the following example data:

data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]])
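Before preprocessing, it can help to take a quick look at the array. The short sketch below assumes the usual scikit-learn convention that each row is a sample and each column is a feature (the article does not state this explicitly). The names column_means and column_stds are illustrative.

```python
import numpy as np

# Sample data from Step 2 (assumed layout: rows = samples, columns = features)
data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]])

# Shape of the array: (number of rows, number of columns)
print(data.shape)  # (3, 3)

# Per-column mean and standard deviation (axis=0 aggregates down the rows)
column_means = data.mean(axis=0)
column_stds = data.std(axis=0)
print("Mean per column:", column_means)
print("Std per column: ", column_stds)
```

These per-column statistics are the quantities that techniques such as mean removal and scaling operate on.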

STEP 3: APPLYING THE PREPROCESSING TECHNIQUE

The next few articles will show you how to apply various preprocessing techniques to data.

Specifically, the data preprocessing techniques we will look at include the following:

  • Binarization
  • Mean Removal
  • Min-Max Scaling
  • Normalization
  • Label Encoding
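As a preview, the five techniques above can be sketched with sklearn.preprocessing on the sample data from Step 2. This is only a minimal sketch, not the detailed treatment of the upcoming articles. The threshold of 36, the choice of the L1 norm, and the color labels are illustrative assumptions that do not come from the article.

```python
import numpy as np
from sklearn import preprocessing

# Sample data from Step 2, stored as floats so the scalers can work in place
data = np.array([[50, 40, 23],
                 [49, 12, 37],
                 [19, 35, 44]], dtype=float)

# Binarization: values above the threshold become 1, all others become 0
# (the threshold of 36 is an arbitrary choice for illustration)
binarized = preprocessing.Binarizer(threshold=36).fit_transform(data)

# Mean removal: center each column at 0; scale() also rescales each
# column to unit variance by default
standardized = preprocessing.scale(data)

# Min-max scaling: map each column linearly onto the range [0, 1]
min_max = preprocessing.MinMaxScaler(feature_range=(0, 1)).fit_transform(data)

# Normalization: rescale each row so its absolute values sum to 1 (L1 norm)
normalized_l1 = preprocessing.normalize(data, norm="l1")

# Label encoding: map text labels to integers
# (these labels are hypothetical; the sample data has no labels)
labels = ["red", "green", "blue", "green"]
encoder = preprocessing.LabelEncoder()
encoded = encoder.fit_transform(labels)

print("Binarized:\n", binarized)
print("Column means after mean removal:", standardized.mean(axis=0))
print("Min-max scaled:\n", min_max)
print("L1-normalized:\n", normalized_l1)
print("Encoded labels:", encoded, "for classes", list(encoder.classes_))
```

Note that binarization, mean removal, and min-max scaling act on columns (features) or individual values, while normalization acts on rows (samples). Label encoding applies to categorical labels rather than to the numeric array.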
