Procurement Innovation with Machine Learning!

My motivation is to turn procurement managers into data scientists and vice versa! Join in! (Links to previous chapters are at the end of this post.)

Chapter 1 — Day 2

(Noob Day)

Dimensionality Reduction

Dimensionality Reduction is a subfield of unsupervised learning.

In procurement problems, we often deal with high-dimensional data. Put simply, each record comes with a large number of measurements (or properties) of its own.

The higher the dimensionality, the slower our machine learning algorithm will run. Unsupervised dimensionality reduction is a common approach in feature preprocessing. It helps us do the following:

1.) Remove noise from the data. Noise in the data can also degrade the predictive performance of the algorithm.

2.) Compress the data onto a lower-dimensional subspace while retaining most of the relevant information.

3.) Visualize the data. For example, 6-dimensional data can be projected onto 3 dimensions and plotted, as long as most of the relevant information survives the projection.
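To make the third point concrete, here is a minimal sketch using principal component analysis (PCA), one common dimensionality reduction technique available in scikit-learn. The data is synthetic and purely an assumption for illustration: 6 features, 3 of which are noisy copies of the other 3, so most of the information fits into 3 dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption, not real procurement data):
# 200 samples with 6 features, where 3 features are noisy copies of the other 3.
rng = np.random.default_rng(seed=0)
base = rng.normal(size=(200, 3))
noisy_copies = base + 0.05 * rng.normal(size=(200, 3))
X_6d = np.hstack([base, noisy_copies])

# Project the 6-dimensional data onto a 3-dimensional subspace.
pca = PCA(n_components=3)
X_3d = pca.fit_transform(X_6d)

print(X_6d.shape, "->", X_3d.shape)
print("Share of variance retained:", pca.explained_variance_ratio_.sum())
```

The share of variance retained is a rough indicator of how much information the compression keeps. On real data the redundancy is rarely this clean, so the number of components is something to experiment with.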


Basic Terminology & Notations

Let's begin learning by doing. The first exercise in any machine learning journey is playing with the Iris dataset.

The Iris dataset is the Hello World of machine learning. It contains the measurements of 150 iris flowers from 3 different species: Setosa, Versicolor and Virginica.


Flower measurements are stored in the columns (also called features) of the dataset. There are four of them: sepal length, sepal width, petal length and petal width. The measurements are in centimeters.

From now on, we will use matrix and vector notation to refer to our dataset. Each sample is represented as a separate row in a feature matrix X. Each feature is stored as a separate column.

So, $X \in \mathbb{R}^{150 \times 4}$: 150 samples (rows) and 4 features (columns).
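As a quick check of this notation, the sketch below loads the copy of the Iris dataset that ships with scikit-learn and prints the shape of the feature matrix. The variable names `X` and `y` are just a convention chosen here.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix: one row per flower, one column per feature
y = iris.target  # class labels encoded as integers, one per flower

print(X.shape)             # rows x columns of the feature matrix
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # names of the three species
```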


Roadmap for building machine learning models

There are 3 major components of building a machine learning model.

a.) Preprocessing: It's all about getting the data into the right shape

This is one of the most crucial steps in any machine learning model. Our objective is to churn out meaningful features from the raw data set.

In preprocessing, we clean the data first. By cleaning the data, I mean the following:

(i) Removing erroneous values

(ii) Removing blank values

(iii) Normalizing the ranges: Rescaling the values of each feature into the range [0,1].

(iv) Ignoring the outliers

(v) Ensuring the data is correctly labeled

(vi) Removing highly correlated and redundant data

This is by no means an exhaustive list.

Dimensionality reduction techniques are useful for the last point: they compress highly correlated features into a lower-dimensional subspace. (Also read about signal-to-noise ratio.)
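Some of the cleaning steps above can be sketched with pandas. Everything in this example is an assumption for illustration: the column names, the values, the rule that a negative price is erroneous, and the 0.95 correlation threshold are all made up.

```python
import numpy as np
import pandas as pd

# Hypothetical purchase-order data; column names and values are invented.
raw = pd.DataFrame({
    "unit_price": [10.0, 12.0, np.nan, 11.0, 13.0, -5.0],
    "quantity": [100, 120, 110, 90, 130, 105],
    "order_value": [1000.0, 1440.0, 1210.0, 990.0, 1690.0, 1050.0],
})

# (i) and (ii): drop rows with blank values, then rows with erroneous (negative) prices.
clean = raw.dropna()
clean = clean[clean["unit_price"] > 0]

# (iii): min-max normalization, rescaling every column into [0, 1].
normalized = (clean - clean.min()) / (clean.max() - clean.min())

# (vi): for each highly correlated pair, drop the later column.
corr = normalized.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = normalized.drop(columns=to_drop)

print("Dropped columns:", to_drop)
print(reduced)
```

Dropping a column is the bluntest way to handle correlated features. Dimensionality reduction instead combines them into fewer new features.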

After cleaning the data, we divide our dataset into two parts:

(i) Training Dataset:

The training dataset is used to build and train our machine learning model.

For example, if we are doing regression analysis, the model will learn

(ii) Testing Dataset:

The test dataset is used to evaluate our final model.

Often, the split is done at random.
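A random split can be sketched with scikit-learn's `train_test_split`. The 70/30 ratio and the `random_state` value are assumptions chosen for this example. `stratify=y` keeps the proportion of each species the same in both parts.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing; the fixed seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])
```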

b.) Learning (Training):

There are many different machine learning algorithms. The choice of algorithm depends on many factors, including the business case itself.

In practice, we train and compare several algorithms and select the best-performing model. To do that, we must be clear about how we are going to measure performance. One commonly used metric is accuracy, defined as the proportion of correctly classified instances.

Each algorithm comes with its own set of settings, called hyperparameters. We begin with the default settings and then tune these hyperparameters according to the performance of our algorithm.
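Here is a minimal sketch of such a comparison, assuming scikit-learn and the Iris data. The two algorithms (k-nearest neighbors and logistic regression), the hyperparameter values and the use of 5-fold cross-validation on the training data are choices made for this example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Candidate models; n_neighbors and max_iter are hyperparameters.
candidates = {
    "knn (n_neighbors=1)": KNeighborsClassifier(n_neighbors=1),
    "knn (n_neighbors=5)": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# Score each candidate by mean cross-validated accuracy on the training data only.
scores = {}
for name, model in candidates.items():
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    scores[name] = cv_accuracy.mean()
    print(f"{name}: {scores[name]:.3f}")

best_name = max(scores, key=scores.get)
print("Best candidate:", best_name)
```

The test data is not touched during the comparison, so it stays unseen for the final evaluation.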

c.) Evaluation and Prediction:

After we settle on the best-performing model, we use the test dataset to estimate how well it performs on unseen data, that is, its error percentage. Once we are satisfied with this error percentage, we can use the model to predict new data.
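This final step can be sketched as follows, again assuming scikit-learn and the Iris data. The choice of k-nearest neighbors as the final model and the measurements of the new flower are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1, stratify=iris.target
)

# Train the chosen model on the training data only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate once on the held-out test data.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
error_percentage = 100 * (1 - test_accuracy)
print(f"Test accuracy: {test_accuracy:.3f}, error: {error_percentage:.1f}%")

# Predict the species of a new flower (hypothetical measurements in cm:
# sepal length, sepal width, petal length, petal width).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[model.predict(new_flower)[0]]
print("Predicted species:", predicted_species)
```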

Important Python Packages for Machine Learning

We will use Python for this series, as it is one of the most popular languages for machine learning. We will use the following libraries:

1.) scikit-learn

2.) NumPy

3.) SciPy

4.) Matplotlib

5.) pandas

This marks the end of chapter 1.

In Chapter 2, we will start with the implementation of a classification algorithm, the perceptron.

See you tomorrow!

Link to Chapter 1, Day 1: https://www.dhirubhai.net/pulse/procurement-innovation-machine-learning-gaurav-sharma/

Note: I am using the book Python Machine Learning by Sebastian Raschka and Vahid Mirjalili for this series.
