Procurement Innovation with Machine Learning!
Gaurav Sharma
I love AI and Procurement, and everything around Spend Analysis, Negotiations and Digital Procurement.
My motivation is to turn procurement managers into data scientists and vice versa! Join in! (Links to previous chapters are at the end of this post.)
Chapter 1 — Day 2
(Noob Day)
Dimensionality Reduction
Dimensionality Reduction is a subfield of unsupervised learning.
In procurement problem statements, we often deal with data of high dimensionality. In simpler terms, this means each sample comes with a large number of its own measurements (or properties).
The higher the dimensionality, the slower the computational performance of our machine learning algorithm. Unsupervised Dimensionality Reduction is a common approach in feature preprocessing. It helps us do the following:
1.) Remove noise from the data. Noise in the data can degrade the predictive performance of the algorithm.
2.) Compress the data onto a smaller dimensional subspace while retaining most of the relevant information.
3.) Make data easier to visualize. For example, 6-dimensional data can be projected onto 3 dimensions for plotting.
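To make this concrete, here is a minimal sketch of one common dimensionality reduction technique, Principal Component Analysis (PCA), using scikit-learn. It compresses the 4-dimensional Iris dataset (which we introduce in the next section) down to 2 dimensions:

```python
# A minimal PCA sketch: compress 4-dimensional data to 2 dimensions
# while keeping the directions of highest variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data          # feature matrix, shape (150, 4)
pca = PCA(n_components=2)     # keep the 2 most informative directions
X_2d = pca.fit_transform(X)   # compressed matrix, shape (150, 2)

print(X.shape, "->", X_2d.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Even after dropping half the dimensions, most of the variance (i.e. information) in the data is retained.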
Basic Terminology & Notations
Let's begin learning by doing. The Step 1 exercise in any machine learning journey is playing with the Iris dataset.
The Iris dataset is like the "Hello World" of programming languages. It contains the measurements of 150 iris flowers from 3 different species: Setosa, Versicolor and Virginica.
Flower measurements are stored in the columns (also called features) of the dataset. The measurements are in centimeters.
We will use matrix and vector notation to refer to our dataset from now on. Each sample is represented as a separate row in a feature matrix X, and each feature is stored as a separate column.
So, X ∈ ℝ^(150×4): 150 samples (rows) and 4 features (columns).
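Here is a minimal sketch of loading the Iris dataset with scikit-learn and inspecting the 150×4 feature matrix X described above:

```python
# Load the Iris dataset and inspect the feature matrix X.
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data                 # feature matrix, shape (150, 4)
y = iris.target               # species labels encoded as 0, 1, 2

print(X.shape)                # (150, 4)
print(iris.feature_names)     # sepal/petal length and width, in cm
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica']
```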
Roadmap for building machine learning models
There are 3 major components of building a machine learning model.
a.) Preprocessing: It's all about getting the data into the right shape.
This is one of the most crucial steps in any machine learning model. Our objective is to extract meaningful features from the raw dataset.
In preprocessing, we clean the data first. By cleaning the data, I mean the following:
(i) Removing erroneous values
(ii) Removing blank values
(iii) Normalizing the ranges: transforming the values into the range [0, 1]
(iv) Ignoring the outliers
(v) Ensuring the data is correctly labeled
(vi) Removing highly correlated and redundant data
This is by far not an exhaustive list.
Therefore, Dimensionality Reduction techniques are useful here to compress the features into a lower-dimensional subspace. (Also read about the signal-to-noise ratio.)
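To make the cleaning steps above concrete, here is a minimal sketch with pandas. The file name 'spend.csv' and its column names are hypothetical placeholders for your own procurement data:

```python
# A minimal data-cleaning sketch; 'spend.csv' and its column names
# are hypothetical placeholders, not a real dataset.
import pandas as pd

df = pd.read_csv("spend.csv")

# (ii) remove rows with blank values
df = df.dropna()

# (i) remove obviously erroneous values, e.g. negative prices
df = df[df["unit_price"] >= 0]

# (iii) normalize a numeric column into the range [0, 1]
price = df["unit_price"]
df["unit_price_scaled"] = (price - price.min()) / (price.max() - price.min())
```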
After cleaning the data, we divide our dataset into two parts:
(i) Training Dataset:
The training dataset is used to build and train our machine learning model.
For example, if we are doing regression analysis, the model will learn the relationship between the input features and the target variable from this data.
(ii) Testing Dataset:
The test dataset is used to evaluate our final model.
Often, the split is done on a random basis, as in the sketch below.
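Here is a minimal sketch of such a random split using scikit-learn's train_test_split, holding out 30% of the Iris data for testing (the 70/30 ratio is a common choice, not a rule):

```python
# Randomly split the Iris data into training and test sets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
```

The stratify=y argument keeps the class proportions the same in both halves, and random_state makes the split reproducible.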
b.) Learning (Training):
There are many different machine learning algorithms, and the selection of an algorithm depends on many factors, including the business case itself.
In practice, we compare different algorithms in order to train and select the best-performing model. However, we must be clear about how we are going to measure the results (and performance). One commonly used metric is accuracy, defined as the proportion of correctly classified instances.
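As a rough sketch of such a comparison (scored by cross-validation, keeping the test set untouched for the final evaluation), we might compare two candidate classifiers by accuracy:

```python
# Compare two classifiers by mean accuracy across 5 cross-validation folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for model in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(type(model).__name__, round(scores.mean(), 3))
```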
Each algorithm comes with its own set of setting parameters, also called hyperparameters. There are default settings to begin with, but we change these hyperparameters according to the performance of our algorithm.
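As an example, here is a minimal hyperparameter-tuning sketch with scikit-learn's GridSearchCV, trying a few values of k for a k-nearest-neighbors classifier (the parameter grid is an illustrative choice, not a recommendation):

```python
# Try several values of n_neighbors and keep the best-scoring one.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```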
c.) Evaluation and Prediction:
After we finalize the best-performing algorithm, we use our test dataset to estimate how well it performs on unseen data, i.e. to estimate the error percentage. Once we are satisfied with this error percentage, we can then use the model to make predictions on new data.
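Putting it together, here is a minimal sketch of this final step on the Iris data (the new flower's measurements below are made up for illustration):

```python
# Evaluate the chosen model on the held-out test set,
# then predict the species of a new, unseen flower.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

new_flower = [[5.1, 3.5, 1.4, 0.2]]          # measurements in cm
print("predicted species:", model.predict(new_flower))
```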
Important Python Packages for Machine Learning
We will be using the Python language for this series, as it is one of the most popular languages for machine learning. We will be using the following libraries:
1.) scikit-learn
2.) NumPy
3.) SciPy
4.) Matplotlib
5.) pandas
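If you don't have them yet, all five can typically be installed in one go with pip: pip install scikit-learn numpy scipy matplotlib pandas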
This marks the end of Chapter 1.
In Chapter 2, we will start with the implementation of a classification algorithm, the perceptron.
See you tomorrow!
Link to Chapter 1, Day 1: https://www.dhirubhai.net/pulse/procurement-innovation-machine-learning-gaurav-sharma/
Note: I am using the book Python Machine Learning by Sebastian Raschka and Vahid Mirjalili for this series.