AN INTRODUCTION TO MULTIPLE LINEAR REGRESSION IN ML

Shanti A

Data Scientist & Tech Head at Learnbay | Transforming Businesses through Advanced Analytics.

Published: August 2, 2021

The first step in learning machine learning algorithms is understanding the different regression techniques. As the simplest of them, linear regression is usually the first learning target for data science aspirants. But linear regression is not always that simple: many problems require assessing several variables at once. The statistical approach for such cases is multiple linear regression (MLR). In this blog, we will explore the basics of this second-level linear regression technique.

What does MLR mean in machine learning (ML)?

Sometimes, in a regression problem, there are two or more independent (explanatory) variables and a single dependent (response) variable. In such cases, to evaluate the relationship between the independent variables and the dependent variable, you need the linear regression technique termed 'multiple linear regression', or MLR.

Is MLR supervised or unsupervised?

Multiple linear regression is a supervised machine learning algorithm. It describes how the dependent variable changes as the independent variables change simultaneously.

What are the real-life examples of linear regression? / Under what circumstances can we use linear regression?

Here are three simple examples of scenarios where you need the ML technique of MLR.

Example#1

Suppose a patient has started consulting a psychiatrist. The counselling process and the medical treatment will depend on several factors, such as the following.

Family environment
Professional environment
Current situation, such as any relationship issues
Family history of mental health conditions
How long the patient has been living in a stressful situation
Existence of chronic diseases that might impact mental health, etc.

Example#2

Suppose that, after completing your data science course, you have got your first data scientist job. The salary package for this job will depend on the following measures.

Your last drawn salary
Years of experience
Domain knowledge
Soft skills such as communication, report presentation, persuasion, and time management, etc.

Example#3

Suppose your company is going to sell a few of the shares for a very popular product. The selling price of the shares will depend on the following factors.

The investment you made in its promotion
Prices of its competitors' products
Its popularity percentage in the market
Its future demand, etc.

What is the mathematical expression of MLR?

$$Y_n = b_0 + b_1 X_{1n} + b_2 X_{2n} + \dots + b_k X_{kn} + \varepsilon_n$$

Here,

$b_0$ is the intercept.

$b_1, b_2, b_3, \dots, b_k$ are the regression coefficients associated with the independent variables $X_{1n}, X_{2n}, X_{3n}, \dots, X_{kn}$.

$\varepsilon_n$ is the error term. Alternatively, you can say it is the residual of the MLR technique.

$X$ is the explanatory variable.

The above mathematical expression for MLR is termed the 'response surface function'. Another name for this function is the 'hyperplane function'.
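To make the expression concrete, here is a minimal sketch that evaluates the deterministic part of the equation (everything except the error term) for a few observations. The coefficient values and the data are invented purely for illustration.

```python
import numpy as np

# Hypothetical coefficients: intercept b0 and slopes b1, b2 (illustrative values only)
b0 = 2.0
b = np.array([0.5, -1.0])

# Three observations (rows) of two explanatory variables X1 and X2 (columns)
X = np.array([
    [1.0, 2.0],
    [3.0, 0.0],
    [4.0, 1.0],
])

# b0 + b1*X1n + b2*X2n for each observation n
y_hat = b0 + X @ b
print(y_hat)
```

Each row of `X` produces one predicted value, so the matrix product is simply the equation applied to every observation at once.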

How do simple and multiple regression differ from each other?

The two types of regression can be distinguished based on two components.

Relationship between the variables.
The observed value of Yn.

Let’s have a look at the difference.

Based on the relationship between the variables:

In simple regression, a regression line drawn through a scatterplot can effectively indicate the relationship between two variables.

In contrast, to visualise the relationship between Yn and all the independent variables, we need a regression plane (or, with more than two independent variables, a hyperplane).

Based on the observed value of Yn

In simple regression, you find the best-fit model using the least-squares method, which compares the predicted and observed values of Yn.

In multiple linear regression, on the other hand, the comparison takes place between the observed values of Y scattered around the regression plane and the corresponding points on that plane, again under the least-squares criterion.
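The least-squares criterion can be sketched in a few lines. The example below uses synthetic data (an assumption for illustration), fits the plane with NumPy's `np.linalg.lstsq`, and shows that a slightly shifted plane has a larger sum of squared errors. The helper name `sse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: Y depends on two explanatory variables plus noise
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Add a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])


def sse(coefs):
    """Sum of squared vertical distances between observed Y and the plane."""
    residuals = y - X_design @ coefs
    return float(residuals @ residuals)


# Least-squares coefficients: the plane that minimises the SSE
b_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Any other plane, for example one with a shifted intercept, fits worse
b_other = b_hat + np.array([0.1, 0.0, 0.0])
print(sse(b_hat), sse(b_other))
```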

What are the five assumptions of multiple linear regression?

Assumption#1

The explanatory variable, X, is non-stochastic.

Assumption#2

The conditional expected value of the residual, $E(\varepsilon_i \mid X_i)$, is zero, and the variance of the residual, $\mathrm{Var}(\varepsilon_i)$, is constant for all values of $X_i$ (homoscedasticity).

Assumption#3

No correlation exists between the residuals, which matters especially when working with time-series data. Mathematically, for all $i \neq j$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, and $\varepsilon_i$ follows a normal distribution.

Assumption#4

The explanatory variables are not highly correlated with one another; that is, there is no multicollinearity.

Assumption#5

In terms of regression parameters, the regression model is linear.
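One common way to check the multicollinearity assumption is the variance inflation factor (VIF), which the article does not name; it is brought in here as an illustration. The sketch below computes it with plain NumPy by regressing each explanatory variable on the others. The function name `vif` and the synthetic data are assumptions.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (observations in rows)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])
        # Regress column j on the remaining columns
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefs
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)


rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                # generated independently of x1
x3 = x1 + 0.05 * rng.normal(size=200)    # almost a copy of x1

vif_independent = vif(np.column_stack([x1, x2]))
vif_collinear = vif(np.column_stack([x1, x2, x3]))
print(vif_independent)
print(vif_collinear)
```

A VIF near 1 suggests a variable is not explained by the others, while a large value (rules of thumb often quote 5 or 10) signals multicollinearity.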

A 9-step Guide for Building an MLR Framework

Step#1: Extraction of information

First of all, based on the problem you have identified, you need to collect the required data from various data sources.

Step#2: Pre-processing of the collected information

It's crucial to ensure data quality by checking for concerns like data reliability, completeness, utility, accuracy, missing data, and outliers.

Before starting the data analysis, you may need to convert qualitative variables into dummy variables so that they can be used in the regression.

If you need to come up with a new variable, you can consider variable transformations (for example, the ratio X2/X1). In case the original variable is missing from your dataset, you can opt for proxy variables.
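As a rough illustration of these pre-processing ideas, the sketch below converts a qualitative column into dummy variables with `pd.get_dummies` and derives a ratio variable. The dataset, its column names, and the name `x2_over_x1` are all invented for the example.

```python
import pandas as pd

# Hypothetical dataset: column names and values are invented for illustration
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "x1": [2.0, 4.0, 5.0, 10.0],
    "x2": [1.0, 2.0, 10.0, 5.0],
})

# Convert the qualitative variable into dummy (0/1) columns.
# drop_first=True drops one category so the dummies are not perfectly collinear.
encoded = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=int)

# Derive a new variable as the ratio X2/X1
encoded["x2_over_x1"] = encoded["x2"] / encoded["x1"]
print(encoded)
```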

Step#3: Implementation of descriptive analysis

Before building the analytics model, you need to carry out a descriptive analysis. Through such analysis, you'll get a clear idea about the best-fit model, data visualisation, and insight generation.

If your problem needs more focus on outliers, go with a box plot, while for highlighting the relationship between several variables, a scatterplot is the best option.
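A box plot conventionally flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that same rule numerically, which can be handy when you want the outliers as values instead of a picture. The function name `iqr_outliers` and the sample numbers are assumptions.

```python
import numpy as np


def iqr_outliers(values, whisker=1.5):
    """Return the values outside the conventional box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]


sample = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(sample)
print(outliers)
```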

Step#4: Strategy simulation

When your dataset contains thousands of candidate variables, you need variable selection or dimensionality reduction tactics to ease the regression analysis. A few such tactics are stepwise regression, backward elimination, and factor analysis techniques such as PCA.
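Here is a simplified sketch of backward elimination. It uses adjusted R-squared as the dropping criterion, which is an assumption made to keep the example short; textbook versions often use p-values instead. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def backward_elimination(X, y):
    """Drop one variable at a time while adjusted R-squared keeps improving."""
    kept = list(range(X.shape[1]))
    best = fit_adjusted_r2(X[:, kept], y)
    while len(kept) > 1:
        # Score every candidate model that has one variable removed
        scores = [
            (fit_adjusted_r2(X[:, [c for c in kept if c != j]], y), j)
            for j in kept
        ]
        score, drop = max(scores)
        if score <= best:
            break  # no removal improves the model
        best = score
        kept.remove(drop)
    return kept


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
# Only columns 0 and 1 influence y; columns 2 and 3 are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

kept_columns = backward_elimination(X, y)
print(kept_columns)
```

Whether the noise columns are dropped depends on the sample, but the two informative columns should always survive because removing them sharply lowers the score.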

Step#5: Segregate the data into training and validation sets

Roughly 85% of the data, drawn by random sampling, is used for training. The training data is used to build the model, and the validation data to validate and select models.

Segregating the data aims to identify the best-fit ML model through efficient training and validation. Keep in mind that the model must not over-fit the training data if deployment is to be as error-free as possible.

If required, you can segregate the data into three subsets:

Training
Validation
Testing
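A random 85/15 split can be sketched with NumPy alone. The function name `split_indices` and the seed are assumptions; in practice a library helper such as scikit-learn's `train_test_split` does the same job.

```python
import numpy as np


def split_indices(n_rows, train_fraction=0.85, seed=0):
    """Randomly partition row indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_train = int(round(train_fraction * n_rows))
    return shuffled[:n_train], shuffled[n_train:]


train_idx, valid_idx = split_indices(100)
print(len(train_idx), len(valid_idx))
```

For a three-way split, the same function could be applied again to the training indices to carve out a separate validation set, leaving the first hold-out as the test set.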

Step#6: Identification of the functional form

Specify the appropriate functional form of the relationship between the variables.

Step#7: Identification of the estimated parameters of the regression

At this step, you need to apply the ordinary least squares (OLS) technique, which gives the best-fit regression plane through the data points. Under the assumptions above, OLS yields the Best Linear Unbiased Estimate (BLUE), whose unbiasedness is expressed mathematically as follows.

$$E[\hat{b} - b] = 0$$

Here, $b$ denotes the population parameter, and $\hat{b}$ denotes the estimated parameter value.
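The unbiasedness property can be checked empirically with a small simulation. The sketch below repeatedly draws new error terms for a fixed design matrix, solves the normal equations for the OLS estimate each time, and averages the estimates. The true coefficients, sample size, and number of repetitions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
true_b = np.array([1.0, 2.0, -0.5])  # hypothetical b0, b1, b2

# Fixed (non-stochastic) design matrix with an intercept column
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])

estimates = []
for _ in range(2000):
    # Fresh zero-mean, constant-variance errors for every simulated sample
    y = X @ true_b + rng.normal(scale=1.0, size=40)
    # OLS solution of the normal equations (X'X) b = X'y
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    estimates.append(b_hat)

mean_estimate = np.mean(estimates, axis=0)
# Average estimation error across the simulated samples
print(mean_estimate - true_b)
```

Individual estimates scatter around the true values, but their average lands very close to them, which is what the expectation above describes.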

Step#8: Performing regression model diagnostics

Prior to deploying the regression model, you need to ensure that your model satisfies all five of the assumptions listed earlier.

Following the assumption validation, your model needs to undergo an F-test and t-tests to assess the overall significance of the model and the significance of individual variables, respectively.
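The two test statistics can be computed directly from the OLS fit. The sketch below uses synthetic data in which only the first variable matters (an assumption for illustration) and stops at the statistics themselves; turning them into p-values would need the F and t distributions from a statistics library.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 2
X = rng.normal(size=(n, k))
# Synthetic response: the first variable matters, the second does not
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ b_hat

ss_res = residuals @ residuals           # unexplained variation
ss_tot = ((y - y.mean()) ** 2).sum()     # total variation
ss_reg = ss_tot - ss_res                 # variation explained by the model

# F statistic: explained variation per variable over
# unexplained variation per residual degree of freedom
f_stat = (ss_reg / k) / (ss_res / (n - k - 1))

# t statistics: each coefficient divided by its standard error
sigma2 = ss_res / (n - k - 1)
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
t_stats = b_hat / std_err

print(f_stat)
print(t_stats)
```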

Step#9: Model confirmation and deployment

You need to run your model on two datasets: training and validation. If you get an adequately good result for both datasets, then your model is validated.

In contrast, if the model performs excellently on one dataset (typically the training set) and fails to meet even average expectations on the other, then it is a case of over-fitting, which you need to avoid at any cost. You should also perform such validation on different datasets to cross-check the model's effectiveness. The most popular measures for this are:

Root mean square error (RMSE)
Adjusted R-squared, etc.

Once your model passes the validation check, deploy it to your original business scenario.
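Both measures are short formulas. The sketch below implements them with NumPy; the function names and the toy numbers are made up for the example, and `n_features` stands for the number of explanatory variables in the model.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def adjusted_r2(y_true, y_pred, n_features):
    """R-squared penalised for the number of explanatory variables."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1))


y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.0, 5.0, 8.0, 9.0, 11.0]

print(rmse(y_true, y_pred))
print(adjusted_r2(y_true, y_pred, n_features=2))
```

Comparing these values between the training and validation sets is one simple way to spot the over-fitting pattern described above.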

Now let’s have a look at how to program an MLR.

Programming an MLR:

To program any MLR, you need to follow the steps below.

# Importing the libraries
# Importing the dataset
# Encoding categorical data
# Identifying the features and the target variable
# Dividing the data into training and test sets
# Training the multiple linear regression model on the training set
# Predicting the test set results
# Visualising the training set outcomes
# Visualising the test set outcomes
# Assessing the performance
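The steps above can be sketched end to end with scikit-learn. This is only one possible arrangement: the dataset is synthetic, the column names are invented, and the two visualisation steps are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (all names and values are hypothetical)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "experience": rng.uniform(0, 10, n),
    "last_salary": rng.uniform(20, 60, n),
    "domain": rng.choice(["finance", "health", "retail"], n),
})
df["salary"] = (
    5 + 2.0 * df["experience"] + 0.8 * df["last_salary"] + rng.normal(0, 1, n)
)

# Encode categorical data and identify the features and the target variable
features = pd.get_dummies(df.drop(columns="salary"), drop_first=True, dtype=float)
target = df["salary"]

# Divide the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.15, random_state=0
)

# Train the multiple linear regression model on the training set
model = LinearRegression().fit(X_train, y_train)

# Predict the test set results and assess the performance
y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(test_r2, test_rmse)
```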

Generic example of a split-free MLR

# importing of the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter/IPython magic: render matplotlib figures inline (notebook only)
%matplotlib inline

# importing of the dataset ('File name.csv' is a placeholder for your own file)
df = pd.read_csv('File name.csv')
df.head()
df.info()

RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   ABC     10 non-null     int64
 1   XYZ     10 non-null     int64
 2   DEF     10 non-null     int64
dtypes: int64(3)
memory usage: 368.0 bytes

# Data visualisation (3D plot)
import plotly.express as px

# df = px.data.iris()  # alternatively, load plotly's built-in iris sample dataset
fig = px.scatter_3d(df, x='ABC', y='XYZ', z='DEF')
fig.show()

# Here, 'ABC', 'XYZ', and 'DEF' are the column names in the file.

Where can you learn more about MLR?

To learn more about MLR, you can join the data science certification courses of Learnbay.

At Learnbay, you'll get customised data science course modules as per your years of working experience and domain knowledge.

Presently, we are offering three different IBM-certified Master's programs on Artificial Intelligence and Machine Learning. These courses are equipped with the most competent data science learning modules and domain-specific real-time industrial projects.

The IBM data science professional certification courses are equipped with highly market-competent learning modules that cover every trendy and in-demand aspect of data science, such as Python and R programming, NLP, deep learning, machine learning, computer vision, and artificial intelligence. The instructors are professionals in the field of data science, and their supervision makes your learning very effective.

We offer end-to-end data science career guidance. To know which course is the best fit for you, book an online profile review here.
