Introduction to Linear Regression: A Machine Learning Model for Industry Use Cases (Both EPC and Process Industries)

Hello everyone,

Background of the Article:

While exploring machine learning and mathematical models for industry use cases, I observed that they have gained a lot of traction in the process industries, especially in the area of Predictive Maintenance. But what are the possible use cases for the data-intensive and knowledge-intensive EPC industry?

To find the answers, I have explored Linear Regression as a base for the mathematical model. Through this article, I try to delve into:

  1. Introduction
  2. Use cases for both EPC and process industries
  3. Prerequisites for selecting the data
  4. Business case (Jupyter Notebook, code)
  5. Observations and conclusion

Please refer to my earlier article, “Data Science and Process Associated Industry (EPC & Engg Consulting industries)”, where we talked about data science use cases for the industry. This article takes the study a little deeper.


1. INTRODUCTION

What is Linear Regression, and what are its practical use cases?

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
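For reference, the two forms can be written as below, where y is the response, the x terms are the explanatory variables, the β coefficients are estimated from the data and ε is the error term. The last line gives the ordinary least squares (OLS) estimates for the simple case, assuming the model is fitted by OLS, which is the usual choice.

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad \text{(simple linear regression)}$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \qquad \text{(multiple linear regression)}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$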

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

  1. If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.
  2. If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

Additional reference: (https://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/)


So how do we build a simple linear regression machine learning model?

In simple terms: we study the existing variables and their underlying statistics (such as the mean and standard deviation), then transform the unseen data using what we learned and predict the target variable. Finally, we evaluate the model's performance.
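A minimal sketch of that study, predict and evaluate loop for simple linear regression, using NumPy only. The data are synthetic (a hypothetical line y ≈ 3 + 2x plus noise, not from the article), the 80/20 split by position is an assumption, and the slope and intercept come from the ordinary least squares formulas.

```python
import numpy as np

# Synthetic, illustrative data: y is roughly 3 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=100)

# Hold out the last 20 points as "unseen" data (80/20 split).
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Study: learn slope and intercept from the training data with OLS formulas.
x_mean, y_mean = x_train.mean(), y_train.mean()
slope = np.sum((x_train - x_mean) * (y_train - y_mean)) / np.sum((x_train - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Predict: apply the learned line to new inputs.
x_new = np.array([1.0, 5.0, 9.0])
y_pred = intercept + slope * x_new

# Evaluate: mean squared error on the held-out points.
mse = np.mean((y_test - (intercept + slope * x_test)) ** 2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={mse:.2f}")
```

The same slope and intercept can be obtained with `np.polyfit(x_train, y_train, 1)`; the explicit formulas are shown only to make the "study" step visible.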


Specific Use cases for Process Industries:


Specific Use cases for EPCs:


Beware: before applying Linear Regression, check the following:

  • The variables should be measured at a continuous level. Examples of continuous variables are time, sales, and weight.
  • There should be a linear relationship between the dependent and independent variables.
  • The observations should be independent of each other (that is, there should be no dependency).
  • The data should have no significant outliers.
  • Check for homoscedasticity: a statistical concept in which the variance of the residuals remains similar all along the best-fit linear regression line.
  • The residuals (errors) of the best-fit regression line follow a normal distribution.

Source: https://www.ibm.com/in-en/analytics/learn/linear-regression
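A rough sketch of how some of these checks might be screened in Python before fitting. The data are synthetic, and the thresholds (|z| > 3 for outliers, a residual-spread ratio, residual skewness near 0) are assumed rules of thumb rather than formal tests; independence of observations is not checked here. Formal alternatives include `scipy.stats.shapiro` for normality and `statsmodels.stats.diagnostic.het_breuschpagan` for homoscedasticity.

```python
import numpy as np

# Synthetic data for illustration only; replace with your own feature x and target y.
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=200)
y = 5.0 + 0.5 * x + rng.normal(0, 2.0, size=200)

# Linearity: a Pearson correlation close to +1 or -1 suggests a strong linear relationship.
r = np.corrcoef(x, y)[0, 1]

# Outliers (assumed rule of thumb): targets more than 3 standard deviations from the mean.
z = (y - y.mean()) / y.std()
n_outliers = int(np.sum(np.abs(z) > 3))

# Fit a straight line and compute the residuals (errors) for the remaining checks.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Homoscedasticity (rough check): residual spread should be similar for low and high x.
low = residuals[x < np.median(x)].std()
high = residuals[x >= np.median(x)].std()
spread_ratio = max(low, high) / min(low, high)

# Normality of residuals (rough check): skewness near 0 is consistent with a normal shape.
skew = np.mean(((residuals - residuals.mean()) / residuals.std()) ** 3)

print(f"r={r:.3f}, outliers={n_outliers}, spread ratio={spread_ratio:.2f}, skew={skew:.2f}")
```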

With these basics cleared, let’s build a Linear Regression model for a similar data set to check its feasibility and get to know the hands-on implementation.

Business case:

The impact of advertising spend on TV, radio, and newspaper ads on sales. This is a small data set (200 rows and 4 columns), which will be used to find the correlation between ad spending and sales, and to develop a simple linear regression model that can be used to predict future sales.

X = independent variables, Y = dependent variable; X1 = spending on TV, X2 = spending on radio, X3 = spending on newspapers, Y = sales.
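Written with these variables, the business case corresponds to a multiple linear regression of the form below. The coefficients β0 to β3 and the error term ε are standard notation (not values from the article) and would be estimated from the 200 rows.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$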


Programming Language: PYTHON 3

Libraries: PANDAS, NUMPY AND SKLEARN

Visualisations: MATPLOTLIB, SEABORN

IDE: JUPYTER NOTEBOOK

Credit: The data set and LR model were introduced by INSAID as part of their curriculum. As the data set has relevance to EPC, I have chosen to use it for my study.

Building the Machine Learning Model:

  1. Exploratory Data Analysis is carried out on the data set
  2. Collinearity between the independent variables is checked
  3. The data set is split into training and testing data sets
  4. The data is pre-processed by scaling, using the fit() and transform() actions:

  • fit(): studies the training data and calculates statistics such as the mean and standard deviation of each feature
  • transform(): applies the statistics learned by fit() to data, including unseen data, so that the model can then predict the dependent variable for it

  5. The model is built and used to predict values for the unseen data
  6. The model is evaluated (mean error methods)
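A small sketch of the fit/transform distinction using scikit-learn's `StandardScaler`. The spend figures are hypothetical round numbers for two features, chosen only so the learned mean is easy to verify by eye.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical spend figures for two features (e.g. TV and radio); illustrative only.
X_train = np.array([[100.0, 20.0],
                    [200.0, 40.0],
                    [300.0, 60.0]])
X_test = np.array([[250.0, 30.0]])  # "unseen" row

scaler = StandardScaler()
# fit(): study the training data and learn each column's mean and standard deviation.
scaler.fit(X_train)
print("learned means:", scaler.mean_)
print("learned std devs:", scaler.scale_)

# transform(): apply the SAME learned statistics to training and unseen data.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
print("scaled test row:", X_test_scaled)

# scaler.fit_transform(X_train) would combine the fit and the first transform in one call.
```

Fitting only on the training data keeps information from the test rows out of the preprocessing step, so the later evaluation reflects truly unseen data.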

A simpler explanation of fitting and transforming seen and unseen data: https://www.analyticsvidhya.com/blog/2021/04/sklearn-objects-fit-vs-transform-vs-fit_transform-vs-predict-in-scikit-learn/

Data Visualization as part of Model Development


Observations:

  • The independent and dependent variables have linear relationships
  • The independent variables do not show multicollinearity
  • TV has the highest correlation with sales, followed by radio; newspaper does not show a favorable correlation
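A sketch of how such observations might be checked numerically with pandas and scikit-learn. The data frame is a synthetic stand-in for the advertising data (NOT the INSAID data set); the assumed ground truth is that sales depend on TV and radio only. Multicollinearity is measured with the variance inflation factor (VIF), computed here by hand from R² values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the 200-row advertising data; coefficients are assumptions.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "radio": rng.uniform(0, 50, 200),
    "newspaper": rng.uniform(0, 100, 200),
})
df["sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["radio"] + rng.normal(0, 1.5, 200)

# Linearity / strength: Pearson correlation of each feature with sales.
corr_with_sales = df.corr()["sales"].drop("sales").sort_values(ascending=False)
print(corr_with_sales)

# Multicollinearity: VIF = 1 / (1 - R^2), where R^2 comes from regressing
# each feature on all the other features.
features = ["TV", "radio", "newspaper"]
vif = {}
for col in features:
    others = [c for c in features if c != col]
    r2 = LinearRegression().fit(df[others], df[col]).score(df[others], df[col])
    vif[col] = 1.0 / (1.0 - r2)
print(vif)  # values near 1 suggest little multicollinearity
```

A VIF above roughly 5 to 10 is a commonly quoted rule-of-thumb warning sign; values near 1, as expected here, indicate the features carry largely independent information.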

Points to note:

  1. The standard 80/20 split ratio is used for training and test data
  2. Hyperparameters are kept at their defaults
  3. The random state is kept at 1
  4. Model evaluation is done using standard residual error metrics (mean squared error, root mean squared error)
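An end-to-end sketch of the steps above in scikit-learn, under stated assumptions: the data are a synthetic stand-in for the advertising set, "random value kept as 1" is interpreted as `random_state=1` in `train_test_split`, and R² is printed alongside MSE and RMSE as one common score. This is an illustration, not the author's notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 200-row advertising data; coefficients are assumptions.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "radio": rng.uniform(0, 50, 200),
    "newspaper": rng.uniform(0, 100, 200),
})
df["sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["radio"] + rng.normal(0, 1.5, 200)

X = df[["TV", "radio", "newspaper"]]
y = df["sales"]

# 80/20 split with a fixed random state.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Scale: fit on the training data only, then transform both sets.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Linear regression with default hyperparameters.
model = LinearRegression()
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

# Residual-error metrics on the unseen test data.
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```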

The idea is not to develop a robust model, but to give readers an idea of how the overall process and the model can look.

Results:

The developed model successfully predicted the target (dependent) variable on the unseen test data, with a model efficiency of more than 80%.


Sample feature engineering was done by dropping the newspaper variable, which gave a slight improvement in the overall performance of the model.
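A sketch of how the effect of dropping a feature could be compared, again on a synthetic stand-in for the advertising data. Unlike the single 80/20 split above, this swaps in 5-fold cross-validation (`cross_val_score`) so that the comparison does not hinge on one particular split. Scaling is omitted because plain least-squares predictions do not depend on feature scaling.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; by construction, newspaper spend has no effect on sales.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "radio": rng.uniform(0, 50, 200),
    "newspaper": rng.uniform(0, 100, 200),
})
df["sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["radio"] + rng.normal(0, 1.5, 200)
y = df["sales"]

# Mean 5-fold cross-validated R^2 with and without the newspaper feature.
score_all = cross_val_score(
    LinearRegression(), df[["TV", "radio", "newspaper"]], y, cv=5, scoring="r2"
).mean()
score_reduced = cross_val_score(
    LinearRegression(), df[["TV", "radio"]], y, cv=5, scoring="r2"
).mean()
print(f"R^2 with newspaper: {score_all:.3f}, without newspaper: {score_reduced:.3f}")
```

If the scores are close, the simpler model without the weakly related feature is usually preferred.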

KEY TAKEAWAYS FROM THE EXERCISE

Machine learning models can be developed for business benefit.

There are already many use cases and business cases that have been researched, especially in procurement and construction, and large implementations are ongoing for plant operators.

EPC use cases need to be isolated and evaluated with a focus on engineering, procurement, and construction. Once the ML models are developed, feature engineering can be applied to remove ineffective parameters and gain deeper insights.

Considering the digital transformation focus of owner-operators, we, as contractors and consultants in close proximity to customers, are in a position to leverage this opportunity.

The process industry and owner-operators are making progress; the big question is: are we (EPC companies) ready?


References:

https://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/

https://en.wikipedia.org/wiki/Linear_regression

https://www.javatpoint.com/linear-regression-in-machine-learning

https://datascienceplus.com/linear-regression-predict-energy-output-power-plant/

https://risk-engineering.org/notebook/regression-CCPP.html

https://ieeexplore.ieee.org/abstract/document/8540602

https://www.mdpi.com/2227-9717/5/2/28#cite

https://vitalflux.com/procurement-advanced-analytics-use-cases/#Spend_pricing_analytics_use_cases

https://rua.ua.es/dspace/bitstream/10045/66350/1/tesis_miguel_angel_guerrero_lazaro.pdf

https://www.scirp.org/html/1-1880140_36190.htm

https://www.ibm.com/in-en/analytics/learn/linear-regression

https://www.analyticsvidhya.com/blog/2021/04/sklearn-objects-fit-vs-transform-vs-fit_transform-vs-predict-in-scikit-learn/






Naveen Sharma

Construction- Systems Implementation & Integration

3 years

Any model needs quality data...rest is understood

Mohanraj Kedige

Certified Independent Director with IICA with certification for ESG and digitalisation. Published Writer.

3 years

One issue with EPC projects is that they are time bound. Use of ML and mathematical models should be applied in bid stage to manage risks associated with metal costs. Other data would be relevant for projects which are using same technology.

Mohanraj Kedige

Certified Independent Director with IICA with certification for ESG and digitalisation. Published Writer.

3 years

Any suggestions for maths books to brush up ?

Dr. Tushar Tamhane

Manager | Energy Transition & Decarbonization Expert | Ph.D. in Chemical Engineering | Driving Sustainable Solutions | Ex-thyssenkrupp

3 years

Very well summed up Sameer Shirur! I'm resharing this.

More articles by Sameer Shirur