Linear Regression in Machine Learning

Linear regression is one of the simplest and most popular Machine Learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.
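As a minimal sketch of this idea, the snippet below fits a line with scikit-learn. The experience/salary numbers are made up for illustration and are not from the article:

    # A minimal sketch of simple linear regression with scikit-learn.
    # The experience/salary values below are illustrative only.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])   # independent variable (x), e.g. years of experience
    y = np.array([30, 35, 42, 48, 55])        # dependent variable (y), e.g. salary in $1000s

    model = LinearRegression().fit(X, y)
    print("intercept a0:", model.intercept_)  # value of y when x = 0
    print("slope a1:", model.coef_[0])        # change in y per unit change in x
    print("prediction for x=6:", model.predict([[6]])[0])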

  • Negative Linear Relationship: If the dependent variable decreases on the Y-axis while the independent variable increases on the X-axis, the relationship is called a negative linear relationship.

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, meaning the error between the predicted values and the actual values should be minimized. The best fit line will have the least error.

Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line. To calculate this, we use a cost function.

Cost function:

  • Different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
  • The cost function optimizes the regression coefficients or weights. It measures how well a linear regression model is performing.
  • We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For a line of the form y = a0 + a1x, MSE can be calculated as:

MSE = (1/N) Σ (Yi − (a1xi + a0))²
Where,

N = total number of observations
Yi = actual value
(a1xi + a0) = predicted value
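The short sketch below computes this cost by hand, matching the formula above. The data and the candidate coefficients a0, a1 are illustrative values, not from the article:

    # Computing the MSE cost for a candidate line y = a0 + a1*x.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])   # actual values Yi (illustrative)
    a0, a1 = 0.2, 1.9                    # candidate intercept and slope

    y_pred = a1 * x + a0                 # predicted values (a1*xi + a0)
    residuals = y - y_pred               # residuals: actual minus predicted
    mse = np.mean(residuals ** 2)        # average of the squared errors
    print("MSE:", mse)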

Residuals: The distance between an actual value and the predicted value is called a residual. If the observed points are far from the regression line, the residuals will be high, and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small, and hence so will the cost function.

Gradient Descent:

  • Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
  • A regression model uses gradient descent to update the coefficients of the line by reducing the cost function.
  • This is done by randomly selecting initial values for the coefficients and then iteratively updating them to reach the minimum of the cost function, as sketched below.
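A minimal sketch of this loop for simple linear regression follows. The learning rate and iteration count are arbitrary choices for this example, and the data are the same illustrative values used above:

    # Gradient descent on the MSE cost for y = a0 + a1*x.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    a0, a1 = 0.0, 0.0            # start from arbitrary coefficient values
    lr = 0.01                    # learning rate (step size), chosen by hand
    for _ in range(5000):
        y_pred = a1 * x + a0
        error = y_pred - y
        # Gradients of MSE = mean((y_pred - y)^2) with respect to a0 and a1
        grad_a0 = 2 * np.mean(error)
        grad_a1 = 2 * np.mean(error * x)
        a0 -= lr * grad_a0       # step opposite the gradient to reduce the cost
        a1 -= lr * grad_a1
    print("a0:", a0, "a1:", a1)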

Model Performance:

The goodness of fit determines how well the line of regression fits the set of observations. The process of finding the best model out of various models is called optimization. It can be achieved by the method below:


1. R-squared method:

  • R-squared is a statistical measure that determines the goodness of fit.
  • It measures the strength of the relationship between the dependent and independent variables on a scale of 0-100%.
  • A high value of R-squared indicates a small difference between the predicted values and the actual values, and hence represents a good model.
  • It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
  • It can be calculated from the formula below:

R-squared = Explained variation / Total variation
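The sketch below computes R-squared in the equivalent form 1 − (residual variation / total variation), reusing the illustrative data and coefficients from the earlier examples:

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    a0, a1 = 0.2, 1.9
    y_pred = a1 * x + a0

    ss_res = np.sum((y - y_pred) ** 2)       # unexplained (residual) variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    r_squared = 1 - ss_res / ss_tot
    print("R-squared:", r_squared)           # close to 1 means a good fit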
Assumptions of Linear Regression

Below are some important assumptions of Linear Regression. These are formal checks to perform while building a Linear Regression model, which help ensure the best possible result from the given dataset.

  • Linear relationship between the features and target: Linear regression assumes a linear relationship between the dependent and independent variables.
  • Little or no multicollinearity between the features: Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable; in other words, it is difficult to determine which predictor variable is affecting the target variable and which is not. So the model assumes either little or no multicollinearity between the features or independent variables.
  • Homoscedasticity: Homoscedasticity is the situation where the error term is the same for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the distribution of points in the scatter plot.
  • Normal distribution of error terms: Linear regression assumes that the error terms follow a normal distribution. If the error terms are not normally distributed, confidence intervals will become either too wide or too narrow, which may cause difficulties in estimating the coefficients. Normality can be checked using a q-q plot: if the plot shows a straight line without major deviation, the errors are normally distributed (see the sketch after this list).
  • No autocorrelation: The linear regression model assumes no autocorrelation in the error terms. Any correlation in the error terms will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
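As a minimal sketch of two of these checks, the snippet below draws a q-q plot of the residuals with scipy and computes the Durbin-Watson statistic for autocorrelation with statsmodels. The synthetic data are purely illustrative:

    # Checking residual normality (q-q plot) and autocorrelation (Durbin-Watson).
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 1.5 * x + 2 + rng.normal(0, 1, size=50)   # illustrative data

    a1, a0 = np.polyfit(x, y, 1)                  # least-squares fit of a line
    residuals = y - (a1 * x + a0)

    stats.probplot(residuals, dist="norm", plot=plt)  # q-q plot: points near the
    plt.show()                                        # line => roughly normal errors

    dw = durbin_watson(residuals)  # ~2 suggests no autocorrelation;
    print("Durbin-Watson:", dw)    # values near 0 or 4 suggest correlation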

