
House Price Prediction using Simple Linear Regression

Mahaveer Sahuu

Tech Advisor for GenAI & AutoML @IQVIA | M.Tech in Data Science & Machine Learning | Artificial Engineer | GenAI Engineer | Prompt Engineer | LLMs | Data Science Trainer | Career Counsellor

Published: January 31, 2022


1. Introduction

1.1 Background

Many people are planning to buy their dream house, and each of them has preferences about how the house should look and what it should contain. Buyers without good knowledge of the housing market find it difficult to find a good house at a fair price. This project aims to make selecting a house easier.

1.2 Problem

The problem is to predict house prices so that buyers can select a good house. Without such guidance, a buyer may choose a house that is not worth its price. Instead, we can select a house based on attributes that match our preferences and predict its price. Given these challenges, there is a need for a model that helps buyers find the right house at a fair price.

2. Data

The data used in this project comes from Kaggle and consists of a training set and a test set, which we use to build a model that predicts house prices.

2.1. Data Understanding

The training data contains 80 variables in total: 79 features (independent variables) and one target (dependent variable), SalePrice. The variables and their data types are shown below:

Fig: Numerical (left) and categorical (right) variables
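As a minimal sketch of the numerical/categorical split shown in the figure, the hypothetical function `split_column_types` below (not from the original project) treats a column as numerical when every non-missing value is an `int` or `float`, and as categorical otherwise. It assumes rows are plain Python dicts with `None` marking missing values; the column values are illustrative only. A pandas workflow would typically rely on the column dtypes instead.

```python
def split_column_types(rows, target="SalePrice"):
    """Split feature names into (numerical, categorical), excluding the target.

    A column is numerical if every non-missing value is an int or float.
    Columns with no non-missing values fall into the categorical list.
    """
    features = [name for name in rows[0] if name != target]
    numerical, categorical = [], []
    for name in features:
        present = [row[name] for row in rows if row[name] is not None]
        if present and all(isinstance(v, (int, float)) for v in present):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical


# Illustrative rows only; the real training file has 79 features plus SalePrice.
rows = [
    {"LotArea": 8000, "Neighborhood": "NAmes", "SalePrice": 200000},
    {"LotArea": 9500, "Neighborhood": None, "SalePrice": 180000},
    {"LotArea": 7000, "Neighborhood": "OldTown", "SalePrice": 150000},
]
numerical, categorical = split_column_types(rows)
print(numerical)    # ['LotArea']
print(categorical)  # ['Neighborhood']
```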

2.2. Exploratory Data Analysis

In this part, we analyze the type and origin of the data, explore the relationships between variables, and examine the attributes through their frequency distributions. We also review summary statistics of the data and drop variables that are not appropriate for modelling.
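One common way to explore these relationships numerically is to rank the numerical features by their Pearson correlation with SalePrice. The sketch below assumes each column is stored as a list in a dict; the helper names `pearson` and `rank_by_correlation` and the toy values are hypothetical, not taken from the original analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def rank_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Toy values for illustration only.
sale_price = [100000, 150000, 200000, 250000]
features = {
    "YrSold": [2008, 2007, 2008, 2007],
    "GrLivArea": [1000, 1500, 2000, 2500],
}
for name, r in rank_by_correlation(features, sale_price):
    print(f"{name}: {r:+.3f}")
# GrLivArea: +1.000
# YrSold: -0.447
```

Ranking by absolute value keeps strong negative relationships near the top, since they are as informative for a linear model as strong positive ones.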

2.3. Data Preparation

- First, we handle the null (missing) values in the data. If more than 80% of a variable's values are missing, we drop that variable; otherwise, we impute its missing values. The heatmaps below show the missing values before and after this step:

Fig: Before removing nulls (left) and after removing nulls (right)


- Next, we handle outliers by removing extreme values using the interquartile range (IQR).

- Finally, we prepare the data for modelling: we scale the numerical features and the target variable to a common range, apply dummy encoding to the categorical variables, and split the data in an 80:20 ratio.
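The three preparation steps above can be sketched in plain Python as follows. Several details are assumptions rather than the original project's choices: missing values are imputed with the median (numerical) or the most frequent value (categorical), outliers are defined by the conventional 1.5 × IQR fences, "scaling" is taken to mean standardization (zero mean, unit variance), dummy encoding drops the first level as a baseline, and the split uses a fixed random seed. All function names and data are hypothetical. In practice, fitting the scaler on the training rows only avoids leaking information from the held-out rows.

```python
import random
import statistics


def handle_missing(columns, threshold=0.8):
    """Drop columns with more than `threshold` missing (None) values; impute the rest.

    Numerical columns get the median, other columns the most frequent value.
    """
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if (len(values) - len(present)) / len(values) > threshold:
            continue  # too sparse: drop the variable entirely
        if all(isinstance(v, (int, float)) for v in present):
            fill = statistics.median(present)
        else:
            fill = statistics.mode(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


def iqr_keep_mask(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR], False for outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= v <= high for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def dummy_encode(name, values):
    """One 0/1 column per level, dropping the alphabetically first level as baseline."""
    levels = sorted(set(values))[1:]
    return {f"{name}_{level}": [int(v == level) for v in values] for level in levels}


def split_indices(n_rows, test_ratio=0.2, seed=42):
    """Shuffle row indices and split them into train/test (80:20 by default)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * (1 - test_ratio))
    return indices[:cut], indices[cut:]


# Toy columns for illustration only.
columns = {
    "LotFrontage": [65.0, 80.0, None, 60.0, 84.0],
    "PoolQC": [None, None, None, None, None],
    "Neighborhood": ["NAmes", None, "OldTown", "NAmes", "Edwards"],
}
cleaned = handle_missing(columns)
print(sorted(cleaned))         # ['LotFrontage', 'Neighborhood']
print(cleaned["LotFrontage"])  # [65.0, 80.0, 72.5, 60.0, 84.0]

prices = [200000, 180000, 210000, 190000, 205000, 195000, 900000]
print(iqr_keep_mask(prices))   # only the 900000 row is flagged False

print(dummy_encode("Neighborhood", cleaned["Neighborhood"]))
train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```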

3. Model

Here we apply linear regression to the training data to predict house prices. We built four models:

- First, a simple linear regression model;

- Second, a model using features selected by their p-values;

- Third, a model trained with stochastic gradient descent;

- Finally, a model using features selected by a recursive feature selection technique.
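To make the first and third models concrete, here is a minimal single-feature sketch: ordinary least squares in closed form, the same line fitted by stochastic gradient descent (SGD), and the root mean squared error (RMSE) used to compare models in the results. It is not the author's code; the function names, learning rate, epoch count, and noise-free toy data are assumptions chosen so that both fits recover the same line. The p-value and recursive feature selection variants need a multi-feature model and are not shown.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_sgd(xs, ys, lr=0.01, epochs=500, seed=0):
    """Fit y = b0 + b1 * x by SGD on squared error, one shuffled sample per update."""
    rng = random.Random(seed)
    b0 = b1 = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = (b0 + b1 * xs[i]) - ys[i]  # derivative of err**2 / 2 w.r.t. prediction
            b0 -= lr * err
            b1 -= lr * err * xs[i]
    return b0, b1


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy, already-scaled feature and target lying exactly on y = 1 + 2x.
xs = [-1.5, -0.5, 0.5, 1.5]
ys = [1.0 + 2.0 * x for x in xs]

b0_ols, b1_ols = fit_ols(xs, ys)
b0_sgd, b1_sgd = fit_sgd(xs, ys)
print(b0_ols, b1_ols)                          # 1.0 2.0
print(round(b0_sgd, 4), round(b1_sgd, 4))      # 1.0 2.0
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # sqrt(5/3), about 1.291
```

Because the target is scaled during preparation, an RMSE computed on scaled predictions is in scaled units; converting predictions back to the original price scale before scoring makes the numbers easier to interpret.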

4. Result

Next, we predict the prices for the test data and compute each model's root mean squared error (RMSE) to decide which model performs best so far. The table below lists each model with its RMSE score:

5. Conclusion

As the table above shows, the model built on features chosen by the recursive feature selection technique gives good results. We could improve the model further by applying regularization techniques such as ridge, lasso, or elastic net; by doing more feature engineering to select appropriate features; and by using more advanced regression techniques such as random forest, gradient boosting, or XGBoost.
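To illustrate what regularization changes, here is a sketch of ridge regression for a single feature, assuming the usual formulation in which the intercept is not penalized. Adding the penalty λ (`lam` in the code) to the denominator of the slope shrinks it toward zero; lasso and elastic net have no such simple closed form and are usually fitted iteratively, for example by coordinate descent. The function name `fit_ridge` and the toy data are hypothetical.

```python
def fit_ridge(xs, ys, lam):
    """Single-feature ridge: minimize sum((y - b0 - b1*x)**2) + lam * b1**2.

    The intercept b0 is left unpenalized, so b1 = Sxy / (Sxx + lam).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / (sxx + lam)
    return mean_y - b1 * mean_x, b1


# Toy data lying exactly on y = 1 + 2x.
xs = [-1.5, -0.5, 0.5, 1.5]
ys = [1.0 + 2.0 * x for x in xs]
for lam in (0.0, 5.0, 45.0):
    print(lam, fit_ridge(xs, ys, lam))
# 0.0 (1.0, 2.0)   lam = 0 reproduces ordinary least squares
# 5.0 (1.0, 1.0)
# 45.0 (1.0, 0.2)
```

Larger penalties trade a little bias for lower variance, which tends to help when many correlated features are included.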

6. Reference

House Price Dataset
