Sample Housing Market Problem

(Courtesy: M Yasser H of Kaggle (https://www.kaggle.com/yasserh))

Description:

A simple yet challenging project: predict housing prices from factors such as house area, number of bedrooms, furnishing status, and proximity to a main road. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
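One common way to check for the multicollinearity mentioned above is the variance inflation factor (VIF). The sketch below is only an illustration in Python with NumPy, not the code used for this article: the columns area, bedrooms, and parking are hypothetical toy data generated on the spot, not the Kaggle dataset. It relies on the fact that the VIF of each predictor is the matching diagonal entry of the inverse of the predictor correlation matrix.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(X, rowvar=False)   # predictor correlation matrix
    return np.diag(np.linalg.inv(corr))   # VIF_j is the j-th diagonal of its inverse

# Hypothetical toy predictors (assumption: NOT the Kaggle housing data).
rng = np.random.default_rng(0)
area = rng.normal(5000, 1500, size=200)
bedrooms = area / 1500 + rng.normal(0, 0.3, size=200)  # deliberately tied to area
parking = rng.normal(1, 0.8, size=200)                 # independent of the others

X = np.column_stack([area, bedrooms, parking])
vifs = variance_inflation_factors(X)

for name, vif in zip(["area", "bedrooms", "parking"], vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

A VIF near 1 suggests a predictor is nearly unrelated to the others, while values above roughly 5 to 10 are often treated as a warning sign. In this toy setup, area and bedrooms should come out high and parking close to 1.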

Acknowledgement:

Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102.

Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

Objective:

  • Understand the dataset and clean it up (if required).
  • Build regression models to predict the sale price from a single feature and from multiple features.
  • Evaluate the models and compare their respective scores, such as R2, RMSE, etc.

Housing Jittered Scatter Plot:

[Image: jittered scatter plot of the housing data]

To observe how this dataset behaves, I created the jittered scatter plot above as well as the scatter plot below.

House Correlation Circle Plot:

[Image: correlation circle plot of the housing data]


Dotplot Comparing House Algorithms:

[Image: dotplot comparing the housing models]

The models that I felt could give the best possible predictions are as follows:

  • Original Linear Model
  • General Linear Model
  • Partial Least Squares Regression
  • Cubist
  • Random Forest

I tested each of those models and monitored them using three measures:

  • Mean Absolute Error (MAE)
  • Root Mean Square Error (RMSE)
  • R-Squared
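For readers who want to see what these three measures compute, here is a small plain-Python sketch. The numbers in actual and predicted are made up for illustration, and R-squared is written here as 1 minus the ratio of residual to total sum of squares. Some tools instead report the squared correlation between predictions and observations, so values may differ slightly from one package to another.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R-squared as 1 - SS_res / SS_tot (share of variance explained)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up prices (assumption: illustrative values only).
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2:  ", r_squared(actual, predicted))
```

Lower MAE and RMSE are better, while an R-squared closer to 1 is better.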


Conclusion:

Overall, based on the dotplot above, there are, in my opinion, three models that can best predict where housing trends will go with minimal error. If you would like to know more about these models in detail, please visit my GitHub site at www.github.com/pc1991/Housing. Thank you very much for your time and consideration. I hope to see you again next time.
