Wine Quality Dataset

(Courtesy: M Yasser H of Kaggle (https://www.kaggle.com/yasserh))


Description:

This dataset relates to the red variants of the Portuguese "Vinho Verde" wine. It describes the amounts of various chemicals present in each wine and their effect on its quality. The dataset can be treated as either a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Your task is to predict the quality of wine using the given data.

A simple yet challenging project, to anticipate the quality of wine.

The complexity arises because the dataset has few samples & is highly imbalanced.

Can you overcome these obstacles & build a good predictive model to classify the wines?
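One quick way to gauge the imbalance is to count how many wines fall into each quality score and derive inverse-frequency class weights. The sketch below uses only the Python standard library. The `labels` list is a made-up toy sample, not the real data, and `class_weights` is a hypothetical helper name introduced for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy quality scores (illustrative only, not taken from the dataset).
labels = [5, 5, 5, 5, 6, 6, 6, 6, 7, 3]

counts = Counter(labels)
weights = class_weights(labels)

# Rare scores get a larger weight than common ones.
for cls in sorted(counts):
    print(f"quality {cls}: {counts[cls]} samples, weight {weights[cls]:.2f}")
```

Weights of this kind can be passed to many learners so that rare scores are not ignored during training; whether that helps on this dataset would need to be checked empirically.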

This data frame contains the following columns:

Input variables (based on physicochemical tests):

1 - fixed acidity

2 - volatile acidity

3 - citric acid

4 - residual sugar

5 - chlorides

6 - free sulfur dioxide

7 - total sulfur dioxide

8 - density

9 - pH

10 - sulphates

11 - alcohol

Output variable (based on sensory data):

12 - quality (score between 0 and 10)
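As a rough sketch of a first cleanup step, the column list above can be turned into a schema check when reading the file. The code below is an assumption-laden example: `load_wine_rows` is a hypothetical helper, the header names are assumed to match the list above exactly (the real file may differ), and the in-memory `SAMPLE_CSV` with its single row is a stand-in for the actual download.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# Stand-in for the real file; the single data row is illustrative only.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "7.4,0.7,0.0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5\n"
)


def load_wine_rows(handle):
    """Parse CSV text into (features, quality) pairs, validating the schema."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        # The 11 physicochemical inputs are numeric; quality is an integer score.
        features = [float(record[c]) for c in EXPECTED_COLUMNS[:-1]]
        quality = int(record["quality"])
        if not 0 <= quality <= 10:
            raise ValueError(f"quality out of range: {quality}")
        rows.append((features, quality))
    return rows


rows = load_wine_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "row(s);", len(rows[0][0]), "features; quality =", rows[0][1])
```

With a real download, the same function could be given an opened file instead of the `io.StringIO` wrapper.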

Acknowledgements:

This dataset is also available from Kaggle & the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/wine+quality).

Objective:

  • Understand the dataset & clean it up (if required).
  • Build classification models to predict the wine quality.
  • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
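To make the hyperparameter-tuning objective concrete, here is a minimal sketch of a grid search with k-fold cross-validation for a k-nearest-neighbors classifier, written with the Python standard library only. It is not the author's workflow: the data are synthetic two-class blobs, and `knn_predict` and `cv_accuracy` are hypothetical helper names.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(data, k, n_folds=5):
    """Mean accuracy of k-NN over n_folds cross-validation folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / n_folds


# Synthetic two-class data (illustrative only): two Gaussian blobs in 2-D.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(50)]
rng.shuffle(data)

# Grid search: evaluate each candidate k and keep the best-scoring one.
results = {k: cv_accuracy(data, k) for k in (1, 3, 5, 7, 9)}
best_k = max(results, key=results.get)
print("CV accuracy per k:", {k: round(v, 3) for k, v in results.items()})
print("best k:", best_k)
```

The same pattern extends to other models: define a grid of candidate settings, score each one on held-out folds, and compare the averages.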

Observations:

To understand this dataset, I first took a quick look at the first few rows of the table. I then produced a jittered scatter plot as well as a correlation circle plot of the dataset, shown below.

[Figures: jittered scatter plot and correlation circle plot of the dataset; images not available]
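For readers unfamiliar with these plots, the two ideas behind them can be sketched in a few lines. Jittering adds small random noise to discrete values, such as integer quality scores, so that overlapping points become visible, and a correlation plot summarizes pairwise Pearson coefficients. The code below does not reproduce the plots above; the `alcohol` and `quality` values are toy numbers, and `jitter` and `pearson` are hypothetical helper names.

```python
import math
import random


def jitter(values, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount] so overlapping points spread out."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values (illustrative only, not taken from the dataset).
alcohol = [9.0, 9.5, 10.0, 11.0, 12.0, 12.5]
quality = [5, 5, 5, 6, 6, 7]

print("jittered quality:", [round(q, 2) for q in jitter(quality)])
print("correlation(alcohol, quality):", round(pearson(alcohol, quality), 3))
```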

Once that was taken care of, I experimented with several models to find the one that would give the best fit. Those models are listed below:

  • k-Nearest Neighbors (KNN)
  • Partial Least Squares Regression (PLS)
  • Gaussian Fit Linear Regression (GAUSSIAN)
  • Original Linear Model (LM)
  • Generalized Linear Model (GLM)
  • Elasticnet (ENET)
  • Cubist
  • Random Forest (RF)

[Table: MAE, RMSE & R-Squared for each model; image not available]
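Regression models such as those listed above are commonly compared with MAE, RMSE & R-Squared. As a minimal sketch, the three metrics can be computed as follows. The `y_true` and `y_pred` values are toy numbers rather than the results from this analysis, and R-Squared is taken here as 1 - SS_res / SS_tot, which is one common definition and may differ from what a given modeling package reports.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy scores and predictions (illustrative only, not the results of this analysis).
y_true = [5, 6, 5, 7, 6]
y_pred = [5.5, 6.0, 5.0, 6.0, 6.5]

print(f"MAE={mae(y_true, y_pred):.3f}  RMSE={rmse(y_true, y_pred):.3f}  "
      f"R2={r_squared(y_true, y_pred):.3f}")
```

Lower MAE and RMSE indicate smaller prediction errors, while a higher R-Squared indicates that more of the variance in quality is explained.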

Conclusion:

Overall, based on the table above, the RF model has the lowest MAE & RMSE, as well as the highest R-Squared. For further detail on this model, please visit my website at www.github.com/pc1991/Wine. I am looking forward to seeing you there. Thank you very much for reading. Take care.
