Wine Quality Dataset
(Courtesy: M Yasser H of Kaggle (https://www.kaggle.com/yasserh))
Description:
This dataset is related to red variants of the Portuguese "Vinho Verde" wine. It describes the amounts of various chemicals present in the wine and their effect on its quality. The dataset can be treated as a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Your task is to predict the quality of wine using the given data.
This is a simple yet challenging project: anticipating the quality of wine.
The complexity arises because the dataset has few samples and is highly imbalanced.
Can you overcome these obstacles and build a good predictive model to classify the wines?
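Before modeling, it can help to quantify the imbalance described above. The following is a minimal sketch using only the Python standard library. The helper name `class_distribution` and the toy scores are invented for illustration and are not taken from the dataset; in practice you would pass in the actual quality column.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share of total)} with labels in ascending order."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Toy quality scores, invented for illustration only (not rows from the dataset).
toy_quality = [5, 5, 6, 5, 6, 7, 5, 6, 4, 6, 5, 8]

for label, (n, share) in class_distribution(toy_quality).items():
    print(f"quality {label}: {n} samples ({share:.0%})")
```

If a few middle scores account for most of the rows, plain accuracy can look good even for a model that never predicts the rare classes, which is one reason to inspect the distribution first.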
This data frame contains the following columns:
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Acknowledgements:
This dataset is also available from Kaggle and the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/wine+quality.
Objective:
Observations:
To understand this dataset, I first took a quick look at the first few rows of the table. I then ran a jittered scatter plot as well as a correlation circle plot for the dataset, shown below.
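A correlation plot is built from pairwise correlation coefficients between the columns. As a rough illustration of that underlying quantity, here is a small sketch of the Pearson correlation in plain Python. It is not the code used for the plots in this article; the function name `pearson` and the toy values are assumptions made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration only (not rows from the dataset).
toy_alcohol = [9.4, 9.8, 10.5, 11.2, 12.0]
toy_quality = [5, 5, 6, 6, 7]

print(f"alcohol vs quality: r = {pearson(toy_alcohol, toy_quality):.2f}")
```

Computing this value for each input variable against quality gives a quick ranking of which physicochemical measurements move together with the score.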
After that, I experimented with several models to find the one that gave the best fit. Those models are listed below:
Conclusion:
Overall, based on the table above, the RF model has the lowest MAE and RMSE, as well as the highest R-squared. For further detail on this model, please visit my website at www.github.com/pc1991/Wine. I am looking forward to seeing you there. Thank you very much for reading. Take care.
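For readers who want to see how the three metrics named in the conclusion are defined, here is a minimal sketch in plain Python. The function names and the toy values are hypothetical and chosen only for illustration; they are not the article's results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy actual and predicted quality scores, invented for illustration only.
y_true = [5, 6, 5, 7]
y_pred = [5.5, 5.5, 5.0, 6.0]

print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"R^2  = {r_squared(y_true, y_pred):.3f}")
```

Lower MAE and RMSE mean smaller errors, while an R-squared closer to 1 means the model explains more of the variation in quality, which is why the three are read together when ranking models.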