Class 18 - EVALUATION METRICS FOR DIFFERENT MODELS
Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)
A model's performance depends on its data.
If the data is not good, the output will be garbage.
That's why Dr. Sheraz Sb says:
Garbage In, Garbage Out.
From the previous lesson, we saw that the MSE value was too high.
So, what do we do now?
We have to analyze the data further, i.e., data pre-processing.
Data cleaning is part of data pre-processing.
We can apply different models in this scenario.
MSE can be reduced but can't be brought all the way to zero.
In this case, you can apply other models as well, like Decision Tree, etc.
BIG VALUES DOMINATE IN A MODEL: the model gives more weightage to big values, which is why features should be scaled before training.
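As a quick illustration of this point, here is a minimal sketch (using made-up toy numbers, not the housing data) of how StandardScaler puts features with very different magnitudes on a comparable scale:
{# Sketch: why scaling matters (toy numbers, not the course data)
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature in the millions, one in single digits
X = np.array([[1_000_000.0, 2.0],
              [2_000_000.0, 3.0],
              [3_000_000.0, 4.0]])

# Without scaling, the first column dominates any distance/weight computation;
# after scaling, both columns have mean 0 and unit variance
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)}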
{import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
import seaborn as sns
plt.rcParams["figure.figsize"] = [10, 5]
# Ignore warnings (FutureWarning in particular)
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action = "ignore", category = FutureWarning)}
{full_data = pd.read_csv('/content/USA_Housing.csv')}
{# Data shape
print('train data:',full_data.shape)}
{# View first few rows
full_data.head(5)}
{# Data Info
full_data.info()}
{# Heatmap
sns.heatmap(full_data.isnull(),yticklabels = False, cbar = False,cmap = 'tab20c_r')
plt.title('Missing Data: Training Set')
plt.show()}
{# Remove Address feature
full_data.drop('Address', axis = 1, inplace = True)}
{# Remove rows with missing data
full_data.dropna(inplace = True)}
{full_data}
{# Numeric summary
full_data.describe()}
{# Shape of train data
full_data.shape}
{# Split data to be used in the models
# Create matrix of features
x = full_data.drop('Price', axis = 1) # grabs everything else but 'Price'
# Create target variable
y = full_data['Price'] # y is the column we're trying to predict
}
{from sklearn import preprocessing
# Fit the scaler on the features, then transform them (fitting once is enough)
pre_process = preprocessing.StandardScaler().fit(x)
x_transform = pre_process.transform(x)}
{# x Represents the Features
x_transform.shape
x_transform}
{y.shape}
{# Use x and y variables to split the training data into train and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size = .10, random_state = 101)}
{# Fit
# Import model
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Create instance of model
lin_reg = LinearRegression()
# Pass training data into model
lin_reg.fit(x_train, y_train)
# pipe = make_pipeline(StandardScaler(), LinearRegression())
# pipe.fit(x_train, y_train)}
{# Predict
y_pred = lin_reg.predict(x_test)
print(y_pred.shape)
print(y_pred)}
{sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual Data points')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', label='Ideal Line')
plt.legend()
plt.show()}
{# Combine actual and predicted values side by side
results = np.column_stack((y_test, y_pred))
# Printing the results
print("Actual Values | Predicted Values")
print("-----------------------------")
for actual, predicted in results:
    print(f"{actual:14.2f} | {predicted:12.2f}")}
{# Residuals: actual minus predicted (use y_test, not the loop variable)
residual = y_test.values - y_pred
print(residual)}
{# Distribution plot for residuals (difference between actual and predicted values)
# sns.distplot is deprecated; histplot with kde=True is the modern equivalent
sns.histplot(residual, kde=True)}
{# Score It
from sklearn.metrics import mean_squared_error
print('Linear Regression Model')
# Results
print('--'*30)
# mean_squared_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
# Print evaluation metrics
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)}
{# Difference between the two MSE values compared in class
s = 10100187858 - 9839952411
print(s)}
{y_train.shape}
{from sklearn.tree import DecisionTreeRegressor
dt_regressor = DecisionTreeRegressor()
dt_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_dt = dt_regressor.predict(x_test)
# Decision Tree Regression MSE on the test set
DTr = mean_squared_error(y_test, y_pred_dt)
print('Decision Tree Regression : ', DTr)}
{from sklearn.ensemble import RandomForestRegressor
rf_regressor = RandomForestRegressor()
rf_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_rf = rf_regressor.predict(x_test)
# Random Forest Regression MSE on the test set
RFr = mean_squared_error(y_test, y_pred_rf)
print('Random Forest Regression : ', RFr)}
{from sklearn.ensemble import GradientBoostingRegressor
gb_regressor = GradientBoostingRegressor()
gb_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_gb = gb_regressor.predict(x_test)
# Gradient Boosting Regression MSE on the test set
GBr = mean_squared_error(y_test, y_pred_gb)
print('Gradient Boosting Regression : ', GBr)}
{# Sample model scores (replace these with your actual model scores)
model_scores = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766
}
# Sort the model scores in ascending order (lower MSE is better)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])
# Display the ranking of the models
print("Model Rankings (lower values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")}
Data Transformation for Structured Data:
1- z-score Scaling (Standard Scaling)
2- Min-Max Scaling
Min-Max scaling maps each feature into a fixed range, usually 0 to 1 (see the sketch below).
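A minimal sketch of Min-Max scaling on toy numbers (MinMaxScaler is the standard sklearn class for this; the values here are made up for illustration):
{# Sketch: Min-Max scaling maps values into [0, 1] (toy values)
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[10.0], [20.0], [30.0], [40.0]])
scaled = MinMaxScaler().fit_transform(values)
print(scaled)  # [[0.], [0.333...], [0.666...], [1.]]}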
Different Regression Techniques:
Decision Tree Regressor:
It partitions the data into subsets based on the values of the independent variables and predicts the target variable within each subset.
Random Forest Regressor:
Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the average prediction (for regression) across all individual trees.
Gradient Boosting Regressor:
Gradient Boosting is an ensemble learning technique that builds multiple decision trees sequentially, each one correcting the errors of its predecessor (see the sketch below).
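To see the "each tree corrects its predecessor" behaviour, here is a hedged sketch using GradientBoostingRegressor.staged_predict (a real sklearn method that yields the ensemble's prediction after each boosting stage), reusing the x_train/x_test split from above:
{# Sketch: watch the test MSE fall as boosting stages are added
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

gb = GradientBoostingRegressor(n_estimators=100)
gb.fit(x_train, y_train)
for stage, y_stage in enumerate(gb.staged_predict(x_test), start=1):
    if stage % 25 == 0:  # print every 25th stage
        print(stage, mean_squared_error(y_test, y_stage))}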
Be a good practitioner.
It's all about curiosity.
Learn and explore things on your own.
The CS field is about hands-on knowledge.
You have to train your mind.
Standard Scaling involves the mean and standard deviation: z = (x - mean) / std (a quick check below).
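A minimal check (toy numbers) that the manual z-score formula matches sklearn's StandardScaler:
{# Sketch: manual z-score vs. StandardScaler (toy numbers)
import numpy as np
from sklearn.preprocessing import StandardScaler

x_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
manual = (x_toy - x_toy.mean()) / x_toy.std()   # z = (x - mean) / std
sklearn_scaled = StandardScaler().fit_transform(x_toy)
print(np.allclose(manual, sklearn_scaled))      # True}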
Google Colab Link:
#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem