Titanic with ML and Evaluation of Classification Models

Firstly, this is one of the projects I finished when I started my journey with ML. I'll upload my projects one by one, and with each one explain a trick, an algorithm, or a special technique.

It is therefore the "hello world" project of ML, built on the Titanic dataset.

The process of working on the data is divided into the following steps:

1- Import the dataset

2- Clean the data and handle missing values

3- Visualize the distributions

4- Check the bias of the target

5- Encode the data so it can be used in the ML process

6- Feature extraction

7- Split the data

8- Select the classification model

9- Evaluate the model and interpret what its accuracy really means

  • Using pandas, start by reading the data and showing its head.

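In code, the first step looks like this. A minimal sketch: the original notebook reads Kaggle's train.csv, for which a tiny inline sample stands in here.

```python
import pandas as pd

# The original notebook reads Kaggle's file:
# df = pd.read_csv("train.csv")
# A tiny inline sample stands in for it here.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived":    [0, 1, 1],
    "Pclass":      [3, 1, 3],
    "Name":        ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Sex":         ["male", "female", "female"],
    "Age":         [22.0, 38.0, None],
})
print(df.head())  # first rows of the dataset
```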

Checking for empty values is a very important step, because machine learning models can't accept them, and if they are converted to values incorrectly the model will effectively ignore that data in its predictions (underfitting).

  • We have 177 nulls in Age, which is not a small number relative to the size of the data.
  • In Cabin we have 687 missing values. We can drop this whole column: it will not affect the relation to the target, and with 687 of 891 values missing it is hard to recover them with any sensible technique.

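The null check itself is one line in pandas (a sketch on sample data with the two problematic columns):

```python
import pandas as pd

df = pd.DataFrame({
    "Age":   [22.0, None, 30.0, None],
    "Cabin": [None, None, "C85", None],
    "Fare":  [7.25, 71.28, 7.92, 8.05],
})
missing = df.isnull().sum()  # null count per column
print(missing)
```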

  • Drop the non-useful columns. We decide this logically, based on the relation between each feature and the target. For example, a passenger named 'ZAZA' doesn't have a higher probability of dying than one named 'ZIZI'.

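A sketch of the drop, assuming the usual Titanic column names:

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name":        ["Braund, Mr. Owen", "Cumings, Mrs. John"],
    "Ticket":      ["A/5 21171", "PC 17599"],
    "Cabin":       [None, "C85"],
    "Sex":         ["male", "female"],
    "Survived":    [0, 1],
})
# Identifiers and names carry no signal about survival; Cabin is mostly missing.
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
print(df.columns.tolist())
```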

  • New data after dropping the non-useful columns


  • Show the distribution of each feature and its relation to the target.

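The article shows this step with plots; the same information can also be pulled non-graphically with summary tables, as in this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male", "female"],
    "Age":      [22, 38, 26, 35, 27],
    "Survived": [0, 1, 1, 0, 1],
})
print(df["Age"].describe())                  # distribution of a numeric feature
rate = df.groupby("Sex")["Survived"].mean()  # relation of a feature to the target
print(rate)
```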

  • Check the class counts of the target. You should care about this because an imbalance in the quantity of any class will bias the model towards the class with the larger quantity of training data.
  • Here, though, I think the classes are reasonably balanced.

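A quick way to check the balance (a sketch on a dummy target):

```python
import pandas as pd

target = pd.Series([0, 1, 1, 0, 0, 1, 0, 1])
counts = target.value_counts(normalize=True)  # class proportions
print(counts)
```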

  • Columns of type object, or holding string values, must be converted, because a model is, as you know, a set of statistical formulas; so they must be converted with techniques such as one-hot encoding or label encoding.
  • OneHotEncoder encodes categorical integer features as a one-hot numeric array. Its transform method returns a sparse matrix if sparse=True, else a 2-D array. You can't cast a 2-D array (or sparse matrix) into a pandas Series; you must create a pandas Series (a column in a pandas DataFrame) for each category.
  • The get_dummies function does this easily: just pass it your dataset.

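A sketch of get_dummies on the two categorical Titanic columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Sex":      ["male", "female"],
    "Embarked": ["S", "C"],
    "Age":      [22.0, 38.0],
})
# Each category becomes its own 0/1 column; numeric columns pass through.
encoded = pd.get_dummies(df, columns=["Sex", "Embarked"])
print(encoded.columns.tolist())
```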

  • And now we drop the rows that still contain nulls.

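In pandas this is a single call (a sketch):

```python
import pandas as pd

df = pd.DataFrame({"Age": [22.0, None, 30.0], "Survived": [0, 1, 1]})
df = df.dropna()  # drop any row that still has a null
print(len(df))
```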

  • Backward elimination is a feature selection technique used while building a machine learning model. It removes features that do not have a significant effect on the dependent variable or the prediction of the output. There are various ways to build a model in machine learning:

  1. All-in
  2. Backward Elimination
  3. Forward Selection
  4. Bidirectional Elimination
  5. Score Comparison

Steps of Backward Elimination

Step-1: Firstly, We need to select a significance level to stay in the model. (SL=0.05)

Step-2: Fit the complete model with all possible predictors/independent variables.

Step-3: Choose the predictor with the highest P-value. Then:

  1. If P-value > SL, go to step 4.
  2. Else finish: our model is ready.

Step-4: Remove that predictor.

Step-5: Rebuild and fit the model with the remaining variables.
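The steps above can be sketched in plain NumPy/SciPy. This is a hand-rolled illustration on toy data (the article's notebook presumably uses a library OLS summary for the p-values); the function names are mine, not from the original.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Fit OLS and return a two-sided p-value for each coefficient."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), df=n - k)

def backward_elimination(X, y, sl=0.05):
    """Repeatedly drop the predictor with the highest p-value above sl."""
    cols = list(range(X.shape[1]))
    while cols:
        pvals = ols_pvalues(X[:, cols], y)
        worst = int(np.argmax(pvals))
        if pvals[worst] <= sl:                  # everything left is significant
            break
        cols.pop(worst)                         # Step-4: remove that predictor
    return cols

# Toy data: column 1 drives y; columns 0 (intercept) and 2 are noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100), rng.normal(size=100)])
y = 3 * X[:, 1] + rng.normal(scale=0.1, size=100)
kept = backward_elimination(X, y)
print(kept)
```

The significant predictor (column 1) survives the elimination loop, while pure-noise columns tend to be dropped.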


The features selected by backward elimination are:


  • The train-test split procedure is used to estimate the performance of machine learning algorithms when they make predictions on data not used to train the model. It is a fast and easy procedure to perform, and its results let you compare the performance of different algorithms on your predictive modeling problem. Although simple to use and interpret, there are times when it should not be used, such as when you have a small dataset, or when additional configuration is required, for example classification on an imbalanced dataset.

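A minimal sketch with scikit-learn (dummy arrays stand in for the Titanic features):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# stratify=y keeps the class proportions equal in both splits,
# which matters when the target is imbalanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```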

  • Logistic regression is another powerful supervised ML algorithm used for binary classification problems (when the target is categorical). The best way to think about logistic regression is that it is a linear regression, but for classification problems. Logistic regression essentially uses the logistic function defined below to model a binary output variable (Tolles & Meurer, 2016). The primary difference between linear regression and logistic regression is that logistic regression's range is bounded between 0 and 1. In addition, as opposed to linear regression, logistic regression does not require a linear relationship between inputs and output variables. This is due to applying a nonlinear log transformation to the odds ratio.
  • Logistic function: σ(x) = 1 / (1 + e^(−x)), the sigmoid function.

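A sketch of the sigmoid and a logistic regression fit on toy one-dimensional data (not the article's actual features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    """The logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Toy separable data: class flips from 0 to 1 as x grows.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
preds = clf.predict([[0.5], [4.5]])
print(sigmoid(0.0), preds)
```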

  • In classification we can't rely on accuracy alone; it can be misleading.

But why?

  • You must be wondering: 'Can't we just use the model's accuracy as the holy grail metric?'
  • Accuracy is very important, but it might not be the best metric all the time. Let's look at why with an example:

Let's say we are building a model which predicts whether a bank loan will default or not.

  • (The S&P/Experian Consumer Credit Default Composite Index reported a default rate of 0.91%)
  • Let’s have a dummy model that always predicts that a loan will not default. Guess what would be the accuracy of this model? ===> 99.10%

Impressive, right? Well, the probability of a bank buying this model is absolutely zero.

While our model has a stunning accuracy, this is an apt example where accuracy is definitely not the right metric.
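The arithmetic behind the example, as a quick sketch (1000 loans with a ~0.9% default rate):

```python
import numpy as np

y_true = np.zeros(1000, dtype=int)
y_true[:9] = 1                      # 9 actual defaults out of 1000 loans
y_pred = np.zeros(1000, dtype=int)  # dummy model: never predicts a default

accuracy = (y_true == y_pred).mean()
recall = y_pred[y_true == 1].mean()  # fraction of real defaults it catches
print(accuracy, recall)              # stunning accuracy, useless recall
```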

If not accuracy, what else?

Along with accuracy, there are a bunch of other methods to evaluate the performance of a classification model

  • Confusion?matrix,
  • Precision, Recall
  • ROC and AUC

Confusion Matrix

  • Now that we are familiar with TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives), it will be very easy to understand what a confusion matrix is.
  • It is a summary table showing how good our model is at predicting examples of the various classes. Its axes are predicted labels vs. actual labels.

Precision and Recall

Precision — also called positive predictive value: the ratio of correct positive predictions to the total predicted positives.

Recall — also called sensitivity, probability of detection, or true positive rate: the ratio of correct positive predictions to the total actual positives.

Accuracy

Accuracy is defined as the ratio of correctly predicted examples to the total number of examples.
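All three metrics, plus the confusion matrix, are one call each in scikit-learn (a sketch on dummy labels):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
print(cm)
print(precision_score(y_true, y_pred),  # TP / (TP + FP)
      recall_score(y_true, y_pred),     # TP / (TP + FN)
      accuracy_score(y_true, y_pred))   # (TP + TN) / total
```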


The END
