Titanic - Machine Learning from Disaster | Kaggle


Titanic - Machine Learning from Disaster: The sinking of the Titanic in 1912 killed 1502 of the 2224 people on board, in part because there were too few lifeboats. This challenge asks you to predict which passengers were more likely to survive using data such as age, gender, and class. The steps include data cleaning, EDA, feature engineering, model training, and evaluation.

Here's a more detailed breakdown of each step to enhance your Jupyter notebook for the Titanic dataset analysis:

1. Data Understanding

  • Dataset Overview:

  1. Start with an introduction to the Titanic dataset. Explain that it contains information about passengers, including their survival status, age, sex, passenger class, etc.
  2. Provide basic statistics: number of passengers, number of survivors, number of missing values in key columns, etc.
  3. Objective: State that the goal is to predict whether a passenger survived or not using the provided features.

  • Feature Description:

  1. List and describe each feature (e.g., Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, etc.).
  2. Mention which features are categorical, numerical, or ordinal.

2. Data Preprocessing

  • Handling Missing Data:

  1. Identify Missing Values: Use .isnull().sum() to list features with missing values.
  2. Imputation: Describe methods to handle missing values. For numerical features like Age, you could use mean/median imputation. For categorical features like Embarked, use mode imputation. Justify why you chose certain methods over others.
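
The imputation step above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical frame `df` that stands in for the Titanic data; the column names match the dataset, but the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for train.csv (values are illustrative).
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Embarked": ["S", "C", "S", None, "Q", "S"],
})

# Count missing values per column before imputing.
missing_before = df.isnull().sum()

# The median is robust to skew and outliers, so use it for the numerical column.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Use the mode (most frequent value) for the categorical column.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

missing_after = df.isnull().sum()
print(missing_before, missing_after, sep="\n")
```

Median imputation is only one option; whether it suits Age in the real data is a judgment the notebook should justify.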

  • Outlier Detection and Treatment:

  1. Detect Outliers: Use box plots or statistical methods (e.g., IQR method) to identify outliers in numerical features like Fare and Age.
  2. Treat Outliers: Decide whether to cap, transform, or remove outliers, depending on their impact on the analysis.
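
A short sketch of the IQR method mentioned above, assuming a hypothetical `fare` series; capping at the fences is shown as one possible treatment, not a recommendation.

```python
import pandas as pd

# Hypothetical fares; the last value plays the role of an extreme outlier.
fare = pd.Series([7.25, 7.92, 8.05, 13.0, 26.0, 31.0, 71.28, 512.33], name="Fare")

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = fare.quantile(0.25), fare.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = fare[(fare < lower) | (fare > upper)]

# One treatment option: cap values at the fences instead of dropping rows.
fare_capped = fare.clip(lower=lower, upper=upper)
print(outliers.tolist(), fare_capped.max())
```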

  • Feature Engineering:

  1. Create New Features: Combine or transform existing features to create new ones. FamilySize: combine SibSp and Parch. IsAlone: create a binary feature indicating whether a passenger is traveling alone. Title: extract titles from the Name feature (e.g., Mr., Mrs., Miss.) and use them as categorical variables.
  2. Binning: Consider binning continuous variables like Age into categorical bins (e.g., Child, Adult, Senior).
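
The engineered features listed above could look like the sketch below. The `+ 1` in FamilySize (counting the passenger), the regular expression, and the age cut points are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows in the Titanic schema.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
             "Palsson, Master. Gosta Leonard"],
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 1],
    "Age": [22.0, 26.0, 2.0],
})

# FamilySize counts the passenger plus siblings/spouses and parents/children.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# The title sits between the comma and the first period in Name.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Bin Age; the cut points (12 and 60) are assumed, not taken from the source.
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 12, 60, 120],
                        labels=["Child", "Adult", "Senior"])
print(df[["FamilySize", "IsAlone", "Title", "AgeGroup"]])
```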

  • Encoding Categorical Variables:

  1. Label Encoding: Apply Label Encoding to ordinal features like Pclass.
  2. One-Hot Encoding: Use One-Hot Encoding for nominal features like Sex, Embarked, and the newly created Title.
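
A minimal encoding sketch, assuming a hypothetical three-row frame. It uses `pd.get_dummies` for brevity; scikit-learn's `OneHotEncoder` is the usual choice inside a pipeline. Pclass is left untouched here because it is already stored as ordered integers.

```python
import pandas as pd

# Hypothetical rows in the Titanic schema.
df = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# One-hot encode the nominal columns; drop_first removes one redundant
# indicator column per feature.
encoded = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```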

3. Exploratory Data Analysis (EDA)

  • Univariate Analysis:

  1. Visualize Individual Features: Use histograms or bar charts to show the distribution of numerical features (e.g., Age, Fare) and count plots for categorical features (Sex, Pclass).
  2. Statistical Summary: Provide summary statistics using .describe() for numerical features.

  • Bivariate Analysis:
  • Survival vs. Features: Analyze how survival rates vary across different features. Sex: plot survival rate by gender. Pclass: compare survival rates across passenger classes. Age: use a KDE plot or a bar chart to show survival rate by age group.
  • Correlation Analysis: Use a heatmap to visualize the correlation matrix, focusing on how features like Pclass, Fare, and Age correlate with survival.
  • Multivariate Analysis:

4. Model Building

  • Feature Selection:

  1. Correlation with Target: Select features that are strongly correlated with survival.
  2. Recursive Feature Elimination (RFE): Use RFE with a model (like Logistic Regression) to rank features and select the top ones.
  3. Cross-Validation: Use cross-validation techniques (like k-fold) to ensure selected features generalize well.
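
The RFE and cross-validation steps above can be combined as in this sketch. The data is synthetic (`make_classification`), standing in for the prepared Titanic features, and keeping three features is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the prepared feature matrix.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE lives inside the pipeline so that selection is refit within each CV fold,
# which keeps information from the validation folds out of the selection.
pipe = Pipeline([
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
    ("model", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")

# Fit once on all rows to inspect which columns were kept.
pipe.fit(X, y)
selected = pipe.named_steps["select"].support_
print(scores.mean(), selected)
```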

  • Model Selection:

  1. Baseline Models: Start with simple models like Logistic Regression to set a performance baseline.
  2. Complex Models: Introduce more complex models like Random Forest, XGBoost, or Support Vector Machines (SVM) as potential candidates.
  3. Ensemble Methods: Consider using ensemble methods like bagging or boosting to improve model performance.
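
One way to compare a baseline against more complex candidates is to score them on the same folds. The sketch below assumes synthetic data and uses scikit-learn's `GradientBoostingClassifier` in place of XGBoost to stay dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the prepared feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "logreg (baseline)": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Reuse the same stratified folds for every model so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {name: cross_val_score(m, X, y, cv=cv).mean() for name, m in models.items()}
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```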

5. Model Evaluation

  • Train-Test Split:

  1. Split the data into training and testing sets (e.g., 80-20 split).
  2. Justify the split ratio based on dataset size and potential model complexity.
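
A sketch of an 80-20 split, assuming toy data with roughly the 38% survival rate reported for train.csv. Passing `stratify=y` is an addition here; it keeps the class ratio similar in both partitions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 38 positives and 62 negatives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([1] * 38 + [0] * 62)

# stratify=y preserves the class proportions in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape, y_train.mean(), y_test.mean())
```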

  • Evaluation Metrics:

  1. Classification Report: Use classification_report to evaluate precision, recall, and F1-score for each class.
  2. Confusion Matrix: Plot and analyze the confusion matrix to understand the model's performance.
  3. ROC Curve and AUC: Plot ROC curves for the models and compare AUC scores.
  4. Cross-Validation: Implement k-fold cross-validation to ensure model robustness.
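
The metrics listed above can be computed as in this sketch, which assumes synthetic data and a Logistic Regression model purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
auc = roc_auc_score(y_test, y_prob)    # AUC needs scores, not hard labels
print(classification_report(y_test, y_pred))
print(cm, auc)
```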

6. Hyperparameter Tuning

  • Grid Search:

Define a parameter grid for the chosen models.

Use GridSearchCV to search for the best combination of hyperparameters.

Document the selected parameters and justify the choices based on the cross-validated score.

  • Random Search:

  1. If the parameter space is large, use RandomizedSearchCV for a more efficient search.
  2. Compare the results with Grid Search and explain any differences.
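
A compact sketch of both search strategies using scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The data is synthetic and the grid values are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the prepared feature matrix.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Illustrative grid: 2 x 3 = 6 combinations.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, 4, None]}

# Grid search evaluates every combination.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Randomized search samples n_iter combinations instead of trying all of them.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=3, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Because the randomized search here draws from the same small grid, it can at best match the grid search; its advantage only appears when the space is too large to enumerate.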

7. Model Interpretation

  • Feature Importance:

  1. For tree-based models such as Random Forest, inspect feature_importances_ to see which features contribute most to the predictions.
  2. For linear models such as Logistic Regression, compare the fitted coefficients of the scaled features.
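
For the feature-importance part of model interpretation, a minimal sketch with a Random Forest is shown below. The data is synthetic and the feature names `f0` to `f4` are hypothetical placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: with shuffle=False the informative features come first.
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=names)
importances = importances.sort_values(ascending=False)
print(importances)
```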

  • Partial Dependence Plots:

  1. Use partial dependence plots to show the effect of a single feature on the predicted outcome, holding other features constant.
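
A sketch of computing a partial dependence curve with `sklearn.inspection.partial_dependence`, assuming synthetic data and a Random Forest. The curve averages the model's predictions over the observed values of the other features while the chosen feature is swept across a grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the prepared feature matrix.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Average predicted response as feature 0 varies over a 10-point grid.
pd_result = partial_dependence(model, X, features=[0], kind="average",
                               grid_resolution=10)
curve = pd_result["average"][0]
print(curve)
```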

  • SHAP Values:

  1. Consider using SHAP (SHapley Additive exPlanations) values to provide a more nuanced interpretation of model predictions.

8. Conclusion and Next Steps

  • Summary:

  1. Summarize the key findings of the analysis, such as which features were most predictive of survival and how the model performed.
  2. Highlight any interesting patterns or insights discovered during EDA (e.g., high survival rates for women and children).

  • Next Steps:

  1. Suggest possible improvements, such as gathering more data or external datasets, trying other advanced machine learning models like Neural Networks, and implementing advanced feature engineering techniques.
  2. Recommend further research areas or business applications.

9. Documentation and Comments

  • Code Comments:

  1. Ensure each line of code or block of code is well-commented to explain what it does and why it's necessary.

  • Markdown Cells:

  1. Use markdown cells extensively to create a narrative that guides the reader through your analysis. Add titles and subtitles to sections for better readability. Include a table of contents at the beginning of the notebook for easy navigation.

This detailed approach will make your notebook not only a strong analytical tool but also a clear and educational resource for others.



import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv
Import the primary Python libraries
import pandas as pd  # Data manipulation and analysis
import numpy as np  # Numerical operations
import matplotlib.pyplot as plt  # Data visualization
import seaborn as sns  # High-level data visualization based on matplotlib

from sklearn.impute import SimpleImputer  # Handling missing values
from sklearn.compose import ColumnTransformer  # Applying transformers to columns
from sklearn.pipeline import Pipeline  # Assembling preprocessing and model steps
from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder  # Scaling and encoding
from sklearn.decomposition import PCA  # Dimensionality reduction
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier  # Ensemble classifiers
from xgboost import XGBClassifier  # Gradient-boosted trees
from sklearn.model_selection import (  # Splitting, tuning, and cross-validation
    train_test_split, GridSearchCV, StratifiedKFold, cross_val_score
)
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report  # Evaluation metrics

pd.set_option('display.max_rows', None)  # Display all rows in pandas DataFrame

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

import warnings
# Ignore all warnings
warnings.filterwarnings('ignore')
Import Titanic dataset:
# Read the CSV files into pandas DataFrames
train_df = pd.read_csv("/kaggle/input/titanic/train.csv")
test_df = pd.read_csv("/kaggle/input/titanic/test.csv")
gender_submission_df = pd.read_csv("/kaggle/input/titanic/gender_submission.csv")
train_df.head(3)
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
print(train_df.shape)
print(test_df.shape)
(891, 12)
(418, 11)
Statistical Data
print(train_df.describe())
print(test_df.describe())
       PassengerId    Survived      Pclass         Age       SibSp  \
count   891.000000  891.000000  891.000000  714.000000  891.000000   
mean    446.000000    0.383838    2.308642   29.699118    0.523008   
std     257.353842    0.486592    0.836071   14.526497    1.102743   
min       1.000000    0.000000    1.000000    0.420000    0.000000   
25%     223.500000    0.000000    2.000000   20.125000    0.000000   
50%     446.000000    0.000000    3.000000   28.000000    0.000000   
75%     668.500000    1.000000    3.000000   38.000000    1.000000   
max     891.000000    1.000000    3.000000   80.000000    8.000000   

            Parch        Fare  
count  891.000000  891.000000  
mean     0.381594   32.204208  
std      0.806057   49.693429  
min      0.000000    0.000000  
25%      0.000000    7.910400  
50%      0.000000   14.454200  
75%      0.000000   31.000000  
max      6.000000  512.329200  
       PassengerId      Pclass         Age       SibSp       Parch        Fare
count   418.000000  418.000000  332.000000  418.000000  418.000000  417.000000
mean   1100.500000    2.265550   30.272590    0.447368    0.392344   35.627188
std     120.810458    0.841838   14.181209    0.896760    0.981429   55.907576
min     892.000000    1.000000    0.170000    0.000000    0.000000    0.000000
25%     996.250000    1.000000   21.000000    0.000000    0.000000    7.895800
50%    1100.500000    3.000000   27.000000    0.000000    0.000000   14.454200
75%    1204.750000    3.000000   39.000000    1.000000    0.000000   31.500000
max    1309.000000    3.000000   76.000000    8.000000    9.000000  512.329200
print(train_df.info())
print(test_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Pclass       418 non-null    int64  
 2   Name         418 non-null    object 
 3   Sex          418 non-null    object 
 4   Age          332 non-null    float64
 5   SibSp        418 non-null    int64  
 6   Parch        418 non-null    int64  
 7   Ticket       418 non-null    object 
 8   Fare         417 non-null    float64
 9   Cabin        91 non-null     object 
 10  Embarked     418 non-null    object 
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
None
Exploratory data analysis (EDA)
# Plot a heatmap of missing values (light cells mark missing entries)
def plot_missing_data(dataset, title):
    fig, ax = plt.subplots(figsize=(5,5))
    ax.set_title(title)
    sns.heatmap(dataset.isnull(), cbar=False, ax=ax)
plot_missing_data(train_df, "Training Dataset")

plot_missing_data(test_df, "Test Dataset")

train_df.isnull().sum()
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
test_df.isnull().sum()
PassengerId      0
Pclass           0
Name             0
Sex              0
Age             86
SibSp            0
Parch            0
Ticket           0
Fare             1
Cabin          327
Embarked         0
dtype: int64
# Helper that draws a stacked bar chart of survivors vs. non-survivors for a feature

def bar_chart_stacked(dataset, feature, stacked=True):
    alive = dataset[dataset['Survived'] == 1][feature].value_counts()
    dead = dataset[dataset['Survived'] == 0][feature].value_counts()
    df_alive_dead = pd.DataFrame([alive, dead])
    df_alive_dead.index = ['Passengers Alive', 'Passengers Died']
    
    ax = df_alive_dead.plot(kind='bar', stacked=stacked, figsize=(8, 5))
    
    # Annotate the bars with the counts for each segment
    for container in ax.containers:
        ax.bar_label(container, label_type='center')
    
    # Calculate and annotate the total count for each bar
    totals = df_alive_dead.sum(axis=1)
    for i, total in enumerate(totals):
        ax.text(i, total + 1, str(total), ha='center', va='bottom', weight='bold')
    
    plt.title(f'Stacked Bar Chart of {feature}')
    plt.xlabel(feature)
    plt.ylabel('Number of Passengers')
    plt.show()
bar_chart_stacked(train_df, 'Sex')

train_df.groupby('Sex').Survived.mean()
Sex
female    0.742038
male      0.188908
Name: Survived, dtype: float64
bar_chart_stacked(train_df, "Survived")

#Analyze Feature Pclass:
bar_chart_stacked(train_df, 'Pclass')

pd.pivot_table(train_df, index='Survived', columns='Pclass', values='PassengerId', aggfunc='count')
Pclass	1	2	3
Survived			
0	80	97	372
1	136	87	119
train_df.groupby(['Pclass']).Survived.mean()
Pclass
1    0.629630
2    0.472826
3    0.242363
Name: Survived, dtype: float64
Observation:
From the plots and tables presented above, it becomes evident that the passenger class (Pclass) is a significant factor to consider when analyzing survival rates. The data indicates a clear correlation between a passenger's class and their likelihood of survival.
Passengers in higher classes (e.g., 1st class) tend to have higher survival rates compared to those in lower classes (e.g., 3rd class).

# Function for Barchart Compare
def bar_compare(dataset, feature1, feature2=None):
    plt.figure(figsize = [5,5])
    g = sns.barplot(x=feature1, y='Survived', hue=feature2, ci=None, data=dataset).set_ylabel('Survival rate')
bar_compare(train_df, "Pclass", "Sex")

pd.pivot_table(train_df, index = 'Survived', columns = ['Pclass', "Sex"], values = 'PassengerId' ,aggfunc ='count')
Pclass	1	2	3
Sex	female	male	female	male	female	male
Survived						
0	3	77	6	91	72	300
1	91	45	70	17	72	47
train_df.groupby(['Pclass']).Survived.mean().to_frame()
Survived
Pclass	
1	0.629630
2	0.472826
3	0.242363
pd.crosstab(train_df['Sex'], train_df['Survived'])
Survived	0	1
Sex		
female	81	233
male	468	109
pd.crosstab(train_df['Pclass'], train_df['Survived'])
Survived	0	1
Pclass		
1	80	136
2	97	87
3	372	119
train_df.groupby(['Pclass', "Sex"]).Survived.mean().to_frame()
Survived
Pclass	Sex	
1	female	0.968085
male	0.368852
2	female	0.921053
male	0.157407
3	female	0.500000
male	0.135447
From the plots and tables above, it becomes clear that Pclass and Sex are both important factors to consider.
Next, analyze Age: is it important?
# Distribution plot (histogram with a KDE curve)
def plot_distribution(dataset, feature, title, bins=30, fsize=(5,5)):
    fig, ax = plt.subplots(figsize=fsize)
    ax.set_title(title)
    # Use the dataset passed in rather than the global train_df
    sns.histplot(dataset[feature].dropna(), color='g', bins=bins, kde=True, stat='density', ax=ax)
# Age distribution: survived vs. died
def plot_kernel_density_estimate_survivors(dataset, feature1, title, fsize=(5,5)):
    fig, ax = plt.subplots(figsize=fsize)
    ax.set_title(title)
    sns.kdeplot(dataset.loc[dataset["Survived"] == 1, feature1], color='g',
                fill=True, ax=ax, label='Survived').set_xlabel(feature1)
    sns.kdeplot(dataset.loc[dataset["Survived"] == 0, feature1],
                fill=True, ax=ax, label="Died", color='r')
    ax.legend()
plot_distribution(train_df, 'Age', "Passengers age")

plot_kernel_density_estimate_survivors(train_df, 'Age', "Passengers age with Survived")

Analyze the features "Age" and "Sex" together to visualize their combined impact on survival:
def swarmplot_survivors(dataset, feature1, feature2, title):
    fig, ax = plt.subplots(figsize=(18,5))
    # Turn on the grid to make values easier to read.
    ax.grid(True)
    plt.xticks(list(range(0,100,2)))
    sns.swarmplot(y=feature1, x=feature2, hue='Survived', hue_order=[1, 0], palette={1: 'green', 0: 'red'}, data=dataset, ax=ax).set_title(title)
swarmplot_survivors(train_df, 'Sex','Age', "Survivor Swarmplot for Age vs Sex")

Observations:
Age Distribution:
There are more young survivors (ages 0-10) in the 'female' category compared to the 'male' category. The age distribution among males shows a higher concentration in the 20-40 age range. Females also show a significant concentration in the 20-40 age range but with more survivors than males.

Survival Rate by Gender:
There are more green dots (survivors) among females across all age groups, indicating a higher survival rate for females. Males have more red dots (non-survivors) than females, especially in the 20-40 age range.

Outliers:
There are few older individuals (70-80 years) in both categories, with very few survivors.

Analyze Features Age and Pclass together
swarmplot_survivors(train_df, 'Pclass', 'Age', 'Age vs Pclass' )

First-class passengers survived at a higher rate than second-class passengers, and first class also had more female survivors.
Analyze Fare
train_df["Fare"].describe().to_frame()
Fare
count	891.000000
mean	32.204208
std	49.693429
min	0.000000
25%	7.910400
50%	14.454200
75%	31.000000
max	512.329200
plot_distribution(train_df, 'Fare', "Passengers fare")

Observation:
The Fare data does not follow a normal distribution; most fares are concentrated at the low end, below 100.

The distribution is skewed to the right, with 75% of fares under 31 USD and a maximum fare of 512 USD. Given this skewness, it might be beneficial to normalize this feature, depending on the machine learning model being used. This will be addressed in the feature engineering stage.

To understand how the Fare feature influences the survival rate, we could plot bar charts of Fare vs. Survived. However, due to the wide range of fare values, such a plot may not provide meaningful insights.

A more effective visualization would involve categorizing the fare values and then plotting these categories against the survival rate.

def plot_quartiles(dataset, feature, title, categories):
    fig, axarr = plt.subplots(figsize=(5,5))
    fare_ranges = pd.qcut(dataset[feature], len(categories), labels=categories)  # equal-frequency bins (quartiles for four categories)
    axarr.set_title(title)
    sns.barplot(x=fare_ranges, y=dataset.Survived, ci=None, ax=axarr).set_ylabel('Survival rate')
categories = ['Cheap', 'Standard', 'Expensive', 'Luxury']

plot_quartiles(train_df, "Fare", "Survival Rate by Fare Ranges/Categories", categories)

swarmplot_survivors(train_df, "Sex", "Fare", "Survivor Swarmplot for Fare vs Sex")

train_df.Fare.value_counts()
Fare
8.0500      43
13.0000     42
7.8958      38
7.7500      34
26.0000     31
10.5000     24
7.9250      18
7.7750      16
7.2292      15
0.0000      15
26.5500     15
7.8542      13
8.6625      13
7.2500      13
7.2250      12
9.5000       9
16.1000      9
24.1500      8
15.5000      8
14.4542      7
69.5500      7
52.0000      7
7.0500       7
56.4958      7
14.5000      7
31.2750      7
39.6875      6
7.7958       6
27.9000      6
30.0000      6
46.9000      6
26.2500      6
21.0000      6
27.7208      5
29.1250      5
15.2458      5
73.5000      5
30.5000      5
53.1000      5
39.0000      4
90.0000      4
15.8500      4
13.5000      4
7.5500       4
23.0000      4
12.4750      4
25.4667      4
7.1250       4
7.6500       4
21.0750      4
7.7333       4
11.5000      4
34.3750      4
7.8792       4
19.2583      4
227.5250     4
27.7500      4
263.0000     4
31.3875      4
79.2000      4
151.5500     4
35.5000      4
120.0000     4
110.8833     4
7.4958       3
83.1583      3
211.3375     3
33.0000      3
20.5250      3
86.5000      3
12.3500      3
512.3292     3
31.0000      3
113.2750     3
77.9583      3
29.7000      3
135.6333     3
26.2875      3
153.4625     3
79.6500      3
18.7500      3
52.5542      3
14.4583      3
76.7292      3
41.5792      3
11.1333      3
18.0000      3
15.7417      2
65.0000      2
134.5000     2
164.8667     2
262.3750     2
82.1708      2
56.9292      2
108.9000     2
24.0000      2
133.6500     2
11.2417      2
7.0542       2
23.2500      2
78.8500      2
20.2500      2
17.8000      2
19.5000      2
57.9792      2
9.2250       2
15.9000      2
106.4250     2
49.5042      2
9.5875       2
16.7000      2
30.0708      2
93.5000      2
89.1042      2
19.9667      2
55.9000      2
83.4750      2
14.4000      2
71.0000      2
7.8292       2
39.6000      2
146.5208     2
69.3000      2
51.8625      2
80.0000      2
91.0792      2
78.2667      2
27.0000      2
55.0000      2
9.8250       2
30.6958      2
247.5208     2
20.2125      2
77.2875      2
37.0042      2
25.9292      2
66.6000      2
6.4958       2
10.4625      2
23.4500      2
20.5750      2
18.7875      2
9.3500       2
22.3583      2
57.0000      2
36.7500      2
6.9750       2
29.0000      2
6.7500       2
7.7375       2
9.0000       2
5.0000       1
14.1083      1
9.8458       1
39.4000      1
13.8625      1
7.6292       1
13.8583      1
22.5250      1
49.5000      1
50.4958      1
221.7792     1
59.4000      1
34.0208      1
51.4792      1
17.4000      1
8.4583       1
26.3875      1
6.4375       1
10.1708      1
13.4167      1
8.1375       1
7.7417       1
9.4833       1
15.1000      1
9.8417       1
25.5875      1
8.4333       1
8.3625       1
32.3208      1
8.6833       1
8.5167       1
7.8875       1
15.5500      1
6.4500       1
6.9500       1
15.0000      1
8.7125       1
40.1250      1
8.3000       1
42.4000      1
26.2833      1
12.2875      1
7.5208       1
7.8000       1
61.9792      1
8.1583       1
71.2833      1
12.2750      1
7.7875       1
47.1000      1
61.1750      1
76.2917      1
34.6542      1
8.4042       1
50.0000      1
22.0250      1
63.3583      1
15.0500      1
28.7125      1
8.6542       1
33.5000      1
25.9250      1
15.7500      1
7.1417       1
61.3792      1
7.3125       1
12.5250      1
15.0458      1
12.8750      1
8.8500       1
21.6792      1
12.6500      1
7.0458       1
9.8375       1
13.7917      1
7.7250       1
38.5000      1
16.0000      1
81.8583      1
8.1125       1
7.8750       1
32.5000      1
6.8583       1
8.0292       1
9.4750       1
12.0000      1
7.7292       1
9.2167       1
4.0125       1
211.5000     1
55.4417      1
75.2500      1
35.0000      1
28.5000      1
6.2375       1
14.0000      1
10.5167      1
Name: count, dtype: int64
Observation:
Fifteen passengers paid no fare, which is unrealistic. Therefore, I will replace the 0 values with NaN and later determine an appropriate method to impute these values.

# Replace Fare == 0 with nan
train_df['Fare'] = train_df['Fare'].replace(0, np.nan)
test_df['Fare'] = test_df['Fare'].replace(0, np.nan)
train_df[train_df['Fare']==0]
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
test_df[test_df['Fare']==0]
PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
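
One possible way to fill the fares that were set to NaN is the median fare of the passenger's class. This is a sketch under that assumption, using a hypothetical six-row frame; it is not a method the notebook has committed to.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; with the real data, apply the same steps to train_df.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Fare": [71.28, 0.0, 53.10, 7.25, 8.05, 0.0],
})

# Treat zero fares as missing.
df["Fare"] = df["Fare"].replace(0, np.nan)

# Fill each gap with the median fare of the passenger's own class.
class_median = df.groupby("Pclass")["Fare"].transform("median")
df["Fare"] = df["Fare"].fillna(class_median)
print(df)
```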
Analyze Feature Embarked
def countplot(dataset, feature, title, fsize = (5,5)):
    fig, ax = plt.subplots(figsize=fsize)
    sns.countplot(x=feature, data=dataset, ax=ax).set_title(title)
    
def compare_countplot(dataset, feature1, feature2, title):
    fig, ax = plt.subplots(figsize=(5,5))
    sns.countplot(x=feature1, hue=feature2, data=dataset, ax=ax).set_title(title)
bar_chart_stacked(train_df, 'Embarked')

compare_countplot(train_df, "Embarked", "Survived", "Survivor count by place of embarkation")

pd.pivot_table(train_df, index = 'Survived', columns = 'Embarked', values = 'PassengerId' ,aggfunc ='count')
Embarked	C	Q	S
Survived			
0	75	47	427
1	93	30	217
len(train_df.query('Embarked == "C" and Survived==1'))
93
train_df.groupby(['Embarked']).Survived.mean().to_frame()
Survived
Embarked	
C	0.553571
Q	0.389610
S	0.336957
Observation:
The Embarked feature includes three values: Southampton, Cherbourg, and Queenstown. Most passengers boarded in Southampton, but only about 34% of them survived. In contrast, Cherbourg had a survival rate of 55%.

It’s not intuitive that the place of boarding would affect survival. Why is it higher for Cherbourg? One possible explanation is the percentage of first-class passengers who embarked there, as first-class status is linked to higher survival rates.
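
The class mix per port can be checked with a normalized crosstab. The sketch below uses a hypothetical eight-row frame; with the real data, the same call would take the Embarked and Pclass columns of train_df.

```python
import pandas as pd

# Hypothetical rows standing in for train_df.
df = pd.DataFrame({
    "Embarked": ["C", "C", "C", "S", "S", "S", "S", "Q"],
    "Pclass":   [1,   1,   3,   3,   3,   2,   1,   3],
})

# normalize="index" turns counts into row-wise shares: the class mix per port.
class_share = pd.crosstab(df["Embarked"], df["Pclass"], normalize="index")
print(class_share.round(2))
```

A port whose row puts most of its weight on Pclass 1 would be expected to show a higher overall survival rate for that reason alone.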

Analyze Features Embarked & Pclass together
compare_countplot(train_df, 'Embarked', 'Pclass', 'Embarked vs Pclass')

train_df.groupby(['Pclass', 'Embarked', "Sex"]).Survived.sum().to_frame()
Survived
Pclass	Embarked	Sex	
1	C	female	42
male	17
Q	female	1
male	0
S	female	46
male	28
2	C	female	7
male	2
Q	female	2
male	0
S	female	61
male	15
3	C	female	15
male	10
Q	female	24
male	3
S	female	33
male	34
General Observations
Survival by Gender:
Females generally had a higher survival count than males across all classes and ports of embarkation. In Pclass 1, the number of female survivors was notably higher than male survivors. In Pclass 2, the difference is even more pronounced, especially for those who embarked at S (61 females vs. 15 males). In Pclass 3, the trend of higher female survivors continues but with more variation across different embarkation ports.

Survival by Embarkation Port:
For Pclass 1 and Pclass 2, the majority of survivors embarked at S (Southampton), followed by C (Cherbourg), with Q (Queenstown) having the least number of survivors. For Pclass 3, the survival count is more evenly distributed among the ports, especially among females.

Survival by Passenger Class:
Pclass 1 had relatively high survival counts, particularly among females. Pclass 2 had fewer survivors compared to Pclass 1, but females still had a significant number of survivors. Pclass 3 showed a mixed trend with a considerable number of survivors but more evenly distributed compared to Pclass 1 and 2.

Analyze Features Embarked & Sex
compare_countplot(train_df, "Embarked", "Sex", "Passenger count by place of embarkation and sex")

Analyze Feature SibSp
train_df['SibSp'].value_counts().to_frame()
count
SibSp	
0	608
1	209
2	28
4	18
3	16
8	7
5	5
bar_compare(train_df, "SibSp")

train_df.groupby(['SibSp']).Survived.mean().to_frame()
Survived
SibSp	
0	0.345395
1	0.535885
2	0.464286
3	0.250000
4	0.166667
5	0.000000
8	0.000000
compare_countplot(train_df, "SibSp", "Survived", "Survivor count by number of siblings/spouses aboard the Titanic")

Analyze Feature Parch
bar_compare(train_df, "Parch")

train_df.groupby(['Parch']).Survived.mean().to_frame()
Survived
Parch	
0	0.343658
1	0.550847
2	0.500000
3	0.600000
4	0.000000
5	0.200000
6	0.000000
Observation:
Parch gives the number of parents or children each passenger was traveling with, just as SibSp gives the number of siblings or spouses. Similar patterns emerge: small families had higher survival rates than larger families and passengers traveling alone.
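
The family-size pattern can be made explicit by grouping passengers into alone, small, and large families. In this sketch the data is hypothetical and the bucket boundaries (1, 2-4, 5 or more) are an assumption.

```python
import pandas as pd

# Hypothetical rows standing in for train_df.
df = pd.DataFrame({
    "SibSp":    [0, 1, 1, 4, 0, 5, 0, 1],
    "Parch":    [0, 0, 2, 2, 0, 2, 0, 1],
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
})

# Family size includes the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Assumed buckets: 1 = Alone, 2-4 = Small, 5 or more = Large.
df["FamilyType"] = pd.cut(df["FamilySize"], bins=[0, 1, 4, 20],
                          labels=["Alone", "Small", "Large"])

rate = df.groupby("FamilyType", observed=True)["Survived"].mean()
print(rate)
```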

Feature engineering
Feature Name:
pd.unique(train_df['Name'])
array(['Braund, Mr. Owen Harris',
       'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
       'Heikkinen, Miss. Laina',
       'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
       'Allen, Mr. William Henry', 'Moran, Mr. James',
       'McCarthy, Mr. Timothy J', 'Palsson, Master. Gosta Leonard',
       'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)',
       'Nasser, Mrs. Nicholas (Adele Achem)',
       'Sandstrom, Miss. Marguerite Rut', 'Bonnell, Miss. Elizabeth',
       'Saundercock, Mr. William Henry', 'Andersson, Mr. Anders Johan',
       'Vestrom, Miss. Hulda Amanda Adolfina',
       'Hewlett, Mrs. (Mary D Kingcome) ', 'Rice, Master. Eugene',
       'Williams, Mr. Charles Eugene',
       'Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)',
       'Masselmani, Mrs. Fatima', 'Fynney, Mr. Joseph J',
       'Beesley, Mr. Lawrence', 'McGowan, Miss. Anna "Annie"',
       'Sloper, Mr. William Thompson', 'Palsson, Miss. Torborg Danira',
       'Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)',
       'Emir, Mr. Farred Chehab', 'Fortune, Mr. Charles Alexander',
       'O\'Dwyer, Miss. Ellen "Nellie"', 'Todoroff, Mr. Lalio',
       'Uruchurtu, Don. Manuel E',
       'Spencer, Mrs. William Augustus (Marie Eugenie)',
       'Glynn, Miss. Mary Agatha', 'Wheadon, Mr. Edward H',
       'Meyer, Mr. Edgar Joseph', 'Holverson, Mr. Alexander Oskar',
       'Mamee, Mr. Hanna', 'Cann, Mr. Ernest Charles',
       'Vander Planke, Miss. Augusta Maria',
       'Nicola-Yarred, Miss. Jamila',
       'Ahlin, Mrs. Johan (Johanna Persdotter Larsson)',
       'Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)',
       'Kraeff, Mr. Theodor', 'Laroche, Miss. Simonne Marie Anne Andree',
       'Devaney, Miss. Margaret Delia', 'Rogers, Mr. William John',
       'Lennon, Mr. Denis', "O'Driscoll, Miss. Bridget",
       'Samaan, Mr. Youssef',
       'Arnold-Franchi, Mrs. Josef (Josefine Franchi)',
       'Panula, Master. Juha Niilo', 'Nosworthy, Mr. Richard Cater',
       'Harper, Mrs. Henry Sleeper (Myna Haxtun)',
       'Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)',
       'Ostby, Mr. Engelhart Cornelius', 'Woolner, Mr. Hugh',
       'Rugg, Miss. Emily', 'Novel, Mr. Mansouer',
       'West, Miss. Constance Mirium',
       'Goodwin, Master. William Frederick', 'Sirayanian, Mr. Orsen',
       'Icard, Miss. Amelie', 'Harris, Mr. Henry Birkhardt',
       'Skoog, Master. Harald', 'Stewart, Mr. Albert A',
       'Moubarek, Master. Gerios', 'Nye, Mrs. (Elizabeth Ramell)',
       'Crease, Mr. Ernest James', 'Andersson, Miss. Erna Alexandra',
       'Kink, Mr. Vincenz', 'Jenkin, Mr. Stephen Curnow',
       'Goodwin, Miss. Lillian Amy', 'Hood, Mr. Ambrose Jr',
       'Chronopoulos, Mr. Apostolos', 'Bing, Mr. Lee',
       'Moen, Mr. Sigurd Hansen', 'Staneff, Mr. Ivan',
       'Moutal, Mr. Rahamin Haim', 'Caldwell, Master. Alden Gates',
       'Dowdell, Miss. Elizabeth', 'Waelens, Mr. Achille',
       'Sheerlinck, Mr. Jan Baptist', 'McDermott, Miss. Brigdet Delia',
       'Carrau, Mr. Francisco M', 'Ilett, Miss. Bertha',
       'Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)',
       'Ford, Mr. William Neal', 'Slocovski, Mr. Selman Francis',
       'Fortune, Miss. Mabel Helen', 'Celotti, Mr. Francesco',
       'Christmann, Mr. Emil', 'Andreasson, Mr. Paul Edvin',
       'Chaffee, Mr. Herbert Fuller', 'Dean, Mr. Bertram Frank',
       'Coxon, Mr. Daniel', 'Shorney, Mr. Charles Joseph',
       'Goldschmidt, Mr. George B', 'Greenfield, Mr. William Bertram',
       'Doling, Mrs. John T (Ada Julia Bone)', 'Kantor, Mr. Sinai',
       'Petranec, Miss. Matilda', 'Petroff, Mr. Pastcho ("Pentcho")',
       'White, Mr. Richard Frasar', 'Johansson, Mr. Gustaf Joel',
       'Gustafsson, Mr. Anders Vilhelm', 'Mionoff, Mr. Stoytcho',
       'Salkjelsvik, Miss. Anna Kristine', 'Moss, Mr. Albert Johan',
       'Rekic, Mr. Tido', 'Moran, Miss. Bertha',
       'Porter, Mr. Walter Chamberlain', 'Zabour, Miss. Hileni',
       'Barton, Mr. David John', 'Jussila, Miss. Katriina',
       'Attalah, Miss. Malake', 'Pekoniemi, Mr. Edvard',
       'Connors, Mr. Patrick', 'Turpin, Mr. William John Robert',
       'Baxter, Mr. Quigg Edmond', 'Andersson, Miss. Ellis Anna Maria',
       'Hickman, Mr. Stanley George', 'Moore, Mr. Leonard Charles',
       'Nasser, Mr. Nicholas', 'Webber, Miss. Susan',
       'White, Mr. Percival Wayland', 'Nicola-Yarred, Master. Elias',
       'McMahon, Mr. Martin', 'Madsen, Mr. Fridtjof Arne',
       'Peter, Miss. Anna', 'Ekstrom, Mr. Johan', 'Drazenoic, Mr. Jozef',
       'Coelho, Mr. Domingos Fernandeo',
       'Robins, Mrs. Alexander A (Grace Charity Laury)',
       'Weisz, Mrs. Leopold (Mathilde Francoise Pede)',
       'Sobey, Mr. Samuel James Hayden', 'Richard, Mr. Emile',
       'Newsom, Miss. Helen Monypeny', 'Futrelle, Mr. Jacques Heath',
       'Osen, Mr. Olaf Elon', 'Giglio, Mr. Victor',
       'Boulos, Mrs. Joseph (Sultana)', 'Nysten, Miss. Anna Sofia',
       'Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)',
       'Burke, Mr. Jeremiah', 'Andrew, Mr. Edgardo Samuel',
       'Nicholls, Mr. Joseph Charles',
       'Andersson, Mr. August Edvard ("Wennerstrom")',
       'Ford, Miss. Robina Maggie "Ruby"',
       'Navratil, Mr. Michel ("Louis M Hoffman")',
       'Byles, Rev. Thomas Roussel Davids', 'Bateman, Rev. Robert James',
       'Pears, Mrs. Thomas (Edith Wearne)', 'Meo, Mr. Alfonzo',
       'van Billiard, Mr. Austin Blyler', 'Olsen, Mr. Ole Martin',
       'Williams, Mr. Charles Duane', 'Gilnagh, Miss. Katherine "Katie"',
       'Corn, Mr. Harry', 'Smiljanic, Mr. Mile',
       'Sage, Master. Thomas Henry', 'Cribb, Mr. John Hatfield',
       'Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne)',
       'Bengtsson, Mr. John Viktor', 'Calic, Mr. Jovo',
       'Panula, Master. Eino Viljami',
       'Goldsmith, Master. Frank John William "Frankie"',
       'Chibnall, Mrs. (Edith Martha Bowerman)',
       'Skoog, Mrs. William (Anna Bernhardina Karlsson)',
       'Baumann, Mr. John D', 'Ling, Mr. Lee',
       'Van der hoef, Mr. Wyckoff', 'Rice, Master. Arthur',
       'Johnson, Miss. Eleanor Ileen', 'Sivola, Mr. Antti Wilhelm',
       'Smith, Mr. James Clinch', 'Klasen, Mr. Klas Albin',
       'Lefebre, Master. Henry Forbes', 'Isham, Miss. Ann Elizabeth',
       'Hale, Mr. Reginald', 'Leonard, Mr. Lionel',
       'Sage, Miss. Constance Gladys', 'Pernot, Mr. Rene',
       'Asplund, Master. Clarence Gustaf Hugo',
       'Becker, Master. Richard F', 'Kink-Heilmann, Miss. Luise Gretchen',
       'Rood, Mr. Hugh Roscoe',
       'O\'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey)',
       'Romaine, Mr. Charles Hallace ("Mr C Rolmane")',
       'Bourke, Mr. John', 'Turcin, Mr. Stjepan', 'Pinsky, Mrs. (Rosa)',
       'Carbines, Mr. William',
       'Andersen-Jensen, Miss. Carla Christine Nielsine',
       'Navratil, Master. Michel M',
       'Brown, Mrs. James Joseph (Margaret Tobin)',
       'Lurette, Miss. Elise', 'Mernagh, Mr. Robert',
       'Olsen, Mr. Karl Siegwart Andreas',
       'Madigan, Miss. Margaret "Maggie"',
       'Yrois, Miss. Henriette ("Mrs Harbeck")',
       'Vande Walle, Mr. Nestor Cyriel', 'Sage, Mr. Frederick',
       'Johanson, Mr. Jakob Alfred', 'Youseff, Mr. Gerious',
       'Cohen, Mr. Gurshon "Gus"', 'Strom, Miss. Telma Matilda',
       'Backstrom, Mr. Karl Alfred', 'Albimona, Mr. Nassef Cassem',
       'Carr, Miss. Helen "Ellen"', 'Blank, Mr. Henry', 'Ali, Mr. Ahmed',
       'Cameron, Miss. Clear Annie', 'Perkin, Mr. John Henry',
       'Givard, Mr. Hans Kristensen', 'Kiernan, Mr. Philip',
       'Newell, Miss. Madeleine', 'Honkanen, Miss. Eliina',
       'Jacobsohn, Mr. Sidney Samuel', 'Bazzani, Miss. Albina',
       'Harris, Mr. Walter', 'Sunderland, Mr. Victor Francis',
       'Bracken, Mr. James H', 'Green, Mr. George Henry',
       'Nenkoff, Mr. Christo', 'Hoyt, Mr. Frederick Maxfield',
       'Berglund, Mr. Karl Ivar Sven', 'Mellors, Mr. William John',
       'Lovell, Mr. John Hall ("Henry")', 'Fahlstrom, Mr. Arne Jonas',
       'Lefebre, Miss. Mathilde',
       'Harris, Mrs. Henry Birkhardt (Irene Wallach)',
       'Larsson, Mr. Bengt Edvin', 'Sjostedt, Mr. Ernst Adolf',
       'Asplund, Miss. Lillian Gertrud',
       'Leyson, Mr. Robert William Norman',
       'Harknett, Miss. Alice Phoebe', 'Hold, Mr. Stephen',
       'Collyer, Miss. Marjorie "Lottie"',
       'Pengelly, Mr. Frederick William', 'Hunt, Mr. George Henry',
       'Zabour, Miss. Thamine', 'Murphy, Miss. Katherine "Kate"',
       'Coleridge, Mr. Reginald Charles', 'Maenpaa, Mr. Matti Alexanteri',
       'Attalah, Mr. Sleiman', 'Minahan, Dr. William Edward',
       'Lindahl, Miss. Agda Thorilda Viktoria',
       'Hamalainen, Mrs. William (Anna)', 'Beckwith, Mr. Richard Leonard',
       'Carter, Rev. Ernest Courtenay', 'Reed, Mr. James George',
       'Strom, Mrs. Wilhelm (Elna Matilda Persson)',
       'Stead, Mr. William Thomas', 'Lobb, Mr. William Arthur',
       'Rosblom, Mrs. Viktor (Helena Wilhelmina)',
       'Touma, Mrs. Darwis (Hanne Youssef Razi)',
       'Thorne, Mrs. Gertrude Maybelle', 'Cherry, Miss. Gladys',
       'Ward, Miss. Anna', 'Parrish, Mrs. (Lutie Davis)',
       'Smith, Mr. Thomas', 'Asplund, Master. Edvin Rojj Felix',
       'Taussig, Mr. Emil', 'Harrison, Mr. William', 'Henry, Miss. Delia',
       'Reeves, Mr. David', 'Panula, Mr. Ernesti Arvid',
       'Persson, Mr. Ernst Ulrik',
       'Graham, Mrs. William Thompson (Edith Junkins)',
       'Bissette, Miss. Amelia', 'Cairns, Mr. Alexander',
       'Tornquist, Mr. William Henry',
       'Mellinger, Mrs. (Elizabeth Anne Maidment)',
       'Natsch, Mr. Charles H', 'Healy, Miss. Hanora "Nora"',
       'Andrews, Miss. Kornelia Theodosia',
       'Lindblom, Miss. Augusta Charlotta', 'Parkes, Mr. Francis "Frank"',
       'Rice, Master. Eric', 'Abbott, Mrs. Stanton (Rosa Hunt)',
       'Duane, Mr. Frank', 'Olsson, Mr. Nils Johan Goransson',
       'de Pelsmaeker, Mr. Alfons', 'Dorking, Mr. Edward Arthur',
       'Smith, Mr. Richard William', 'Stankovic, Mr. Ivan',
       'de Mulder, Mr. Theodore', 'Naidenoff, Mr. Penko',
       'Hosono, Mr. Masabumi', 'Connolly, Miss. Kate',
       'Barber, Miss. Ellen "Nellie"',
       'Bishop, Mrs. Dickinson H (Helen Walton)',
       'Levy, Mr. Rene Jacques', 'Haas, Miss. Aloisia',
       'Mineff, Mr. Ivan', 'Lewy, Mr. Ervin G', 'Hanna, Mr. Mansour',
       'Allison, Miss. Helen Loraine', 'Saalfeld, Mr. Adolphe',
       'Baxter, Mrs. James (Helene DeLaudeniere Chaput)',
       'Kelly, Miss. Anna Katherine "Annie Kate"', 'McCoy, Mr. Bernard',
       'Johnson, Mr. William Cahoone Jr', 'Keane, Miss. Nora A',
       'Williams, Mr. Howard Hugh "Harry"',
       'Allison, Master. Hudson Trevor', 'Fleming, Miss. Margaret',
       'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)',
       'Abelson, Mr. Samuel', 'Francatelli, Miss. Laura Mabel',
       'Hays, Miss. Margaret Bechstein', 'Ryerson, Miss. Emily Borie',
       'Lahtinen, Mrs. William (Anna Sylfven)', 'Hendekovic, Mr. Ignjac',
       'Hart, Mr. Benjamin', 'Nilsson, Miss. Helmina Josefina',
       'Kantor, Mrs. Sinai (Miriam Sternin)', 'Moraweck, Dr. Ernest',
       'Wick, Miss. Mary Natalie',
       'Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)',
       'Dennis, Mr. Samuel', 'Danoff, Mr. Yoto',
       'Slayter, Miss. Hilda Mary',
       'Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)',
       'Sage, Mr. George John Jr', 'Young, Miss. Marie Grice',
       'Nysveen, Mr. Johan Hansen', 'Ball, Mrs. (Ada E Hall)',
       'Goldsmith, Mrs. Frank John (Emily Alice Brown)',
       'Hippach, Miss. Jean Gertrude', 'McCoy, Miss. Agnes',
       'Partner, Mr. Austen', 'Graham, Mr. George Edward',
       'Vander Planke, Mr. Leo Edmondus',
       'Frauenthal, Mrs. Henry William (Clara Heinsheimer)',
       'Denkoff, Mr. Mitto', 'Pears, Mr. Thomas Clinton',
       'Burns, Miss. Elizabeth Margaret', 'Dahl, Mr. Karl Edwart',
       'Blackwell, Mr. Stephen Weart', 'Navratil, Master. Edmond Roger',
       'Fortune, Miss. Alice Elizabeth', 'Collander, Mr. Erik Gustaf',
       'Sedgwick, Mr. Charles Frederick Waddington',
       'Fox, Mr. Stanley Hubert', 'Brown, Miss. Amelia "Mildred"',
       'Smith, Miss. Marion Elsie',
       'Davison, Mrs. Thomas Henry (Mary E Finck)',
       'Coutts, Master. William Loch "William"', 'Dimic, Mr. Jovan',
       'Odahl, Mr. Nils Martin', 'Williams-Lambert, Mr. Fletcher Fellows',
       'Elias, Mr. Tannous', 'Arnold-Franchi, Mr. Josef',
       'Yousif, Mr. Wazli', 'Vanden Steen, Mr. Leo Peter',
       'Bowerman, Miss. Elsie Edith', 'Funk, Miss. Annie Clemmer',
       'McGovern, Miss. Mary', 'Mockler, Miss. Helen Mary "Ellie"',
       'Skoog, Mr. Wilhelm', 'del Carlo, Mr. Sebastiano',
       'Barbara, Mrs. (Catherine David)', 'Asim, Mr. Adola',
       "O'Brien, Mr. Thomas", 'Adahl, Mr. Mauritz Nils Martin',
       'Warren, Mrs. Frank Manley (Anna Sophia Atkinson)',
       'Moussa, Mrs. (Mantoura Boulos)', 'Jermyn, Miss. Annie',
       'Aubart, Mme. Leontine Pauline', 'Harder, Mr. George Achilles',
       'Wiklund, Mr. Jakob Alfred', 'Beavan, Mr. William Thomas',
       'Ringhini, Mr. Sante', 'Palsson, Miss. Stina Viola',
       'Meyer, Mrs. Edgar Joseph (Leila Saks)',
       'Landergren, Miss. Aurora Adelia', 'Widener, Mr. Harry Elkins',
       'Betros, Mr. Tannous', 'Gustafsson, Mr. Karl Gideon',
       'Bidois, Miss. Rosalie', 'Nakid, Miss. Maria ("Mary")',
       'Tikkanen, Mr. Juho',
       'Holverson, Mrs. Alexander Oskar (Mary Aline Towner)',
       'Plotcharsky, Mr. Vasil', 'Davies, Mr. Charles Henry',
       'Goodwin, Master. Sidney Leonard', 'Buss, Miss. Kate',
       'Sadlier, Mr. Matthew', 'Lehmann, Miss. Bertha',
       'Carter, Mr. William Ernest', 'Jansson, Mr. Carl Olof',
       'Gustafsson, Mr. Johan Birger', 'Newell, Miss. Marjorie',
       'Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)',
       'Johansson, Mr. Erik', 'Olsson, Miss. Elina',
       'McKane, Mr. Peter David', 'Pain, Dr. Alfred',
       'Trout, Mrs. William H (Jessie L)', 'Niskanen, Mr. Juha',
       'Adams, Mr. John', 'Jussila, Miss. Mari Aina',
       'Hakkarainen, Mr. Pekka Pietari', 'Oreskovic, Miss. Marija',
       'Gale, Mr. Shadrach', 'Widegren, Mr. Carl/Charles Peter',
       'Richards, Master. William Rowe',
       'Birkeland, Mr. Hans Martin Monsen', 'Lefebre, Miss. Ida',
       'Sdycoff, Mr. Todor', 'Hart, Mr. Henry', 'Minahan, Miss. Daisy E',
       'Cunningham, Mr. Alfred Fleming', 'Sundman, Mr. Johan Julian',
       'Meek, Mrs. Thomas (Annie Louise Rowley)',
       'Drew, Mrs. James Vivian (Lulu Thorne Christian)',
       'Silven, Miss. Lyyli Karoliina', 'Matthews, Mr. William John',
       'Van Impe, Miss. Catharina', 'Gheorgheff, Mr. Stanio',
       'Charters, Mr. David', 'Zimmerman, Mr. Leo',
       'Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)',
       'Rosblom, Mr. Viktor Richard', 'Wiseman, Mr. Phillippe',
       'Clarke, Mrs. Charles V (Ada Maria Winfield)',
       'Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")',
       'Flynn, Mr. James', 'Pickard, Mr. Berk (Berk Trembisky)',
       'Bjornstrom-Steffansson, Mr. Mauritz Hakan',
       'Thorneycroft, Mrs. Percival (Florence Kate White)',
       'Louch, Mrs. Charles Alexander (Alice Adelaide Slow)',
       'Kallio, Mr. Nikolai Erland', 'Silvey, Mr. William Baird',
       'Carter, Miss. Lucile Polk',
       'Ford, Miss. Doolina Margaret "Daisy"',
       'Richards, Mrs. Sidney (Emily Hocking)', 'Fortune, Mr. Mark',
       'Kvillner, Mr. Johan Henrik Johannesson',
       'Hart, Mrs. Benjamin (Esther Ada Bloomfield)', 'Hampe, Mr. Leon',
       'Petterson, Mr. Johan Emil', 'Reynaldo, Ms. Encarnacion',
       'Johannesen-Bratthammer, Mr. Bernt', 'Dodge, Master. Washington',
       'Mellinger, Miss. Madeleine Violet', 'Seward, Mr. Frederic Kimber',
       'Baclini, Miss. Marie Catherine', 'Peuchen, Major. Arthur Godfrey',
       'West, Mr. Edwy Arthur', 'Hagland, Mr. Ingvald Olai Olsen',
       'Foreman, Mr. Benjamin Laventall', 'Goldenberg, Mr. Samuel L',
       'Peduzzi, Mr. Joseph', 'Jalsevac, Mr. Ivan',
       'Millet, Mr. Francis Davis', 'Kenyon, Mrs. Frederick R (Marion)',
       'Toomey, Miss. Ellen', "O'Connor, Mr. Maurice",
       'Anderson, Mr. Harry', 'Morley, Mr. William', 'Gee, Mr. Arthur H',
       'Milling, Mr. Jacob Christian', 'Maisner, Mr. Simon',
       'Goncalves, Mr. Manuel Estanslas', 'Campbell, Mr. William',
       'Smart, Mr. John Montgomery', 'Scanlan, Mr. James',
       'Baclini, Miss. Helene Barbara', 'Keefe, Mr. Arthur',
       'Cacic, Mr. Luka', 'West, Mrs. Edwy Arthur (Ada Mary Worth)',
       'Jerwan, Mrs. Amin S (Marie Marthe Thuillard)',
       'Strandberg, Miss. Ida Sofia', 'Clifford, Mr. George Quincy',
       'Renouf, Mr. Peter Henry', 'Braund, Mr. Lewis Richard',
       'Karlsson, Mr. Nils August', 'Hirvonen, Miss. Hildur E',
       'Goodwin, Master. Harold Victor',
       'Frost, Mr. Anthony Wood "Archie"', 'Rouse, Mr. Richard Henry',
       'Turkula, Mrs. (Hedwig)', 'Bishop, Mr. Dickinson H',
       'Lefebre, Miss. Jeannie',
       'Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)',
       'Kent, Mr. Edward Austin', 'Somerton, Mr. Francis William',
       'Coutts, Master. Eden Leslie "Neville"',
       'Hagland, Mr. Konrad Mathias Reiersen', 'Windelov, Mr. Einar',
       'Molson, Mr. Harry Markland', 'Artagaveytia, Mr. Ramon',
       'Stanley, Mr. Edward Roland', 'Yousseff, Mr. Gerious',
       'Eustis, Miss. Elizabeth Mussey',
       'Shellard, Mr. Frederick William',
       'Allison, Mrs. Hudson J C (Bessie Waldo Daniels)',
       'Svensson, Mr. Olof', 'Calic, Mr. Petar', 'Canavan, Miss. Mary',
       "O'Sullivan, Miss. Bridget Mary", 'Laitinen, Miss. Kristina Sofia',
       'Maioni, Miss. Roberta',
       'Penasco y Castellana, Mr. Victor de Satode',
       'Quick, Mrs. Frederick Charles (Jane Richards)',
       'Bradley, Mr. George ("George Arthur Brayton")',
       'Olsen, Mr. Henry Margido', 'Lang, Mr. Fang',
       'Daly, Mr. Eugene Patrick', 'Webber, Mr. James',
       'McGough, Mr. James Robert',
       'Rothschild, Mrs. Martin (Elizabeth L. Barrett)',
       'Coleff, Mr. Satio', 'Walker, Mr. William Anderson',
       'Lemore, Mrs. (Amelia Milley)', 'Ryan, Mr. Patrick',
       'Angle, Mrs. William A (Florence "Mary" Agnes Hughes)',
       'Pavlovic, Mr. Stefo', 'Perreault, Miss. Anne', 'Vovk, Mr. Janko',
       'Lahoud, Mr. Sarkis',
       'Hippach, Mrs. Louis Albert (Ida Sophia Fischer)',
       'Kassem, Mr. Fared', 'Farrell, Mr. James', 'Ridsdale, Miss. Lucy',
       'Farthing, Mr. John', 'Salonen, Mr. Johan Werner',
       'Hocking, Mr. Richard George', 'Quick, Miss. Phyllis May',
       'Toufik, Mr. Nakli', 'Elias, Mr. Joseph Jr',
       'Peter, Mrs. Catherine (Catherine Rizk)', 'Cacic, Miss. Marija',
       'Hart, Miss. Eva Miriam', 'Butt, Major. Archibald Willingham',
       'LeRoy, Miss. Bertha', 'Risien, Mr. Samuel Beard',
       'Frolicher, Miss. Hedwig Margaritha', 'Crosby, Miss. Harriet R',
       'Andersson, Miss. Ingeborg Constanzia',
       'Andersson, Miss. Sigrid Elisabeth', 'Beane, Mr. Edward',
       'Douglas, Mr. Walter Donald', 'Nicholson, Mr. Arthur Ernest',
       'Beane, Mrs. Edward (Ethel Clarke)', 'Padro y Manent, Mr. Julian',
       'Goldsmith, Mr. Frank John', 'Davies, Master. John Morgan Jr',
       'Thayer, Mr. John Borland Jr', 'Sharp, Mr. Percival James R',
       "O'Brien, Mr. Timothy", 'Leeni, Mr. Fahim ("Philip Zenni")',
       'Ohman, Miss. Velin', 'Wright, Mr. George',
       'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")',
       'Robbins, Mr. Victor', 'Taussig, Mrs. Emil (Tillie Mandelbaum)',
       'de Messemaeker, Mrs. Guillaume Joseph (Emma)',
       'Morrow, Mr. Thomas Rowan', 'Sivic, Mr. Husein',
       'Norman, Mr. Robert Douglas', 'Simmons, Mr. John',
       'Meanwell, Miss. (Marion Ogden)', 'Davies, Mr. Alfred J',
       'Stoytcheff, Mr. Ilia',
       'Palsson, Mrs. Nils (Alma Cornelia Berglund)',
       'Doharr, Mr. Tannous', 'Jonsson, Mr. Carl', 'Harris, Mr. George',
       'Appleton, Mrs. Edward Dale (Charlotte Lamson)',
       'Flynn, Mr. John Irwin ("Irving")', 'Kelly, Miss. Mary',
       'Rush, Mr. Alfred George John', 'Patchett, Mr. George',
       'Garside, Miss. Ethel',
       'Silvey, Mrs. William Baird (Alice Munger)',
       'Caram, Mrs. Joseph (Maria Elias)', 'Jussila, Mr. Eiriik',
       'Christy, Miss. Julie Rachel',
       'Thayer, Mrs. John Borland (Marian Longstreth Morris)',
       'Downton, Mr. William James', 'Ross, Mr. John Hugo',
       'Paulner, Mr. Uscher', 'Taussig, Miss. Ruth',
       'Jarvis, Mr. John Denzil', 'Frolicher-Stehli, Mr. Maxmillian',
       'Gilinski, Mr. Eliezer', 'Murdlin, Mr. Joseph',
       'Rintamaki, Mr. Matti',
       'Stephenson, Mrs. Walter Bertram (Martha Eustis)',
       'Elsbury, Mr. William James', 'Bourke, Miss. Mary',
       'Chapman, Mr. John Henry', 'Van Impe, Mr. Jean Baptiste',
       'Leitch, Miss. Jessie Wills', 'Johnson, Mr. Alfred',
       'Boulos, Mr. Hanna',
       'Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")',
       'Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)',
       'Slabenoff, Mr. Petco', 'Harrington, Mr. Charles H',
       'Torber, Mr. Ernst William', 'Homer, Mr. Harry ("Mr E Haven")',
       'Lindell, Mr. Edvard Bengtsson', 'Karaic, Mr. Milan',
       'Daniel, Mr. Robert Williams',
       'Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)',
       'Shutes, Miss. Elizabeth W',
       'Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)',
       'Jardin, Mr. Jose Neto', 'Murphy, Miss. Margaret Jane',
       'Horgan, Mr. John', 'Brocklebank, Mr. William Alfred',
       'Herman, Miss. Alice', 'Danbom, Mr. Ernst Gilbert',
       'Lobb, Mrs. William Arthur (Cordelia K Stanlick)',
       'Becker, Miss. Marion Louise', 'Gavey, Mr. Lawrence',
       'Yasbeck, Mr. Antoni', 'Kimball, Mr. Edwin Nelson Jr',
       'Nakid, Mr. Sahid', 'Hansen, Mr. Henry Damsgaard',
       'Bowen, Mr. David John "Dai"', 'Sutton, Mr. Frederick',
       'Kirkland, Rev. Charles Leonard', 'Longley, Miss. Gretchen Fiske',
       'Bostandyeff, Mr. Guentcho', "O'Connell, Mr. Patrick D",
       'Barkworth, Mr. Algernon Henry Wilson',
       'Lundahl, Mr. Johan Svensson', 'Stahelin-Maeglin, Dr. Max',
       'Parr, Mr. William Henry Marsh', 'Skoog, Miss. Mabel',
       'Davis, Miss. Mary', 'Leinonen, Mr. Antti Gustaf',
       'Collyer, Mr. Harvey', 'Panula, Mrs. Juha (Maria Emilia Ojala)',
       'Thorneycroft, Mr. Percival', 'Jensen, Mr. Hans Peder',
       'Sagesser, Mlle. Emma', 'Skoog, Miss. Margit Elizabeth',
       'Foo, Mr. Choong', 'Baclini, Miss. Eugenie',
       'Harper, Mr. Henry Sleeper', 'Cor, Mr. Liudevit',
       'Simonius-Blumer, Col. Oberst Alfons', 'Willey, Mr. Edward',
       'Stanley, Miss. Amy Zillah Elsie', 'Mitkoff, Mr. Mito',
       'Doling, Miss. Elsie', 'Kalvik, Mr. Johannes Halvorsen',
       'O\'Leary, Miss. Hanora "Norah"', 'Hegarty, Miss. Hanora "Nora"',
       'Hickman, Mr. Leonard Mark', 'Radeff, Mr. Alexander',
       'Bourke, Mrs. John (Catherine)', 'Eitemiller, Mr. George Floyd',
       'Newell, Mr. Arthur Webster', 'Frauenthal, Dr. Henry William',
       'Badt, Mr. Mohamed', 'Colley, Mr. Edward Pomeroy',
       'Coleff, Mr. Peju', 'Lindqvist, Mr. Eino William',
       'Hickman, Mr. Lewis', 'Butler, Mr. Reginald Fenton',
       'Rommetvedt, Mr. Knud Paust', 'Cook, Mr. Jacob',
       'Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)',
       'Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)',
       'Davidson, Mr. Thornton', 'Mitchell, Mr. Henry Michael',
       'Wilhelms, Mr. Charles', 'Watson, Mr. Ennis Hastings',
       'Edvardsson, Mr. Gustaf Hjalmar', 'Sawyer, Mr. Frederick Charles',
       'Turja, Miss. Anna Sofia',
       'Goodwin, Mrs. Frederick (Augusta Tyler)',
       'Cardeza, Mr. Thomas Drake Martinez', 'Peters, Miss. Katie',
       'Hassab, Mr. Hammad', 'Olsvigen, Mr. Thor Anderson',
       'Goodwin, Mr. Charles Edward', 'Brown, Mr. Thomas William Solomon',
       'Laroche, Mr. Joseph Philippe Lemercier',
       'Panula, Mr. Jaako Arnold', 'Dakic, Mr. Branko',
       'Fischer, Mr. Eberhard Thelander',
       'Madill, Miss. Georgette Alexandra', 'Dick, Mr. Albert Adrian',
       'Karun, Miss. Manca', 'Lam, Mr. Ali', 'Saad, Mr. Khalil',
       'Weir, Col. John', 'Chapman, Mr. Charles Henry',
       'Kelly, Mr. James', 'Mullens, Miss. Katherine "Katie"',
       'Thayer, Mr. John Borland',
       'Humblen, Mr. Adolf Mathias Nicolai Olsen',
       'Astor, Mrs. John Jacob (Madeleine Talmadge Force)',
       'Silverthorne, Mr. Spencer Victor', 'Barbara, Miss. Saiide',
       'Gallagher, Mr. Martin', 'Hansen, Mr. Henrik Juul',
       'Morley, Mr. Henry Samuel ("Mr Henry Marshall")',
       'Kelly, Mrs. Florence "Fannie"',
       'Calderhead, Mr. Edward Pennington', 'Cleaver, Miss. Alice',
       'Moubarek, Master. Halim Gonios ("William George")',
       'Mayne, Mlle. Berthe Antonine ("Mrs de Villiers")',
       'Klaber, Mr. Herman', 'Taylor, Mr. Elmer Zebley',
       'Larsson, Mr. August Viktor', 'Greenberg, Mr. Samuel',
       'Soholt, Mr. Peter Andreas Lauritz Andersen',
       'Endres, Miss. Caroline Louise',
       'Troutt, Miss. Edwina Celia "Winnie"', 'McEvoy, Mr. Michael',
       'Johnson, Mr. Malkolm Joackim',
       'Harper, Miss. Annie Jessie "Nina"', 'Jensen, Mr. Svend Lauritz',
       'Gillespie, Mr. William Henry', 'Hodges, Mr. Henry Price',
       'Chambers, Mr. Norman Campbell', 'Oreskovic, Mr. Luka',
       'Renouf, Mrs. Peter Henry (Lillian Jefferys)',
       'Mannion, Miss. Margareth', 'Bryhl, Mr. Kurt Arnold Gottfrid',
       'Ilmakangas, Miss. Pieta Sofia', 'Allen, Miss. Elisabeth Walton',
       'Hassan, Mr. Houssein G N', 'Knight, Mr. Robert J',
       'Berriman, Mr. William John', 'Troupiansky, Mr. Moses Aaron',
       'Williams, Mr. Leslie', 'Ford, Mrs. Edward (Margaret Ann Watson)',
       'Lesurer, Mr. Gustave J', 'Ivanoff, Mr. Kanio',
       'Nankoff, Mr. Minko', 'Hawksford, Mr. Walter James',
       'Cavendish, Mr. Tyrell William',
       'Ryerson, Miss. Susan Parker "Suzette"', 'McNamee, Mr. Neal',
       'Stranden, Mr. Juho', 'Crosby, Capt. Edward Gifford',
       'Abbott, Mr. Rossmore Edward', 'Sinkkonen, Miss. Anna',
       'Marvin, Mr. Daniel Warner', 'Connaghton, Mr. Michael',
       'Wells, Miss. Joan', 'Moor, Master. Meier',
       'Vande Velde, Mr. Johannes Joseph', 'Jonkoff, Mr. Lalio',
       'Herman, Mrs. Samuel (Jane Laver)', 'Hamalainen, Master. Viljo',
       'Carlsson, Mr. August Sigfrid', 'Bailey, Mr. Percy Andrew',
       'Theobald, Mr. Thomas Leonard',
       'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
       'Garfirth, Mr. John', 'Nirva, Mr. Iisakki Antino Aijo',
       'Barah, Mr. Hanna Assi',
       'Carter, Mrs. William Ernest (Lucile Polk)',
       'Eklund, Mr. Hans Linus', 'Hogeboom, Mrs. John C (Anna Andrews)',
       'Brewe, Dr. Arthur Jackson', 'Mangan, Miss. Mary',
       'Moran, Mr. Daniel J', 'Gronnestad, Mr. Daniel Danielsen',
       'Lievens, Mr. Rene Aime', 'Jensen, Mr. Niels Peder',
       'Mack, Mrs. (Mary)', 'Elias, Mr. Dibo',
       'Hocking, Mrs. Elizabeth (Eliza Needs)',
       'Myhrman, Mr. Pehr Fabian Oliver Malkolm', 'Tobin, Mr. Roger',
       'Emanuel, Miss. Virginia Ethel', 'Kilgannon, Mr. Thomas J',
       'Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)',
       'Ayoub, Miss. Banoura',
       'Dick, Mrs. Albert Adrian (Vera Gillespie)',
       'Long, Mr. Milton Clyde', 'Johnston, Mr. Andrew G',
       'Ali, Mr. William', 'Harmer, Mr. Abraham (David Lishin)',
       'Sjoblom, Miss. Anna Sofia', 'Rice, Master. George Hugh',
       'Dean, Master. Bertram Vere', 'Guggenheim, Mr. Benjamin',
       'Keane, Mr. Andrew "Andy"', 'Gaskell, Mr. Alfred',
       'Sage, Miss. Stella Anna', 'Hoyt, Mr. William Fisher',
       'Dantcheff, Mr. Ristiu', 'Otter, Mr. Richard',
       'Leader, Dr. Alice (Farnham)', 'Osman, Mrs. Mara',
       'Ibrahim Shawah, Mr. Yousseff',
       'Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)',
       'Ponesell, Mr. Martin',
       'Collyer, Mrs. Harvey (Charlotte Annie Tate)',
       'Carter, Master. William Thornton II',
       'Thomas, Master. Assad Alexander', 'Hedman, Mr. Oskar Arvid',
       'Johansson, Mr. Karl Johan', 'Andrews, Mr. Thomas Jr',
       'Pettersson, Miss. Ellen Natalia', 'Meyer, Mr. August',
       'Chambers, Mrs. Norman Campbell (Bertha Griggs)',
       'Alexander, Mr. William', 'Lester, Mr. James',
       'Slemen, Mr. Richard James', 'Andersson, Miss. Ebba Iris Alfrida',
       'Tomlin, Mr. Ernest Portage', 'Fry, Mr. Richard',
       'Heininen, Miss. Wendla Maria', 'Mallet, Mr. Albert',
       'Holm, Mr. John Fredrik Alexander', 'Skoog, Master. Karl Thorsten',
       'Hays, Mrs. Charles Melville (Clara Jennings Gregg)',
       'Lulic, Mr. Nikola', 'Reuchlin, Jonkheer. John George',
       'Moor, Mrs. (Beila)', 'Panula, Master. Urho Abraham',
       'Flynn, Mr. John', 'Lam, Mr. Len', 'Mallet, Master. Andre',
       'McCormack, Mr. Thomas Joseph',
       'Stone, Mrs. George Nelson (Martha Evelyn)',
       'Yasbeck, Mrs. Antoni (Selini Alexander)',
       'Richards, Master. George Sibley', 'Saad, Mr. Amin',
       'Augustsson, Mr. Albert', 'Allum, Mr. Owen George',
       'Compton, Miss. Sara Rebecca', 'Pasic, Mr. Jakob',
       'Sirota, Mr. Maurice', 'Chip, Mr. Chang', 'Marechal, Mr. Pierre',
       'Alhomaki, Mr. Ilmari Rudolf', 'Mudd, Mr. Thomas Charles',
       'Serepeca, Miss. Augusta', 'Lemberopolous, Mr. Peter L',
       'Culumovic, Mr. Jeso', 'Abbing, Mr. Anthony',
       'Sage, Mr. Douglas Bullen', 'Markoff, Mr. Marin',
       'Harper, Rev. John',
       'Goldenberg, Mrs. Samuel L (Edwiga Grabowska)',
       'Andersson, Master. Sigvard Harald Elias', 'Svensson, Mr. Johan',
       'Boulos, Miss. Nourelain', 'Lines, Miss. Mary Conover',
       'Carter, Mrs. Ernest Courtenay (Lilian Hughes)',
       'Aks, Mrs. Sam (Leah Rosen)',
       'Wick, Mrs. George Dennick (Mary Hitchcock)',
       'Daly, Mr. Peter Denis ', 'Baclini, Mrs. Solomon (Latifa Qurban)',
       'Razi, Mr. Raihed', 'Hansen, Mr. Claus Peter',
       'Giles, Mr. Frederick Edward',
       'Swift, Mrs. Frederick Joel (Margaret Welles Barron)',
       'Sage, Miss. Dorothy Edith "Dolly"', 'Gill, Mr. John William',
       'Bystrom, Mrs. (Karolina)', 'Duran y More, Miss. Asuncion',
       'Roebling, Mr. Washington Augustus II',
       'van Melkebeke, Mr. Philemon', 'Johnson, Master. Harold Theodor',
       'Balkic, Mr. Cerin',
       'Beckwith, Mrs. Richard Leonard (Sallie Monypeny)',
       'Carlsson, Mr. Frans Olof', 'Vander Cruyssen, Mr. Victor',
       'Abelson, Mrs. Samuel (Hannah Wizosky)',
       'Najib, Miss. Adele Kiamie "Jane"',
       'Gustafsson, Mr. Alfred Ossian', 'Petroff, Mr. Nedelio',
       'Laleff, Mr. Kristo',
       'Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)',
       'Shelley, Mrs. William (Imanita Parrish Hall)',
       'Markun, Mr. Johann', 'Dahlberg, Miss. Gerda Ulrika',
       'Banfield, Mr. Frederick James', 'Sutehall, Mr. Henry Jr',
       'Rice, Mrs. William (Margaret Norton)', 'Montvila, Rev. Juozas',
       'Graham, Miss. Margaret Edith',
       'Johnston, Miss. Catherine Helen "Carrie"',
       'Behr, Mr. Karl Howell', 'Dooley, Mr. Patrick'], dtype=object)
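Extract the title from each name. Every entry follows the pattern "Surname, Title. Given names", so the title is the text between the comma and the first period.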
train_df['Title']=train_df['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())
test_df['Title']=test_df['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())
train_df.head(4)
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Title
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S	Mr
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C	Mrs
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S	Miss
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S	Mrs
test_df.head(4)
PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Title
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.2500	NaN	Q	Mr
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	71.2833	NaN	S	Mrs
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	7.9250	NaN	Q	Mr
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	53.1000	NaN	S	Mr
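The same extraction can also be written as a single vectorized regular expression instead of a per-row lambda. The following is a minimal sketch on a few names taken from the dataset; the variable names `names` and `titles` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# A few names in the dataset's "Surname, Title. Given names" layout.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
    "Rothschild, Mrs. Martin (Elizabeth L. Barrett)",
])

# Capture the text between the first comma and the first period that follows it.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
print(titles.tolist())
```

Because the pattern stops at the first period, later periods in a name (as in "Elizabeth L. Barrett") do not affect the result.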
train_df['Title'].value_counts().to_frame()
Title	count
Mr	517
Miss	182
Mrs	125
Master	40
Dr	7
Rev	6
Mlle	2
Major	2
Col	2
the Countess	1
Capt	1
Ms	1
Sir	1
Lady	1
Mme	1
Don	1
Jonkheer	1
sns.countplot(data=train_df, x= 'Title', hue='Survived')
[Figure: count plot of Title, with bars split by Survived]

train_df.groupby(['Title', 'Sex', 'Pclass']).Survived.sum().to_frame()
Title	Sex	Pclass	Survived
Capt	male	1	0
Col	male	1	1
Don	male	1	0
Dr	female	1	1
Dr	male	1	2
Dr	male	2	0
Jonkheer	male	1	0
Lady	female	1	1
Major	male	1	1
Master	male	1	3
Master	male	2	9
Master	male	3	11
Miss	female	1	44
Miss	female	2	32
Miss	female	3	51
Mlle	female	1	2
Mme	female	1	1
Mr	male	1	37
Mr	male	2	8
Mr	male	3	36
Mrs	female	1	41
Mrs	female	2	37
Mrs	female	3	21
Ms	female	2	1
Rev	male	2	0
Sir	male	1	1
the Countess	female	1	1
bar_chart_stacked(train_df, 'Title')

rare_female = ['Mme', 'Ms', 'Lady', 'Mlle', 'the Countess', 'Dona']
rare_male = ['Major', 'Col', 'Capt', 'Don', 'Sir', 'Jonkheer']
for df in (train_df, test_df):
    # Assign the result back rather than calling replace(..., inplace=True) on a column selection, which newer pandas versions warn about.
    df['Title'] = df['Title'].replace(rare_female, 'Miss')
    df['Title'] = df['Title'].replace(rare_male, 'Mr')
train_df.head(4)
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Title
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S	Mr
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C	Mrs
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S	Miss
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S	Mrs
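The grouping of rare titles can also be kept in one dictionary so that the training and test frames are guaranteed to receive the same mapping. This is only a sketch: `TITLE_MAP`, `group_titles`, and the toy frame are hypothetical names introduced for illustration, and the mapping simply mirrors the grouping chosen above.

```python
import pandas as pd

# Hypothetical lookup table mirroring the grouping above.
TITLE_MAP = {
    **dict.fromkeys(["Mme", "Ms", "Lady", "Mlle", "the Countess", "Dona"], "Miss"),
    **dict.fromkeys(["Major", "Col", "Capt", "Don", "Sir", "Jonkheer"], "Mr"),
}

def group_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with rare titles folded into 'Miss' or 'Mr'."""
    out = df.copy()
    # Titles missing from the dictionary (e.g. 'Dr', 'Rev') are left unchanged.
    out["Title"] = out["Title"].replace(TITLE_MAP)
    return out

toy = pd.DataFrame({"Title": ["Mr", "Mlle", "Col", "Dr", "the Countess"]})
grouped = group_titles(toy)
print(grouped["Title"].tolist())
```

Returning a copy keeps the original frame intact, which makes it easy to compare the title distribution before and after grouping.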
bar_chart_stacked(train_df, 'Title')

train_df.groupby(['Title', 'Sex', 'Pclass']).Survived.sum().to_frame()
Title	Sex	Pclass	Survived
Dr	female	1	1
Dr	male	1	2
Dr	male	2	0
Master	male	1	3
Master	male	2	9
Master	male	3	11
Miss	female	1	49
Miss	female	2	33
Miss	female	3	51
Mr	male	1	40
Mr	male	2	8
Mr	male	3	36
Mrs	female	1	41
Mrs	female	2	37
Mrs	female	3	21
Rev	male	2	0
Observation:
As expected, the female titles ("Miss" and "Mrs") have the highest survival rates. Among the titles usually held by males, "Master" (23 of 40 survived) and "Dr" (3 of 7, including one woman) fare noticeably better than the rest, whereas passengers titled "Mr" survived only about 16% of the time. None of the six passengers titled "Rev" survived; one possible, though speculative, reading is that they chose to stay behind.
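The tables above show survivor counts, whereas the observation talks about rates. One way to get rates directly is to aggregate count, sum, and mean together. The sketch below uses a small made-up frame (the values are not taken from the dataset) with the same `Title` and `Survived` column names; `summary` is an illustrative name.

```python
import pandas as pd

# Made-up rows for illustration only; not actual Titanic records.
toy = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Mr", "Miss", "Miss", "Master", "Rev"],
    "Survived": [0, 0, 0, 1, 1, 1, 1, 0],
})

# Survived is 0/1, so its mean within a group is that group's survival rate.
summary = (
    toy.groupby("Title")["Survived"]
       .agg(passengers="count", survivors="sum", survival_rate="mean")
       .sort_values("survival_rate", ascending=False)
)
print(summary)
```

Showing the group size next to the rate helps avoid over-reading rates for titles held by only a handful of passengers.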

Cabin and Ticket
train_df[['Cabin', 'Ticket']]
Cabin	Ticket
0	NaN	A/5 21171
1	C85	PC 17599
2	NaN	STON/O2. 3101282
3	C123	113803
4	NaN	373450
5	NaN	330877
6	E46	17463
7	NaN	349909
8	NaN	347742
9	NaN	237736
10	G6	PP 9549
11	C103	113783
12	NaN	A/5. 2151
13	NaN	347082
14	NaN	350406
15	NaN	248706
16	NaN	382652
17	NaN	244373
18	NaN	345763
19	NaN	2649
20	NaN	239865
21	D56	248698
22	NaN	330923
23	A6	113788
24	NaN	349909
25	NaN	347077
26	NaN	2631
27	C23 C25 C27	19950
28	NaN	330959
29	NaN	349216
30	NaN	PC 17601
31	B78	PC 17569
32	NaN	335677
33	NaN	C.A. 24579
34	NaN	PC 17604
35	NaN	113789
36	NaN	2677
37	NaN	A./5. 2152
38	NaN	345764
39	NaN	2651
40	NaN	7546
41	NaN	11668
42	NaN	349253
43	NaN	SC/Paris 2123
44	NaN	330958
45	NaN	S.C./A.4. 23567
46	NaN	370371
47	NaN	14311
48	NaN	2662
49	NaN	349237
50	NaN	3101295
51	NaN	A/4. 39886
52	D33	PC 17572
53	NaN	2926
54	B30	113509
55	C52	19947
56	NaN	C.A. 31026
57	NaN	2697
58	NaN	C.A. 34651
59	NaN	CA 2144
60	NaN	2669
61	B28	113572
62	C83	36973
63	NaN	347088
64	NaN	PC 17605
65	NaN	2661
66	F33	C.A. 29395
67	NaN	S.P. 3464
68	NaN	3101281
69	NaN	315151
70	NaN	C.A. 33111
71	NaN	CA 2144
72	NaN	S.O.C. 14879
73	NaN	2680
74	NaN	1601
75	F G73	348123
76	NaN	349208
77	NaN	374746
78	NaN	248738
79	NaN	364516
80	NaN	345767
81	NaN	345779
82	NaN	330932
83	NaN	113059
84	NaN	SO/C 14885
85	NaN	3101278
86	NaN	W./C. 6608
87	NaN	SOTON/OQ 392086
88	C23 C25 C27	19950
89	NaN	343275
90	NaN	343276
91	NaN	347466
92	E31	W.E.P. 5734
93	NaN	C.A. 2315
94	NaN	364500
95	NaN	374910
96	A5	PC 17754
97	D10 D12	PC 17759
98	NaN	231919
99	NaN	244367
100	NaN	349245
101	NaN	349215
102	D26	35281
103	NaN	7540
104	NaN	3101276
105	NaN	349207
106	NaN	343120
107	NaN	312991
108	NaN	349249
109	NaN	371110
110	C110	110465
111	NaN	2665
112	NaN	324669
113	NaN	4136
114	NaN	2627
115	NaN	STON/O 2. 3101294
116	NaN	370369
117	NaN	11668
118	B58 B60	PC 17558
119	NaN	347082
120	NaN	S.O.C. 14879
121	NaN	A4. 54510
122	NaN	237736
123	E101	27267
124	D26	35281
125	NaN	2651
126	NaN	370372
127	NaN	C 17369
128	F E69	2668
129	NaN	347061
130	NaN	349241
131	NaN	SOTON/O.Q. 3101307
132	NaN	A/5. 3337
133	NaN	228414
134	NaN	C.A. 29178
135	NaN	SC/PARIS 2133
136	D47	11752
137	C123	113803
138	NaN	7534
139	B86	PC 17593
140	NaN	2678
141	NaN	347081
142	NaN	STON/O2. 3101279
143	NaN	365222
144	NaN	231945
145	NaN	C.A. 33112
146	NaN	350043
147	NaN	W./C. 6608
148	F2	230080
149	NaN	244310
150	NaN	S.O.P. 1166
151	C2	113776
152	NaN	A.5. 11206
153	NaN	A/5. 851
154	NaN	Fa 265302
155	NaN	PC 17597
156	NaN	35851
157	NaN	SOTON/OQ 392090
158	NaN	315037
159	NaN	CA. 2343
160	NaN	371362
161	NaN	C.A. 33595
162	NaN	347068
163	NaN	315093
164	NaN	3101295
165	NaN	363291
166	E33	113505
167	NaN	347088
168	NaN	PC 17318
169	NaN	1601
170	B19	111240
171	NaN	382652
172	NaN	347742
173	NaN	STON/O 2. 3101280
174	A7	17764
175	NaN	350404
176	NaN	4133
177	C49	PC 17595
178	NaN	250653
179	NaN	LINE
180	NaN	CA. 2343
181	NaN	SC/PARIS 2131
182	NaN	347077
183	F4	230136
184	NaN	315153
185	A32	113767
186	NaN	370365
187	NaN	111428
188	NaN	364849
189	NaN	349247
190	NaN	234604
191	NaN	28424
192	NaN	350046
193	F2	230080
194	B4	PC 17610
195	B80	PC 17569
196	NaN	368703
197	NaN	4579
198	NaN	370370
199	NaN	248747
200	NaN	345770
201	NaN	CA. 2343
202	NaN	3101264
203	NaN	2628
204	NaN	A/5 3540
205	G6	347054
206	NaN	3101278
207	NaN	2699
208	NaN	367231
209	A31	112277
210	NaN	SOTON/O.Q. 3101311
211	NaN	F.C.C. 13528
212	NaN	A/5 21174
213	NaN	250646
214	NaN	367229
215	D36	35273
216	NaN	STON/O2. 3101283
217	NaN	243847
218	D15	11813
219	NaN	W/C 14208
220	NaN	SOTON/OQ 392089
221	NaN	220367
222	NaN	21440
223	NaN	349234
224	C93	19943
225	NaN	PP 4348
226	NaN	SW/PP 751
227	NaN	A/5 21173
228	NaN	236171
229	NaN	4133
230	C83	36973
231	NaN	347067
232	NaN	237442
233	NaN	347077
234	NaN	C.A. 29566
235	NaN	W./C. 6609
236	NaN	26707
237	NaN	C.A. 31921
238	NaN	28665
239	NaN	SCO/W 1585
240	NaN	2665
241	NaN	367230
242	NaN	W./C. 14263
243	NaN	STON/O 2. 3101275
244	NaN	2694
245	C78	19928
246	NaN	347071
247	NaN	250649
248	D35	11751
249	NaN	244252
250	NaN	362316
251	G6	347054
252	C87	113514
253	NaN	A/5. 3336
254	NaN	370129
255	NaN	2650
256	NaN	PC 17585
257	B77	110152
258	NaN	PC 17755
259	NaN	230433
260	NaN	384461
261	NaN	347077
262	E67	110413
263	B94	112059
264	NaN	382649
265	NaN	C.A. 17248
266	NaN	3101295
267	NaN	347083
268	C125	PC 17582
269	C99	PC 17760
270	NaN	113798
271	NaN	LINE
272	NaN	250644
273	C118	PC 17596
274	NaN	370375
275	D7	13502
276	NaN	347073
277	NaN	239853
278	NaN	382652
279	NaN	C.A. 2673
280	NaN	336439
281	NaN	347464
282	NaN	345778
283	NaN	A/5. 10482
284	A19	113056
285	NaN	349239
286	NaN	345774
287	NaN	349206
288	NaN	237798
289	NaN	370373
290	NaN	19877
291	B49	11967
292	D	SC/Paris 2163
293	NaN	349236
294	NaN	349233
295	NaN	PC 17612
296	NaN	2693
297	C22 C26	113781
298	C106	19988
299	B58 B60	PC 17558
300	NaN	9234
301	NaN	367226
302	NaN	LINE
303	E101	226593
304	NaN	A/5 2466
305	C22 C26	113781
306	NaN	17421
307	C65	PC 17758
308	NaN	P/PP 3381
309	E36	PC 17485
310	C54	11767
311	B57 B59 B63 B66	PC 17608
312	NaN	250651
313	NaN	349243
314	NaN	F.C.C. 13529
315	NaN	347470
316	NaN	244367
317	NaN	29011
318	C7	36928
319	E34	16966
320	NaN	A/5 21172
321	NaN	349219
322	NaN	234818
323	NaN	248738
324	NaN	CA. 2343
325	C32	PC 17760
326	NaN	345364
327	D	28551
328	NaN	363291
329	B18	111361
330	NaN	367226
331	C124	113043
332	C91	PC 17582
333	NaN	345764
334	NaN	PC 17611
335	NaN	349225
336	C2	113776
337	E40	16966
338	NaN	7598
339	T	113784
340	F2	230080
341	C23 C25 C27	19950
342	NaN	248740
343	NaN	244361
344	NaN	229236
345	F33	248733
346	NaN	31418
347	NaN	386525
348	NaN	C.A. 37671
349	NaN	315088
350	NaN	7267
351	C128	113510
352	NaN	2695
353	NaN	349237
354	NaN	2647
355	NaN	345783
356	E33	113505
357	NaN	237671
358	NaN	330931
359	NaN	330980
360	NaN	347088
361	NaN	SC/PARIS 2167
362	NaN	2691
363	NaN	SOTON/O.Q. 3101310
364	NaN	370365
365	NaN	C 7076
366	D37	110813
367	NaN	2626
368	NaN	14313
369	B35	PC 17477
370	E50	11765
371	NaN	3101267
372	NaN	323951
373	NaN	PC 17760
374	NaN	349909
375	NaN	PC 17604
376	NaN	C 7077
377	C82	113503
378	NaN	2648
379	NaN	347069
380	NaN	PC 17757
381	NaN	2653
382	NaN	STON/O 2. 3101293
383	NaN	113789
384	NaN	349227
385	NaN	S.O.C. 14879
386	NaN	CA 2144
387	NaN	27849
388	NaN	367655
389	NaN	SC 1748
390	B96 B98	113760
391	NaN	350034
392	NaN	3101277
393	D36	35273
394	G6	PP 9549
395	NaN	350052
396	NaN	350407
397	NaN	28403
398	NaN	244278
399	NaN	240929
400	NaN	STON/O 2. 3101289
401	NaN	341826
402	NaN	4137
403	NaN	STON/O2. 3101279
404	NaN	315096
405	NaN	28664
406	NaN	347064
407	NaN	29106
408	NaN	312992
409	NaN	4133
410	NaN	349222
411	NaN	394140
412	C78	19928
413	NaN	239853
414	NaN	STON/O 2. 3101269
415	NaN	343095
416	NaN	28220
417	NaN	250652
418	NaN	28228
419	NaN	345773
420	NaN	349254
421	NaN	A/5. 13032
422	NaN	315082
423	NaN	347080
424	NaN	370129
425	NaN	A/4. 34244
426	NaN	2003
427	NaN	250655
428	NaN	364851
429	E10	SOTON/O.Q. 392078
430	C52	110564
431	NaN	376564
432	NaN	SC/AH 3085
433	NaN	STON/O 2. 3101274
434	E44	13507
435	B96 B98	113760
436	NaN	W./C. 6608
437	NaN	29106
438	C23 C25 C27	19950
439	NaN	C.A. 18723
440	NaN	F.C.C. 13529
441	NaN	345769
442	NaN	347076
443	NaN	230434
444	NaN	65306
445	A34	33638
446	NaN	250644
447	NaN	113794
448	NaN	2666
449	C104	113786
450	NaN	C.A. 34651
451	NaN	65303
452	C111	113051
453	C92	17453
454	NaN	A/5 2817
455	NaN	349240
456	E38	13509
457	D21	17464
458	NaN	F.C.C. 13531
459	NaN	371060
460	E12	19952
461	NaN	364506
462	E63	111320
463	NaN	234360
464	NaN	A/S 2816
465	NaN	SOTON/O.Q. 3101306
466	NaN	239853
467	NaN	113792
468	NaN	36209
469	NaN	2666
470	NaN	323592
471	NaN	315089
472	NaN	C.A. 34651
473	D	SC/AH Basle 541
474	NaN	7553
475	A14	110465
476	NaN	31027
477	NaN	3460
478	NaN	350060
479	NaN	3101298
480	NaN	CA 2144
481	NaN	239854
482	NaN	A/5 3594
483	NaN	4134
484	B49	11967
485	NaN	4133
486	C93	19943
487	B37	11771
488	NaN	A.5. 18509
489	NaN	C.A. 37671
490	NaN	65304
491	NaN	SOTON/OQ 3101317
492	C30	113787
493	NaN	PC 17609
494	NaN	A/4 45380
495	NaN	2627
496	D20	36947
497	NaN	C.A. 6212
498	C22 C26	113781
499	NaN	350035
500	NaN	315086
501	NaN	364846
502	NaN	330909
503	NaN	4135
504	B79	110152
505	C65	PC 17758
506	NaN	26360
507	NaN	111427
508	NaN	C 4001
509	NaN	1601
510	NaN	382651
511	NaN	SOTON/OQ 3101316
512	E25	PC 17473
513	NaN	PC 17603
514	NaN	349209
515	D46	36967
516	F33	C.A. 34260
517	NaN	371110
518	NaN	226875
519	NaN	349242
520	B73	12749
521	NaN	349252
522	NaN	2624
523	B18	111361
524	NaN	2700
525	NaN	367232
526	NaN	W./C. 14258
527	C95	PC 17483
528	NaN	3101296
529	NaN	29104
530	NaN	26360
531	NaN	2641
532	NaN	2690
533	NaN	2668
534	NaN	315084
535	NaN	F.C.C. 13529
536	B38	113050
537	NaN	PC 17761
538	NaN	364498
539	B39	13568
540	B22	WE/P 5735
541	NaN	347082
542	NaN	347082
543	NaN	2908
544	C86	PC 17761
545	NaN	693
546	NaN	2908
547	NaN	SC/PARIS 2146
548	NaN	363291
549	NaN	C.A. 33112
550	C70	17421
551	NaN	244358
552	NaN	330979
553	NaN	2620
554	NaN	347085
555	NaN	113807
556	A16	11755
557	NaN	PC 17757
558	E67	110413
559	NaN	345572
560	NaN	372622
561	NaN	349251
562	NaN	218629
563	NaN	SOTON/OQ 392082
564	NaN	SOTON/O.Q. 392087
565	NaN	A/4 48871
566	NaN	349205
567	NaN	349909
568	NaN	2686
569	NaN	350417
570	NaN	S.W./PP 752
571	C101	11769
572	E25	PC 17474
573	NaN	14312
574	NaN	A/4. 20589
575	NaN	358585
576	NaN	243880
577	E44	13507
578	NaN	2689
579	NaN	STON/O 2. 3101286
580	NaN	237789
581	C68	17421
582	NaN	28403
583	A10	13049
584	NaN	3411
585	E68	110413
586	NaN	237565
587	B41	13567
588	NaN	14973
589	NaN	A./5. 3235
590	NaN	STON/O 2. 3101273
591	D20	36947
592	NaN	A/5 3902
593	NaN	364848
594	NaN	SC/AH 29037
595	NaN	345773
596	NaN	248727
597	NaN	LINE
598	NaN	2664
599	A20	PC 17485
600	NaN	243847
601	NaN	349214
602	NaN	113796
603	NaN	364511
604	NaN	111426
605	NaN	349910
606	NaN	349246
607	NaN	113804
608	NaN	SC/Paris 2123
609	C125	PC 17582
610	NaN	347082
611	NaN	SOTON/O.Q. 3101305
612	NaN	367230
613	NaN	370377
614	NaN	364512
615	NaN	220845
616	NaN	347080
617	NaN	A/5. 3336
618	F4	230136
619	NaN	31028
620	NaN	2659
621	D19	11753
622	NaN	2653
623	NaN	350029
624	NaN	54636
625	D50	36963
626	NaN	219533
627	D9	13502
628	NaN	349224
629	NaN	334912
630	A23	27042
631	NaN	347743
632	B50	13214
633	NaN	112052
634	NaN	347088
635	NaN	237668
636	NaN	STON/O 2. 3101292
637	NaN	C.A. 31921
638	NaN	3101295
639	NaN	376564
640	NaN	350050
641	B35	PC 17477
642	NaN	347088
643	NaN	1601
644	NaN	2666
645	D33	PC 17572
646	NaN	349231
647	A26	13213
648	NaN	S.O./P.P. 751
649	NaN	CA. 2314
650	NaN	349221
651	NaN	231919
652	NaN	8475
653	NaN	330919
654	NaN	365226
655	NaN	S.O.C. 14879
656	NaN	349223
657	NaN	364849
658	NaN	29751
659	D48	35273
660	NaN	PC 17611
661	NaN	2623
662	E58	5727
663	NaN	349210
664	NaN	STON/O 2. 3101285
665	NaN	S.O.C. 14879
666	NaN	234686
667	NaN	312993
668	NaN	A/5 3536
669	C126	19996
670	NaN	29750
671	B71	F.C. 12750
672	NaN	C.A. 24580
673	NaN	244270
674	NaN	239856
675	NaN	349912
676	NaN	342826
677	NaN	4138
678	NaN	CA 2144
679	B51 B53 B55	PC 17755
680	NaN	330935
681	D49	PC 17572
682	NaN	6563
683	NaN	CA 2144
684	NaN	29750
685	NaN	SC/Paris 2123
686	NaN	3101295
687	NaN	349228
688	NaN	350036
689	B5	24160
690	B20	17474
691	NaN	349256
692	NaN	1601
693	NaN	2672
694	NaN	113800
695	NaN	248731
696	NaN	363592
697	NaN	35852
698	C68	17421
699	F G63	348121
700	C62 C64	PC 17757
701	E24	PC 17475
702	NaN	2691
703	NaN	36864
704	NaN	350025
705	NaN	250655
706	NaN	223596
707	E24	PC 17476
708	NaN	113781
709	NaN	2661
710	C90	PC 17482
711	C124	113028
712	C126	19996
713	NaN	7545
714	NaN	250647
715	F G73	348124
716	C45	PC 17757
717	E101	34218
718	NaN	36568
719	NaN	347062
720	NaN	248727
721	NaN	350048
722	NaN	12233
723	NaN	250643
724	E8	113806
725	NaN	315094
726	NaN	31027
727	NaN	36866
728	NaN	236853
729	NaN	STON/O2. 3101271
730	B5	24160
731	NaN	2699
732	NaN	239855
733	NaN	28425
734	NaN	233639
735	NaN	54636
736	NaN	W./C. 6608
737	B101	PC 17755
738	NaN	349201
739	NaN	349218
740	D45	16988
741	C46	19877
742	B57 B59 B63 B66	PC 17608
743	NaN	376566
744	NaN	STON/O 2. 3101288
745	B22	WE/P 5735
746	NaN	C.A. 2673
747	NaN	250648
748	D30	113773
749	NaN	335097
750	NaN	29103
751	E121	392096
752	NaN	345780
753	NaN	349204
754	NaN	220845
755	NaN	250649
756	NaN	350042
757	NaN	29108
758	NaN	363294
759	B77	110152
760	NaN	358585
761	NaN	SOTON/O2 3101272
762	NaN	2663
763	B96 B98	113760
764	NaN	347074
765	D11	13502
766	NaN	112379
767	NaN	364850
768	NaN	371110
769	NaN	8471
770	NaN	345781
771	NaN	350047
772	E77	S.O./P.P. 3
773	NaN	2674
774	NaN	29105
775	NaN	347078
776	F38	383121
777	NaN	364516
778	NaN	36865
779	B3	24160
780	NaN	2687
781	B20	17474
782	D6	113501
783	NaN	W./C. 6607
784	NaN	SOTON/O.Q. 3101312
785	NaN	374887
786	NaN	3101265
787	NaN	382652
788	NaN	C.A. 2315
789	B82 B84	PC 17593
790	NaN	12460
791	NaN	239865
792	NaN	CA. 2343
793	NaN	PC 17600
794	NaN	349203
795	NaN	28213
796	D17	17465
797	NaN	349244
798	NaN	2685
799	NaN	345773
800	NaN	250647
801	NaN	C.A. 31921
802	B96 B98	113760
803	NaN	2625
804	NaN	347089
805	NaN	347063
806	A36	112050
807	NaN	347087
808	NaN	248723
809	E8	113806
810	NaN	3474
811	NaN	A/4 48871
812	NaN	28206
813	NaN	347082
814	NaN	364499
815	B102	112058
816	NaN	STON/O2. 3101290
817	NaN	S.C./PARIS 2079
818	NaN	C 7075
819	NaN	347088
820	B69	12749
821	NaN	315098
822	NaN	19972
823	E121	392096
824	NaN	3101295
825	NaN	368323
826	NaN	1601
827	NaN	S.C./PARIS 2079
828	NaN	367228
829	B28	113572
830	NaN	2659
831	NaN	29106
832	NaN	2671
833	NaN	347468
834	NaN	2223
835	E49	PC 17756
836	NaN	315097
837	NaN	392092
838	NaN	1601
839	C47	11774
840	NaN	SOTON/O2 3101287
841	NaN	S.O./P.P. 3
842	NaN	113798
843	NaN	2683
844	NaN	315090
845	NaN	C.A. 5547
846	NaN	CA. 2343
847	NaN	349213
848	NaN	248727
849	C92	17453
850	NaN	347082
851	NaN	347060
852	NaN	2678
853	D28	PC 17592
854	NaN	244252
855	NaN	392091
856	NaN	36928
857	E17	113055
858	NaN	2666
859	NaN	2629
860	NaN	350026
861	NaN	28134
862	D17	17466
863	NaN	CA. 2343
864	NaN	233866
865	NaN	236852
866	NaN	SC/PARIS 2149
867	A24	PC 17590
868	NaN	345777
869	NaN	347742
870	NaN	349248
871	D35	11751
872	B51 B53 B55	695
873	NaN	345765
874	NaN	P/PP 3381
875	NaN	2667
876	NaN	7534
877	NaN	349212
878	NaN	349217
879	C50	11767
880	NaN	230433
881	NaN	349257
882	NaN	7552
883	NaN	C.A./SOTON 34068
884	NaN	SOTON/OQ 392076
885	NaN	382652
886	NaN	211536
887	B42	112053
888	NaN	W./C. 6607
889	C148	111369
890	NaN	370376
print(f" Null number: {train_df['Cabin'].isnull().sum()}")
print(f" Total number: {train_df['Cabin'].shape[0]}")
 Null number: 687
 Total number: 891
print(f" Null number: {train_df['Ticket'].isnull().sum()}")
print(f" Total number: {train_df['Ticket'].shape[0]}")
 Null number: 0
 Total number: 891
train_df.head()
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Title
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S	Mr
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C	Mrs
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S	Miss
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S	Mrs
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S	Mr
train_df.query('Cabin.notnull() and Survived==1').count()
PassengerId    136
Survived       136
Pclass         136
Name           136
Sex            136
Age            125
SibSp          136
Parch          136
Ticket         136
Fare           136
Cabin          136
Embarked       134
Title          136
dtype: int64
train_df.query('Cabin.notnull() and Survived==0').count()
PassengerId    68
Survived       68
Pclass         68
Name           68
Sex            68
Age            60
SibSp          68
Parch          68
Ticket         68
Fare           65
Cabin          68
Embarked       68
Title          68
dtype: int64
train_df.query('Cabin.isnull() and Survived==1').count()
PassengerId    206
Survived       206
Pclass         206
Name           206
Sex            206
Age            165
SibSp          206
Parch          206
Ticket         206
Fare           205
Cabin            0
Embarked       206
Title          206
dtype: int64
train_df.query('Cabin.isnull() and Survived==0').count()
PassengerId    481
Survived       481
Pclass         481
Name           481
Sex            481
Age            364
SibSp          481
Parch          481
Ticket         481
Fare           470
Cabin            0
Embarked       481
Title          481
dtype: int64
# Flag whether a cabin was recorded: 1 if Cabin is present, 0 if it is missing
train_df['cabin_replace_num'] = train_df['Cabin'].apply(lambda x: 0 if pd.isnull(x) else 1)
test_df['cabin_replace_num'] = test_df['Cabin'].apply(lambda x: 0 if pd.isnull(x) else 1)
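The four query counts above can be turned into survival rates, which is one way to see why a cabin indicator may carry signal. A small sketch that uses only the PassengerId counts printed above; the variable and function names are illustrative:

```python
# Counts taken from the four query outputs above (PassengerId row).
cabin_known = {"survived": 136, "died": 68}
cabin_missing = {"survived": 206, "died": 481}

def survival_rate(counts):
    """Share of passengers in the group who survived."""
    return counts["survived"] / (counts["survived"] + counts["died"])

print(f"Cabin recorded: {survival_rate(cabin_known):.1%}")
print(f"Cabin missing:  {survival_rate(cabin_missing):.1%}")
```

Passengers with a recorded cabin survived roughly two times out of three, against about 30% for those without one. The gap probably also reflects passenger class, so the indicator should not be read as a causal effect.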
I don't think the Ticket feature matters much. With cabin_replace_num in place, the raw Cabin column can be dropped.
train_df.drop('Cabin', axis=1, inplace=True)
test_df.drop('Cabin', axis=1, inplace=True)
Feature Family Size
train_df['Fam_size'] = train_df['SibSp'] + train_df['Parch'] + 1
test_df['Fam_size'] = test_df['SibSp'] + test_df['Parch'] + 1
bar_compare(train_df, "Fam_size")

Make Family Type
# Bin family size into four groups: Solo (1), Small (2-4), Big (5-7), Very big (8-11)
train_df['Fam_type'] = pd.cut(train_df.Fam_size, [0,1,4,7,11], labels=['Solo', 'Small', 'Big', 'Very big'])
test_df['Fam_type'] = pd.cut(test_df.Fam_size, [0,1,4,7,11], labels=['Solo', 'Small', 'Big', 'Very big'])
train_df.head()
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_size	Fam_type
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	S	Mr	0	2	Small
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C	Mrs	1	2	Small
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	S	Miss	0	1	Solo
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	S	Mrs	1	2	Small
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	S	Mr	0	1	Solo
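pd.cut uses right-closed intervals by default, so the edges [0, 1, 4, 7, 11] correspond to family sizes 1, 2-4, 5-7 and 8-11. A quick check on a few hand-picked sizes (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample of family sizes, chosen to sit on the bin boundaries.
sizes = pd.Series([1, 2, 4, 5, 7, 8, 11])

# Same edges and labels as in the notebook; the intervals are
# (0, 1], (1, 4], (4, 7] and (7, 11].
fam_type = pd.cut(sizes, [0, 1, 4, 7, 11], labels=["Solo", "Small", "Big", "Very big"])

print(pd.DataFrame({"Fam_size": sizes, "Fam_type": fam_type}))
```

Any size above 11 would fall outside every bin and come back as NaN, so the last edge needs to cover the largest family in the data.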
train_df.drop(['Name', 'SibSp', 'Parch','Fam_size'], axis=1, inplace=True)
test_df.drop(['Name', 'SibSp', 'Parch','Fam_size'], axis=1, inplace=True)
train_df.head()
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	1	0	3	male	22.0	A/5 21171	7.2500	S	Mr	0	Small
1	2	1	1	female	38.0	PC 17599	71.2833	C	Mrs	1	Small
2	3	1	3	female	26.0	STON/O2. 3101282	7.9250	S	Miss	0	Solo
3	4	1	1	female	35.0	113803	53.1000	S	Mrs	1	Small
4	5	0	3	male	35.0	373450	8.0500	S	Mr	0	Solo
print(train_df.isnull().sum())
print(test_df.isnull().sum())
PassengerId            0
Survived               0
Pclass                 0
Sex                    0
Age                  177
Ticket                 0
Fare                  15
Embarked               2
Title                  0
cabin_replace_num      0
Fam_type               0
dtype: int64
PassengerId           0
Pclass                0
Sex                   0
Age                  86
Ticket                0
Fare                  6
Embarked              0
Title                 0
cabin_replace_num     0
Fam_type              0
dtype: int64
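# Combine train and test so that imputation uses all available rows;
# test rows have no Survived value, so it becomes NaN for them
total_df = pd.concat([train_df, test_df])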
total_df.head()
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	1	0.0	3	male	22.0	A/5 21171	7.2500	S	Mr	0	Small
1	2	1.0	1	female	38.0	PC 17599	71.2833	C	Mrs	1	Small
2	3	1.0	3	female	26.0	STON/O2. 3101282	7.9250	S	Miss	0	Solo
3	4	1.0	1	female	35.0	113803	53.1000	S	Mrs	1	Small
4	5	0.0	3	male	35.0	373450	8.0500	S	Mr	0	Solo
total_df['Survived'].isnull().sum()
418
total_df['Survived'].notnull().sum()
891
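The 418 missing Survived values are the test rows, which also explains why Survived is shown as 0.0 and 1.0 after the concatenation. A minimal sketch of this behaviour and of one possible way to split the combined frame back later; the toy frames and the names train_back and test_back are assumptions for illustration, not part of the notebook:

```python
import pandas as pd

# Hypothetical miniature train and test frames.
train_toy = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
test_toy = pd.DataFrame({"PassengerId": [4, 5]})

combined = pd.concat([train_toy, test_toy])
# Survived is now float: NaN cannot be stored in a plain integer column.
print(combined["Survived"].dtype)

# Rows with a label are the training rows; rows without one are the test rows.
train_back = combined[combined["Survived"].notnull()]
test_back = combined[combined["Survived"].isnull()].drop(columns="Survived")
print(len(train_back), len(test_back))
```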
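# Calculate the mean fare for each Pclass, repeated on every row of that class
mean_fare_per_class = total_df.groupby('Pclass')['Fare'].transform('mean')
# Fill missing fares with the mean fare of the passenger's class
total_df['Fare'] = total_df['Fare'].fillna(mean_fare_per_class)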
total_df.head()
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	1	0.0	3	male	22.0	A/5 21171	7.2500	S	Mr	0	Small
1	2	1.0	1	female	38.0	PC 17599	71.2833	C	Mrs	1	Small
2	3	1.0	3	female	26.0	STON/O2. 3101282	7.9250	S	Miss	0	Solo
3	4	1.0	1	female	35.0	113803	53.1000	S	Mrs	1	Small
4	5	0.0	3	male	35.0	373450	8.0500	S	Mr	0	Solo
total_df.query('Pclass==1 and Fare.isnull()')
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
total_df.query('Pclass==2 and Fare.isnull()')
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
total_df.query('Pclass==3 and Fare.isnull()')
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
print(total_df.isnull().sum())
PassengerId            0
Survived             418
Pclass                 0
Sex                    0
Age                  263
Ticket                 0
Fare                   0
Embarked               2
Title                  0
cabin_replace_num      0
Fam_type               0
dtype: int64
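Fare now has no missing values because each gap was filled with the mean fare of the passenger's class. A minimal sketch of that group-wise imputation on a made-up frame (toy_df and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for total_df.
toy_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Fare": [80.0, 100.0, np.nan, 8.0, np.nan, 10.0],
})

# transform('mean') returns one value per row: the mean Fare of that row's
# Pclass. Missing fares are skipped when the mean is computed.
class_mean = toy_df.groupby("Pclass")["Fare"].transform("mean")

# Only the missing entries are replaced; observed fares are left untouched.
toy_df["Fare"] = toy_df["Fare"].fillna(class_mean)
print(toy_df)
```

The median is a common alternative to the mean when fares are strongly skewed, since a few very expensive tickets can pull the class mean upward.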
# Fill the two missing Embarked values with the most frequent port
total_df['Embarked'] = total_df['Embarked'].fillna(total_df['Embarked'].mode()[0])
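Series.mode() returns a Series rather than a single value, because several values can tie for most frequent. That is why the first element is selected with [0] before filling. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for total_df['Embarked'].
embarked = pd.Series(["S", "C", "S", np.nan, "Q", "S", np.nan])

# mode() ignores NaN and returns a Series; [0] picks the first (here only) mode.
most_common = embarked.mode()[0]
filled = embarked.fillna(most_common)

print(most_common)
print(filled.tolist())
```

Passing the Series returned by mode() straight to fillna would align on index labels instead of broadcasting one value, so missing entries at other labels would stay unfilled.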
total_df.query('Age.notnull() and Survived==1')
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
1	2	1.0	1	female	38.00	PC 17599	71.283300	C	Mrs	1	Small
2	3	1.0	3	female	26.00	STON/O2. 3101282	7.925000	S	Miss	0	Solo
3	4	1.0	1	female	35.00	113803	53.100000	S	Mrs	1	Small
8	9	1.0	3	female	27.00	347742	11.133300	S	Mrs	0	Small
9	10	1.0	2	female	14.00	237736	30.070800	C	Mrs	0	Small
10	11	1.0	3	female	4.00	PP 9549	16.700000	S	Miss	1	Small
11	12	1.0	1	female	58.00	113783	26.550000	S	Miss	1	Solo
15	16	1.0	2	female	55.00	248706	16.000000	S	Mrs	0	Solo
21	22	1.0	2	male	34.00	248698	13.000000	S	Mr	1	Solo
22	23	1.0	3	female	15.00	330923	8.029200	Q	Miss	0	Solo
23	24	1.0	1	male	28.00	113788	35.500000	S	Mr	1	Solo
25	26	1.0	3	female	38.00	347077	31.387500	S	Mrs	0	Big
39	40	1.0	3	female	14.00	2651	11.241700	C	Miss	0	Small
43	44	1.0	2	female	3.00	SC/Paris 2123	41.579200	C	Miss	0	Small
44	45	1.0	3	female	19.00	330958	7.879200	Q	Miss	0	Solo
52	53	1.0	1	female	49.00	PC 17572	76.729200	C	Mrs	1	Small
53	54	1.0	2	female	29.00	2926	26.000000	S	Mrs	0	Small
56	57	1.0	2	female	21.00	C.A. 31026	10.500000	S	Miss	0	Solo
58	59	1.0	2	female	5.00	C.A. 34651	27.750000	S	Miss	0	Small
61	62	1.0	1	female	38.00	113572	80.000000	S	Miss	1	Solo
66	67	1.0	2	female	29.00	C.A. 29395	10.500000	S	Mrs	1	Solo
68	69	1.0	3	female	17.00	3101281	7.925000	S	Miss	0	Big
74	75	1.0	3	male	32.00	1601	56.495800	S	Mr	0	Solo
78	79	1.0	2	male	0.83	248738	29.000000	S	Master	0	Small
79	80	1.0	3	female	30.00	364516	12.475000	S	Miss	0	Solo
81	82	1.0	3	male	29.00	345779	9.500000	S	Mr	0	Solo
84	85	1.0	2	female	17.00	SO/C 14885	10.500000	S	Miss	0	Solo
85	86	1.0	3	female	33.00	3101278	15.850000	S	Mrs	0	Small
88	89	1.0	1	female	23.00	19950	263.000000	S	Miss	1	Big
97	98	1.0	1	male	23.00	PC 17759	63.358300	C	Mr	1	Small
98	99	1.0	2	female	34.00	231919	23.000000	S	Mrs	0	Small
106	107	1.0	3	female	21.00	343120	7.650000	S	Miss	0	Solo
123	124	1.0	2	female	32.50	27267	13.000000	S	Miss	1	Solo
125	126	1.0	3	male	12.00	2651	11.241700	C	Master	0	Small
127	128	1.0	3	male	24.00	C 17369	7.141700	S	Mr	0	Solo
133	134	1.0	2	female	29.00	228414	26.000000	S	Mrs	0	Small
136	137	1.0	1	female	19.00	11752	26.283300	S	Miss	1	Small
141	142	1.0	3	female	22.00	347081	7.750000	S	Miss	0	Solo
142	143	1.0	3	female	24.00	STON/O2. 3101279	15.850000	S	Mrs	0	Small
146	147	1.0	3	male	27.00	350043	7.795800	S	Mr	0	Solo
151	152	1.0	1	female	22.00	113776	66.600000	S	Mrs	1	Small
156	157	1.0	3	female	16.00	35851	7.733300	Q	Miss	0	Solo
161	162	1.0	2	female	40.00	C.A. 33595	15.750000	S	Mrs	0	Solo
165	166	1.0	3	male	9.00	363291	20.525000	S	Master	0	Small
172	173	1.0	3	female	1.00	347742	11.133300	S	Miss	0	Small
183	184	1.0	2	male	1.00	230136	39.000000	S	Master	1	Small
184	185	1.0	3	female	4.00	315153	22.025000	S	Miss	0	Small
187	188	1.0	1	male	45.00	111428	26.550000	S	Mr	0	Solo
190	191	1.0	2	female	32.00	234604	13.000000	S	Mrs	0	Solo
192	193	1.0	3	female	19.00	350046	7.854200	S	Miss	0	Small
193	194	1.0	2	male	3.00	230080	26.000000	S	Master	1	Small
194	195	1.0	1	female	44.00	PC 17610	27.720800	C	Mrs	1	Solo
195	196	1.0	1	female	58.00	PC 17569	146.520800	C	Miss	1	Solo
204	205	1.0	3	male	18.00	A/5 3540	8.050000	S	Mr	0	Solo
207	208	1.0	3	male	26.00	2699	18.787500	C	Mr	0	Solo
208	209	1.0	3	female	16.00	367231	7.750000	Q	Miss	0	Solo
209	210	1.0	1	male	40.00	112277	31.000000	C	Mr	1	Solo
211	212	1.0	2	female	35.00	F.C.C. 13528	21.000000	S	Miss	0	Solo
215	216	1.0	1	female	31.00	35273	113.275000	C	Miss	1	Small
216	217	1.0	3	female	27.00	STON/O2. 3101283	7.925000	S	Miss	0	Solo
218	219	1.0	1	female	32.00	11813	76.291700	C	Miss	1	Solo
220	221	1.0	3	male	16.00	SOTON/OQ 392089	8.050000	S	Mr	0	Solo
224	225	1.0	1	male	38.00	19943	90.000000	S	Mr	1	Small
226	227	1.0	2	male	19.00	SW/PP 751	10.500000	S	Mr	0	Solo
230	231	1.0	1	female	35.00	36973	83.475000	S	Mrs	1	Small
233	234	1.0	3	female	5.00	347077	31.387500	S	Miss	0	Big
237	238	1.0	2	female	8.00	C.A. 31921	26.250000	S	Miss	0	Small
247	248	1.0	2	female	24.00	250649	14.500000	S	Mrs	0	Small
248	249	1.0	1	male	37.00	11751	52.554200	S	Mr	1	Small
255	256	1.0	3	female	29.00	2650	15.245800	C	Mrs	0	Small
257	258	1.0	1	female	30.00	110152	86.500000	S	Miss	1	Solo
258	259	1.0	1	female	35.00	PC 17755	512.329200	C	Miss	0	Solo
259	260	1.0	2	female	50.00	230433	26.000000	S	Mrs	0	Small
261	262	1.0	3	male	3.00	347077	31.387500	S	Master	0	Big
267	268	1.0	3	male	25.00	347083	7.775000	S	Mr	0	Small
268	269	1.0	1	female	58.00	PC 17582	153.462500	S	Mrs	1	Small
269	270	1.0	1	female	35.00	PC 17760	135.633300	S	Miss	1	Solo
271	272	1.0	3	male	25.00	LINE	19.719431	S	Mr	0	Solo
272	273	1.0	2	female	41.00	250644	19.500000	S	Mrs	0	Small
275	276	1.0	1	female	63.00	13502	77.958300	S	Miss	1	Small
279	280	1.0	3	female	35.00	C.A. 2673	20.250000	S	Mrs	0	Small
283	284	1.0	3	male	19.00	A/5. 10482	8.050000	S	Mr	0	Solo
286	287	1.0	3	male	30.00	345774	9.500000	S	Mr	0	Solo
288	289	1.0	2	male	42.00	237798	13.000000	S	Mr	0	Solo
289	290	1.0	3	female	22.00	370373	7.750000	Q	Miss	0	Solo
290	291	1.0	1	female	26.00	19877	78.850000	S	Miss	0	Solo
291	292	1.0	1	female	19.00	11967	91.079200	C	Mrs	1	Small
299	300	1.0	1	female	50.00	PC 17558	247.520800	C	Mrs	1	Small
305	306	1.0	1	male	0.92	113781	151.550000	S	Master	1	Small
307	308	1.0	1	female	17.00	PC 17758	108.900000	C	Mrs	1	Small
309	310	1.0	1	female	30.00	PC 17485	56.929200	C	Miss	1	Solo
310	311	1.0	1	female	24.00	11767	83.158300	C	Miss	1	Solo
311	312	1.0	1	female	18.00	PC 17608	262.375000	C	Miss	1	Big
315	316	1.0	3	female	26.00	347470	7.854200	S	Miss	0	Solo
316	317	1.0	2	female	24.00	244367	26.000000	S	Mrs	0	Small
318	319	1.0	1	female	31.00	36928	164.866700	S	Miss	1	Small
319	320	1.0	1	female	40.00	16966	134.500000	C	Mrs	1	Small
322	323	1.0	2	female	30.00	234818	12.350000	Q	Miss	0	Solo
323	324	1.0	2	female	22.00	248738	29.000000	S	Mrs	0	Small
325	326	1.0	1	female	36.00	PC 17760	135.633300	C	Miss	1	Solo
327	328	1.0	2	female	36.00	28551	13.000000	S	Mrs	1	Solo
328	329	1.0	3	female	31.00	363291	20.525000	S	Mrs	0	Small
329	330	1.0	1	female	16.00	111361	57.979200	C	Miss	1	Small
337	338	1.0	1	female	41.00	16966	134.500000	C	Miss	1	Solo
338	339	1.0	3	male	45.00	7598	8.050000	S	Mr	0	Solo
340	341	1.0	2	male	2.00	230080	26.000000	S	Master	1	Small
341	342	1.0	1	female	24.00	19950	263.000000	S	Miss	1	Big
345	346	1.0	2	female	24.00	248733	13.000000	S	Miss	1	Solo
346	347	1.0	2	female	40.00	31418	13.000000	S	Miss	0	Solo
348	349	1.0	3	male	3.00	C.A. 37671	15.900000	S	Master	0	Small
356	357	1.0	1	female	22.00	113505	55.000000	S	Miss	1	Small
366	367	1.0	1	female	60.00	110813	75.250000	C	Mrs	1	Small
369	370	1.0	1	female	24.00	PC 17477	69.300000	C	Miss	1	Solo
370	371	1.0	1	male	25.00	11765	55.441700	C	Mr	1	Small
376	377	1.0	3	female	22.00	C 7077	7.250000	S	Miss	0	Solo
380	381	1.0	1	female	42.00	PC 17757	227.525000	C	Miss	0	Solo
381	382	1.0	3	female	1.00	2653	15.741700	C	Miss	0	Small
383	384	1.0	1	female	35.00	113789	52.000000	S	Mrs	0	Small
387	388	1.0	2	female	36.00	27849	13.000000	S	Miss	0	Solo
389	390	1.0	2	female	17.00	SC 1748	12.000000	C	Miss	0	Solo
390	391	1.0	1	male	36.00	113760	120.000000	S	Mr	1	Small
391	392	1.0	3	male	21.00	350034	7.795800	S	Mr	0	Solo
393	394	1.0	1	female	23.00	35273	113.275000	C	Miss	1	Small
394	395	1.0	3	female	24.00	PP 9549	16.700000	S	Mrs	1	Small
399	400	1.0	2	female	28.00	240929	12.650000	S	Mrs	0	Solo
400	401	1.0	3	male	39.00	STON/O 2. 3101289	7.925000	S	Mr	0	Solo
407	408	1.0	2	male	3.00	29106	18.750000	S	Master	0	Small
412	413	1.0	1	female	33.00	19928	90.000000	Q	Miss	1	Small
414	415	1.0	3	male	44.00	STON/O 2. 3101269	7.925000	S	Mr	0	Solo
416	417	1.0	2	female	34.00	28220	32.500000	S	Mrs	0	Small
417	418	1.0	2	female	18.00	250652	13.000000	S	Miss	0	Small
426	427	1.0	2	female	28.00	2003	26.000000	S	Mrs	0	Small
427	428	1.0	2	female	19.00	250655	26.000000	S	Miss	0	Solo
429	430	1.0	3	male	32.00	SOTON/O.Q. 392078	8.050000	S	Mr	1	Solo
430	431	1.0	1	male	28.00	110564	26.550000	S	Mr	1	Solo
432	433	1.0	2	female	42.00	SC/AH 3085	26.000000	S	Mrs	0	Small
435	436	1.0	1	female	14.00	113760	120.000000	S	Miss	1	Small
437	438	1.0	2	female	24.00	29106	18.750000	S	Mrs	0	Big
440	441	1.0	2	female	45.00	F.C.C. 13529	26.250000	S	Mrs	0	Small
443	444	1.0	2	female	28.00	230434	13.000000	S	Miss	0	Solo
445	446	1.0	1	male	4.00	33638	81.858300	S	Master	1	Small
446	447	1.0	2	female	13.00	250644	19.500000	S	Miss	0	Small
447	448	1.0	1	male	34.00	113794	26.550000	S	Mr	0	Solo
448	449	1.0	3	female	5.00	2666	19.258300	C	Miss	0	Small
449	450	1.0	1	male	52.00	113786	30.500000	S	Mr	1	Solo
453	454	1.0	1	male	49.00	17453	89.104200	C	Mr	1	Small
455	456	1.0	3	male	29.00	349240	7.895800	C	Mr	0	Solo
458	459	1.0	2	female	50.00	F.C.C. 13531	10.500000	S	Miss	0	Solo
460	461	1.0	1	male	48.00	19952	26.550000	S	Mr	1	Solo
469	470	1.0	3	female	0.75	2666	19.258300	C	Miss	0	Small
472	473	1.0	2	female	33.00	C.A. 34651	27.750000	S	Mrs	0	Small
473	474	1.0	2	female	23.00	SC/AH Basle 541	13.791700	C	Mrs	1	Solo
479	480	1.0	3	female	2.00	3101298	12.287500	S	Miss	0	Small
483	484	1.0	3	female	63.00	4134	9.587500	S	Mrs	0	Solo
484	485	1.0	1	male	25.00	11967	91.079200	C	Mr	1	Small
486	487	1.0	1	female	35.00	19943	90.000000	S	Mrs	1	Small
489	490	1.0	3	male	9.00	C.A. 37671	15.900000	S	Master	0	Small
496	497	1.0	1	female	54.00	36947	78.266700	C	Miss	1	Small
504	505	1.0	1	female	16.00	110152	86.500000	S	Miss	1	Solo
506	507	1.0	2	female	33.00	26360	26.000000	S	Mrs	0	Small
509	510	1.0	3	male	26.00	1601	56.495800	S	Mr	0	Solo
510	511	1.0	3	male	29.00	382651	7.750000	Q	Mr	0	Solo
512	513	1.0	1	male	36.00	PC 17473	26.287500	S	Mr	1	Solo
513	514	1.0	1	female	54.00	PC 17603	59.400000	C	Mrs	0	Small
516	517	1.0	2	female	34.00	C.A. 34260	10.500000	S	Mrs	1	Solo
518	519	1.0	2	female	36.00	226875	26.000000	S	Mrs	0	Small
520	521	1.0	1	female	30.00	12749	93.500000	S	Miss	1	Solo
523	524	1.0	1	female	44.00	111361	57.979200	C	Mrs	1	Small
526	527	1.0	2	female	50.00	W./C. 14258	10.500000	S	Miss	0	Solo
530	531	1.0	2	female	2.00	26360	26.000000	S	Miss	0	Small
535	536	1.0	2	female	7.00	F.C.C. 13529	26.250000	S	Miss	0	Small
537	538	1.0	1	female	30.00	PC 17761	106.425000	C	Miss	0	Solo
539	540	1.0	1	female	22.00	13568	49.500000	C	Miss	1	Small
540	541	1.0	1	female	36.00	WE/P 5735	71.000000	S	Miss	1	Small
543	544	1.0	2	male	32.00	2908	26.000000	S	Mr	0	Small
546	547	1.0	2	female	19.00	2908	26.000000	S	Mrs	0	Small
549	550	1.0	2	male	8.00	C.A. 33112	36.750000	S	Master	0	Small
550	551	1.0	1	male	17.00	17421	110.883300	C	Mr	1	Small
553	554	1.0	3	male	22.00	2620	7.225000	C	Mr	0	Solo
554	555	1.0	3	female	22.00	347085	7.775000	S	Miss	0	Solo
556	557	1.0	1	female	48.00	11755	39.600000	C	Miss	1	Small
558	559	1.0	1	female	39.00	110413	79.650000	S	Mrs	1	Small
559	560	1.0	3	female	36.00	345572	17.400000	S	Mrs	0	Small
569	570	1.0	3	male	32.00	350417	7.854200	S	Mr	0	Solo
570	571	1.0	2	male	62.00	S.W./PP 752	10.500000	S	Mr	0	Solo
571	572	1.0	1	female	53.00	11769	51.479200	S	Mrs	1	Small
572	573	1.0	1	male	36.00	PC 17474	26.387500	S	Mr	1	Solo
576	577	1.0	2	female	34.00	243880	13.000000	S	Miss	0	Solo
577	578	1.0	1	female	39.00	13507	55.900000	S	Mrs	1	Small
579	580	1.0	3	male	32.00	STON/O 2. 3101286	7.925000	S	Mr	0	Solo
580	581	1.0	2	female	25.00	237789	30.000000	S	Miss	0	Small
581	582	1.0	1	female	39.00	17421	110.883300	C	Mrs	1	Small
585	586	1.0	1	female	18.00	110413	79.650000	S	Miss	1	Small
587	588	1.0	1	male	60.00	13567	79.200000	C	Mr	1	Small
591	592	1.0	1	female	52.00	36947	78.266700	C	Mrs	1	Small
599	600	1.0	1	male	49.00	PC 17485	56.929200	C	Mr	1	Small
600	601	1.0	2	female	24.00	243847	27.000000	S	Mrs	0	Small
604	605	1.0	1	male	35.00	111426	26.550000	C	Mr	0	Solo
607	608	1.0	1	male	27.00	113804	30.500000	S	Mr	0	Solo
608	609	1.0	2	female	22.00	SC/Paris 2123	41.579200	C	Mrs	0	Small
609	610	1.0	1	female	40.00	PC 17582	153.462500	S	Miss	1	Solo
615	616	1.0	2	female	24.00	220845	65.000000	S	Miss	0	Small
618	619	1.0	2	female	4.00	230136	39.000000	S	Miss	1	Small
621	622	1.0	1	male	42.00	11753	52.554200	S	Mr	1	Small
622	623	1.0	3	male	20.00	2653	15.741700	C	Mr	0	Small
627	628	1.0	1	female	21.00	13502	77.958300	S	Miss	1	Solo
630	631	1.0	1	male	80.00	27042	30.000000	S	Mr	1	Solo
632	633	1.0	1	male	32.00	13214	30.500000	C	Dr	1	Solo
635	636	1.0	2	female	28.00	237668	13.000000	S	Miss	0	Solo
641	642	1.0	1	female	24.00	PC 17477	69.300000	C	Miss	1	Solo
644	645	1.0	3	female	0.75	2666	19.258300	C	Miss	0	Small
645	646	1.0	1	male	48.00	PC 17572	76.729200	C	Mr	1	Small
647	648	1.0	1	male	56.00	13213	35.500000	C	Mr	1	Solo
649	650	1.0	3	female	23.00	CA. 2314	7.550000	S	Miss	0	Solo
651	652	1.0	2	female	18.00	231919	23.000000	S	Miss	0	Small
660	661	1.0	1	male	50.00	PC 17611	133.650000	S	Dr	0	Small
664	665	1.0	3	male	20.00	STON/O 2. 3101285	7.925000	S	Mr	0	Small
670	671	1.0	2	female	40.00	29750	39.000000	S	Mrs	0	Small
673	674	1.0	2	male	31.00	244270	13.000000	S	Mr	0	Solo
677	678	1.0	3	female	18.00	4138	9.841700	S	Miss	0	Solo
679	680	1.0	1	male	36.00	PC 17755	512.329200	C	Mr	1	Small
681	682	1.0	1	male	27.00	PC 17572	76.729200	C	Mr	1	Solo
689	690	1.0	1	female	15.00	24160	211.337500	S	Miss	1	Small
690	691	1.0	1	male	31.00	17474	57.000000	S	Mr	1	Small
691	692	1.0	3	female	4.00	349256	13.416700	C	Miss	0	Small
700	701	1.0	1	female	18.00	PC 17757	227.525000	C	Mrs	1	Small
701	702	1.0	1	male	35.00	PC 17475	26.287500	S	Mr	1	Solo
706	707	1.0	2	female	45.00	223596	13.500000	S	Mrs	0	Solo
707	708	1.0	1	male	42.00	PC 17476	26.287500	S	Mr	1	Solo
708	709	1.0	1	female	22.00	113781	151.550000	S	Miss	0	Solo
710	711	1.0	1	female	24.00	PC 17482	49.504200	C	Miss	1	Solo
712	713	1.0	1	male	48.00	19996	52.000000	S	Mr	1	Small
716	717	1.0	1	female	38.00	PC 17757	227.525000	C	Miss	1	Solo
717	718	1.0	2	female	27.00	34218	10.500000	S	Miss	1	Solo
720	721	1.0	2	female	6.00	248727	33.000000	S	Miss	0	Small
724	725	1.0	1	male	27.00	113806	53.100000	S	Mr	1	Small
726	727	1.0	2	female	30.00	31027	21.000000	S	Mrs	0	Small
730	731	1.0	1	female	29.00	24160	211.337500	S	Miss	1	Solo
737	738	1.0	1	male	35.00	PC 17755	512.329200	C	Mr	1	Solo
742	743	1.0	1	female	21.00	PC 17608	262.375000	C	Miss	1	Big
744	745	1.0	3	male	31.00	STON/O 2. 3101288	7.925000	S	Mr	0	Solo
747	748	1.0	2	female	30.00	250648	13.000000	S	Miss	0	Solo
750	751	1.0	2	female	4.00	29103	23.000000	S	Miss	0	Small
751	752	1.0	3	male	6.00	392096	12.475000	S	Master	1	Small
754	755	1.0	2	female	48.00	220845	65.000000	S	Mrs	0	Small
755	756	1.0	2	male	0.67	250649	14.500000	S	Master	0	Small
759	760	1.0	1	female	33.00	110152	86.500000	S	Miss	1	Solo
762	763	1.0	3	male	20.00	2663	7.229200	C	Mr	0	Solo
763	764	1.0	1	female	36.00	113760	120.000000	S	Mrs	1	Small
765	766	1.0	1	female	51.00	13502	77.958300	S	Mrs	1	Small
774	775	1.0	2	female	54.00	29105	23.000000	S	Mrs	0	Big
777	778	1.0	3	female	5.00	364516	12.475000	S	Miss	0	Solo
779	780	1.0	1	female	43.00	24160	211.337500	S	Mrs	1	Small
780	781	1.0	3	female	13.00	2687	7.229200	C	Miss	0	Solo
781	782	1.0	1	female	17.00	17474	57.000000	S	Mrs	1	Small
786	787	1.0	3	female	18.00	3101265	7.495800	S	Miss	0	Solo
788	789	1.0	3	male	1.00	C.A. 2315	20.575000	S	Master	0	Small
796	797	1.0	1	female	49.00	17465	25.929200	S	Dr	1	Solo
797	798	1.0	3	female	31.00	349244	8.683300	S	Mrs	0	Solo
801	802	1.0	2	female	31.00	C.A. 31921	26.250000	S	Mrs	0	Small
802	803	1.0	1	male	11.00	113760	120.000000	S	Master	1	Small
803	804	1.0	3	male	0.42	2625	8.516700	C	Master	0	Small
804	805	1.0	3	male	27.00	347089	6.975000	S	Mr	0	Solo
809	810	1.0	1	female	33.00	113806	53.100000	S	Mrs	1	Small
820	821	1.0	1	female	52.00	12749	93.500000	S	Mrs	1	Small
821	822	1.0	3	male	27.00	315098	8.662500	S	Mr	0	Solo
823	824	1.0	3	female	27.00	392096	12.475000	S	Mrs	1	Small
827	828	1.0	2	male	1.00	S.C./PARIS 2079	37.004200	C	Master	0	Small
829	830	1.0	1	female	62.00	113572	80.000000	S	Mrs	1	Solo
830	831	1.0	3	female	15.00	2659	14.454200	C	Mrs	0	Small
831	832	1.0	2	male	0.83	29106	18.750000	S	Master	0	Small
835	836	1.0	1	female	39.00	PC 17756	83.158300	C	Miss	1	Small
838	839	1.0	3	male	32.00	1601	56.495800	S	Mr	0	Solo
842	843	1.0	1	female	30.00	113798	31.000000	C	Miss	0	Solo
853	854	1.0	1	female	16.00	PC 17592	39.400000	S	Miss	1	Small
855	856	1.0	3	female	18.00	392091	9.350000	S	Mrs	0	Small
856	857	1.0	1	female	45.00	36928	164.866700	S	Mrs	0	Small
857	858	1.0	1	male	51.00	113055	26.550000	S	Mr	1	Solo
858	859	1.0	3	female	24.00	2666	19.258300	C	Mrs	0	Small
862	863	1.0	1	female	48.00	17466	25.929200	S	Mrs	1	Solo
865	866	1.0	2	female	42.00	236852	13.000000	S	Mrs	0	Solo
866	867	1.0	2	female	27.00	SC/PARIS 2149	13.858300	C	Miss	0	Small
869	870	1.0	3	male	4.00	347742	11.133300	S	Master	0	Small
871	872	1.0	1	female	47.00	11751	52.554200	S	Mrs	1	Small
874	875	1.0	2	female	28.00	P/PP 3381	24.000000	C	Mrs	0	Small
875	876	1.0	3	female	15.00	2667	7.225000	C	Miss	0	Solo
879	880	1.0	1	female	56.00	11767	83.158300	C	Mrs	1	Small
880	881	1.0	2	female	25.00	230433	26.000000	S	Mrs	0	Small
887	888	1.0	1	female	19.00	112053	30.000000	S	Miss	1	Solo
889	890	1.0	1	male	26.00	111369	30.000000	C	Mr	1	Solo
total_df.query('Age.notnull() and Survived==0')
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	1	0.0	3	male	22.0	A/5 21171	7.250000	S	Mr	0	Small
4	5	0.0	3	male	35.0	373450	8.050000	S	Mr	0	Solo
6	7	0.0	1	male	54.0	17463	51.862500	S	Mr	1	Solo
7	8	0.0	3	male	2.0	349909	21.075000	S	Master	0	Big
12	13	0.0	3	male	20.0	A/5. 2151	8.050000	S	Mr	0	Solo
13	14	0.0	3	male	39.0	347082	31.275000	S	Mr	0	Big
14	15	0.0	3	female	14.0	350406	7.854200	S	Miss	0	Solo
16	17	0.0	3	male	2.0	382652	29.125000	Q	Master	0	Big
18	19	0.0	3	female	31.0	345763	18.000000	S	Mrs	0	Small
20	21	0.0	2	male	35.0	239865	26.000000	S	Mr	0	Solo
24	25	0.0	3	female	8.0	349909	21.075000	S	Miss	0	Big
27	28	0.0	1	male	19.0	19950	263.000000	S	Mr	1	Big
30	31	0.0	1	male	40.0	PC 17601	27.720800	C	Mr	0	Solo
33	34	0.0	2	male	66.0	C.A. 24579	10.500000	S	Mr	0	Solo
34	35	0.0	1	male	28.0	PC 17604	82.170800	C	Mr	0	Small
35	36	0.0	1	male	42.0	113789	52.000000	S	Mr	0	Small
37	38	0.0	3	male	21.0	A./5. 2152	8.050000	S	Mr	0	Solo
38	39	0.0	3	female	18.0	345764	18.000000	S	Miss	0	Small
40	41	0.0	3	female	40.0	7546	9.475000	S	Mrs	0	Small
41	42	0.0	2	female	27.0	11668	21.000000	S	Mrs	0	Small
49	50	0.0	3	female	18.0	349237	17.800000	S	Mrs	0	Small
50	51	0.0	3	male	7.0	3101295	39.687500	S	Master	0	Big
51	52	0.0	3	male	21.0	A/4. 39886	7.800000	S	Mr	0	Solo
54	55	0.0	1	male	65.0	113509	61.979200	C	Mr	1	Small
57	58	0.0	3	male	28.5	2697	7.229200	C	Mr	0	Solo
59	60	0.0	3	male	11.0	CA 2144	46.900000	S	Master	0	Very big
60	61	0.0	3	male	22.0	2669	7.229200	C	Mr	0	Solo
62	63	0.0	1	male	45.0	36973	83.475000	S	Mr	1	Small
63	64	0.0	3	male	4.0	347088	27.900000	S	Master	0	Big
67	68	0.0	3	male	19.0	S.P. 3464	8.158300	S	Mr	0	Solo
69	70	0.0	3	male	26.0	315151	8.662500	S	Mr	0	Small
70	71	0.0	2	male	32.0	C.A. 33111	10.500000	S	Mr	0	Solo
71	72	0.0	3	female	16.0	CA 2144	46.900000	S	Miss	0	Very big
72	73	0.0	2	male	21.0	S.O.C. 14879	73.500000	S	Mr	0	Solo
73	74	0.0	3	male	26.0	2680	14.454200	C	Mr	0	Small
75	76	0.0	3	male	25.0	348123	7.650000	S	Mr	1	Solo
80	81	0.0	3	male	22.0	345767	9.000000	S	Mr	0	Solo
83	84	0.0	1	male	28.0	113059	47.100000	S	Mr	0	Solo
86	87	0.0	3	male	16.0	W./C. 6608	34.375000	S	Mr	0	Big
89	90	0.0	3	male	24.0	343275	8.050000	S	Mr	0	Solo
90	91	0.0	3	male	29.0	343276	8.050000	S	Mr	0	Solo
91	92	0.0	3	male	20.0	347466	7.854200	S	Mr	0	Solo
92	93	0.0	1	male	46.0	W.E.P. 5734	61.175000	S	Mr	1	Small
93	94	0.0	3	male	26.0	C.A. 2315	20.575000	S	Mr	0	Small
94	95	0.0	3	male	59.0	364500	7.250000	S	Mr	0	Solo
96	97	0.0	1	male	71.0	PC 17754	34.654200	C	Mr	1	Solo
99	100	0.0	2	male	34.0	244367	26.000000	S	Mr	0	Small
100	101	0.0	3	female	28.0	349245	7.895800	S	Miss	0	Solo
102	103	0.0	1	male	21.0	35281	77.287500	S	Mr	1	Small
103	104	0.0	3	male	33.0	7540	8.654200	S	Mr	0	Solo
104	105	0.0	3	male	37.0	3101276	7.925000	S	Mr	0	Small
105	106	0.0	3	male	28.0	349207	7.895800	S	Mr	0	Solo
108	109	0.0	3	male	38.0	349249	7.895800	S	Mr	0	Solo
110	111	0.0	1	male	47.0	110465	52.000000	S	Mr	1	Solo
111	112	0.0	3	female	14.5	2665	14.454200	C	Miss	0	Small
112	113	0.0	3	male	22.0	324669	8.050000	S	Mr	0	Solo
113	114	0.0	3	female	20.0	4136	9.825000	S	Miss	0	Small
114	115	0.0	3	female	17.0	2627	14.458300	C	Miss	0	Solo
115	116	0.0	3	male	21.0	STON/O 2. 3101294	7.925000	S	Mr	0	Solo
116	117	0.0	3	male	70.5	370369	7.750000	Q	Mr	0	Solo
117	118	0.0	2	male	29.0	11668	21.000000	S	Mr	0	Small
118	119	0.0	1	male	24.0	PC 17558	247.520800	C	Mr	1	Small
119	120	0.0	3	female	2.0	347082	31.275000	S	Miss	0	Big
120	121	0.0	2	male	21.0	S.O.C. 14879	73.500000	S	Mr	0	Small
122	123	0.0	2	male	32.5	237736	30.070800	C	Mr	0	Small
124	125	0.0	1	male	54.0	35281	77.287500	S	Mr	1	Small
129	130	0.0	3	male	45.0	347061	6.975000	S	Mr	0	Solo
130	131	0.0	3	male	33.0	349241	7.895800	C	Mr	0	Solo
131	132	0.0	3	male	20.0	SOTON/O.Q. 3101307	7.050000	S	Mr	0	Solo
132	133	0.0	3	female	47.0	A/5. 3337	14.500000	S	Mrs	0	Small
134	135	0.0	2	male	25.0	C.A. 29178	13.000000	S	Mr	0	Solo
135	136	0.0	2	male	23.0	SC/PARIS 2133	15.045800	C	Mr	0	Solo
137	138	0.0	1	male	37.0	113803	53.100000	S	Mr	1	Small
138	139	0.0	3	male	16.0	7534	9.216700	S	Mr	0	Solo
139	140	0.0	1	male	24.0	PC 17593	79.200000	C	Mr	1	Solo
143	144	0.0	3	male	19.0	365222	6.750000	Q	Mr	0	Solo
144	145	0.0	2	male	18.0	231945	11.500000	S	Mr	0	Solo
145	146	0.0	2	male	19.0	C.A. 33112	36.750000	S	Mr	0	Small
147	148	0.0	3	female	9.0	W./C. 6608	34.375000	S	Miss	0	Big
148	149	0.0	2	male	36.5	230080	26.000000	S	Mr	1	Small
149	150	0.0	2	male	42.0	244310	13.000000	S	Rev	0	Solo
150	151	0.0	2	male	51.0	S.O.P. 1166	12.525000	S	Rev	0	Solo
152	153	0.0	3	male	55.5	A.5. 11206	8.050000	S	Mr	0	Solo
153	154	0.0	3	male	40.5	A/5. 851	14.500000	S	Mr	0	Small
155	156	0.0	1	male	51.0	PC 17597	61.379200	C	Mr	0	Small
157	158	0.0	3	male	30.0	SOTON/OQ 392090	8.050000	S	Mr	0	Solo
160	161	0.0	3	male	44.0	371362	16.100000	S	Mr	0	Small
162	163	0.0	3	male	26.0	347068	7.775000	S	Mr	0	Solo
163	164	0.0	3	male	17.0	315093	8.662500	S	Mr	0	Solo
164	165	0.0	3	male	1.0	3101295	39.687500	S	Master	0	Big
167	168	0.0	3	female	45.0	347088	27.900000	S	Mrs	0	Big
169	170	0.0	3	male	28.0	1601	56.495800	S	Mr	0	Solo
170	171	0.0	1	male	61.0	111240	33.500000	S	Mr	1	Solo
171	172	0.0	3	male	4.0	382652	29.125000	Q	Master	0	Big
173	174	0.0	3	male	21.0	STON/O 2. 3101280	7.925000	S	Mr	0	Solo
174	175	0.0	1	male	56.0	17764	30.695800	C	Mr	1	Solo
175	176	0.0	3	male	18.0	350404	7.854200	S	Mr	0	Small
177	178	0.0	1	female	50.0	PC 17595	28.712500	C	Miss	1	Solo
178	179	0.0	2	male	30.0	250653	13.000000	S	Mr	0	Solo
179	180	0.0	3	male	36.0	LINE	19.719431	S	Mr	0	Solo
182	183	0.0	3	male	9.0	347077	31.387500	S	Master	0	Big
188	189	0.0	3	male	40.0	364849	15.500000	Q	Mr	0	Small
189	190	0.0	3	male	36.0	349247	7.895800	S	Mr	0	Solo
191	192	0.0	2	male	19.0	28424	13.000000	S	Mr	0	Solo
197	198	0.0	3	male	42.0	4579	8.404200	S	Mr	0	Small
199	200	0.0	2	female	24.0	248747	13.000000	S	Miss	0	Solo
200	201	0.0	3	male	28.0	345770	9.500000	S	Mr	0	Solo
202	203	0.0	3	male	34.0	3101264	6.495800	S	Mr	0	Solo
203	204	0.0	3	male	45.5	2628	7.225000	C	Mr	0	Solo
205	206	0.0	3	female	2.0	347054	10.462500	S	Miss	1	Small
206	207	0.0	3	male	32.0	3101278	15.850000	S	Mr	0	Small
210	211	0.0	3	male	24.0	SOTON/O.Q. 3101311	7.050000	S	Mr	0	Solo
212	213	0.0	3	male	22.0	A/5 21174	7.250000	S	Mr	0	Solo
213	214	0.0	2	male	30.0	250646	13.000000	S	Mr	0	Solo
217	218	0.0	2	male	42.0	243847	27.000000	S	Mr	0	Small
219	220	0.0	2	male	30.0	W/C 14208	10.500000	S	Mr	0	Solo
221	222	0.0	2	male	27.0	220367	13.000000	S	Mr	0	Solo
222	223	0.0	3	male	51.0	21440	8.050000	S	Mr	0	Solo
225	226	0.0	3	male	22.0	PP 4348	9.350000	S	Mr	0	Solo
227	228	0.0	3	male	20.5	A/5 21173	7.250000	S	Mr	0	Solo
228	229	0.0	2	male	18.0	236171	13.000000	S	Mr	0	Solo
231	232	0.0	3	male	29.0	347067	7.775000	S	Mr	0	Solo
232	233	0.0	2	male	59.0	237442	13.500000	S	Mr	0	Solo
234	235	0.0	2	male	24.0	C.A. 29566	10.500000	S	Mr	0	Solo
236	237	0.0	2	male	44.0	26707	26.000000	S	Mr	0	Small
238	239	0.0	2	male	19.0	28665	10.500000	S	Mr	0	Solo
239	240	0.0	2	male	33.0	SCO/W 1585	12.275000	S	Mr	0	Solo
242	243	0.0	2	male	29.0	W./C. 14263	10.500000	S	Mr	0	Solo
243	244	0.0	3	male	22.0	STON/O 2. 3101275	7.125000	S	Mr	0	Solo
244	245	0.0	3	male	30.0	2694	7.225000	C	Mr	0	Solo
245	246	0.0	1	male	44.0	19928	90.000000	Q	Dr	1	Small
246	247	0.0	3	female	25.0	347071	7.775000	S	Miss	0	Solo
249	250	0.0	2	male	54.0	244252	26.000000	S	Rev	0	Small
251	252	0.0	3	female	29.0	347054	10.462500	S	Mrs	1	Small
252	253	0.0	1	male	62.0	113514	26.550000	S	Mr	1	Solo
253	254	0.0	3	male	30.0	A/5. 3336	16.100000	S	Mr	0	Small
254	255	0.0	3	female	41.0	370129	20.212500	S	Mrs	0	Small
262	263	0.0	1	male	52.0	110413	79.650000	S	Mr	1	Small
263	264	0.0	1	male	40.0	112059	66.747620	S	Mr	1	Solo
265	266	0.0	2	male	36.0	C.A. 17248	10.500000	S	Mr	0	Solo
266	267	0.0	3	male	16.0	3101295	39.687500	S	Mr	0	Big
273	274	0.0	1	male	37.0	PC 17596	29.700000	C	Mr	1	Small
276	277	0.0	3	female	45.0	347073	7.750000	S	Miss	0	Solo
278	279	0.0	3	male	7.0	382652	29.125000	Q	Master	0	Big
280	281	0.0	3	male	65.0	336439	7.750000	Q	Mr	0	Solo
281	282	0.0	3	male	28.0	347464	7.854200	S	Mr	0	Solo
282	283	0.0	3	male	16.0	345778	9.500000	S	Mr	0	Solo
285	286	0.0	3	male	33.0	349239	8.662500	C	Mr	0	Solo
287	288	0.0	3	male	22.0	349206	7.895800	S	Mr	0	Solo
292	293	0.0	2	male	36.0	SC/Paris 2163	12.875000	C	Mr	1	Solo
293	294	0.0	3	female	24.0	349236	8.850000	S	Miss	0	Solo
294	295	0.0	3	male	24.0	349233	7.895800	S	Mr	0	Solo
296	297	0.0	3	male	23.5	2693	7.229200	C	Mr	0	Solo
297	298	0.0	1	female	2.0	113781	151.550000	S	Miss	1	Small
302	303	0.0	3	male	19.0	LINE	19.719431	S	Mr	0	Solo
308	309	0.0	2	male	30.0	P/PP 3381	24.000000	C	Mr	0	Small
312	313	0.0	2	female	26.0	250651	26.000000	S	Mrs	0	Small
313	314	0.0	3	male	28.0	349243	7.895800	S	Mr	0	Solo
314	315	0.0	2	male	43.0	F.C.C. 13529	26.250000	S	Mr	0	Small
317	318	0.0	2	male	54.0	29011	14.000000	S	Dr	0	Solo
320	321	0.0	3	male	22.0	A/5 21172	7.250000	S	Mr	0	Solo
321	322	0.0	3	male	27.0	349219	7.895800	S	Mr	0	Solo
326	327	0.0	3	male	61.0	345364	6.237500	S	Mr	0	Solo
331	332	0.0	1	male	45.5	113043	28.500000	S	Mr	1	Solo
332	333	0.0	1	male	38.0	PC 17582	153.462500	S	Mr	1	Small
333	334	0.0	3	male	16.0	345764	18.000000	S	Mr	0	Small
336	337	0.0	1	male	29.0	113776	66.600000	S	Mr	1	Small
339	340	0.0	1	male	45.0	113784	35.500000	S	Mr	1	Solo
342	343	0.0	2	male	28.0	248740	13.000000	S	Mr	0	Solo
343	344	0.0	2	male	25.0	244361	13.000000	S	Mr	0	Solo
344	345	0.0	2	male	36.0	229236	13.000000	S	Mr	0	Solo
349	350	0.0	3	male	42.0	315088	8.662500	S	Mr	0	Solo
350	351	0.0	3	male	23.0	7267	9.225000	S	Mr	0	Solo
352	353	0.0	3	male	15.0	2695	7.229200	C	Mr	0	Small
353	354	0.0	3	male	25.0	349237	17.800000	S	Mr	0	Small
355	356	0.0	3	male	28.0	345783	9.500000	S	Mr	0	Solo
357	358	0.0	2	female	38.0	237671	13.000000	S	Miss	0	Solo
360	361	0.0	3	male	40.0	347088	27.900000	S	Mr	0	Big
361	362	0.0	2	male	29.0	SC/PARIS 2167	27.720800	C	Mr	0	Small
362	363	0.0	3	female	45.0	2691	14.454200	C	Mrs	0	Small
363	364	0.0	3	male	35.0	SOTON/O.Q. 3101310	7.050000	S	Mr	0	Solo
365	366	0.0	3	male	30.0	C 7076	7.250000	S	Mr	0	Solo
371	372	0.0	3	male	18.0	3101267	6.495800	S	Mr	0	Small
372	373	0.0	3	male	19.0	323951	8.050000	S	Mr	0	Solo
373	374	0.0	1	male	22.0	PC 17760	135.633300	C	Mr	0	Solo
374	375	0.0	3	female	3.0	349909	21.075000	S	Miss	0	Big
377	378	0.0	1	male	27.0	113503	211.500000	C	Mr	1	Small
378	379	0.0	3	male	20.0	2648	4.012500	C	Mr	0	Solo
379	380	0.0	3	male	19.0	347069	7.775000	S	Mr	0	Solo
382	383	0.0	3	male	32.0	STON/O 2. 3101293	7.925000	S	Mr	0	Solo
385	386	0.0	2	male	18.0	S.O.C. 14879	73.500000	S	Mr	0	Solo
386	387	0.0	3	male	1.0	CA 2144	46.900000	S	Master	0	Very big
392	393	0.0	3	male	28.0	3101277	7.925000	S	Mr	0	Small
395	396	0.0	3	male	22.0	350052	7.795800	S	Mr	0	Solo
396	397	0.0	3	female	31.0	350407	7.854200	S	Miss	0	Solo
397	398	0.0	2	male	46.0	28403	26.000000	S	Mr	0	Solo
398	399	0.0	2	male	23.0	244278	10.500000	S	Dr	0	Solo
401	402	0.0	3	male	26.0	341826	8.050000	S	Mr	0	Solo
402	403	0.0	3	female	21.0	4137	9.825000	S	Miss	0	Small
403	404	0.0	3	male	28.0	STON/O2. 3101279	15.850000	S	Mr	0	Small
404	405	0.0	3	female	20.0	315096	8.662500	S	Miss	0	Solo
405	406	0.0	2	male	34.0	28664	21.000000	S	Mr	0	Small
406	407	0.0	3	male	51.0	347064	7.750000	S	Mr	0	Solo
408	409	0.0	3	male	21.0	312992	7.775000	S	Mr	0	Solo
418	419	0.0	2	male	30.0	28228	13.000000	S	Mr	0	Solo
419	420	0.0	3	female	10.0	345773	24.150000	S	Miss	0	Small
421	422	0.0	3	male	21.0	A/5. 13032	7.733300	Q	Mr	0	Solo
422	423	0.0	3	male	29.0	315082	7.875000	S	Mr	0	Solo
423	424	0.0	3	female	28.0	347080	14.400000	S	Mrs	0	Small
424	425	0.0	3	male	18.0	370129	20.212500	S	Mr	0	Small
433	434	0.0	3	male	17.0	STON/O 2. 3101274	7.125000	S	Mr	0	Solo
434	435	0.0	1	male	50.0	13507	55.900000	S	Mr	1	Small
436	437	0.0	3	female	21.0	W./C. 6608	34.375000	S	Miss	0	Big
438	439	0.0	1	male	64.0	19950	263.000000	S	Mr	1	Big
439	440	0.0	2	male	31.0	C.A. 18723	10.500000	S	Mr	0	Solo
441	442	0.0	3	male	20.0	345769	9.500000	S	Mr	0	Solo
442	443	0.0	3	male	25.0	347076	7.775000	S	Mr	0	Small
450	451	0.0	2	male	36.0	C.A. 34651	27.750000	S	Mr	0	Small
452	453	0.0	1	male	30.0	113051	27.750000	C	Mr	1	Solo
456	457	0.0	1	male	65.0	13509	26.550000	S	Mr	1	Solo
461	462	0.0	3	male	34.0	364506	8.050000	S	Mr	0	Solo
462	463	0.0	1	male	47.0	111320	38.500000	S	Mr	1	Solo
463	464	0.0	2	male	48.0	234360	13.000000	S	Mr	0	Solo
465	466	0.0	3	male	38.0	SOTON/O.Q. 3101306	7.050000	S	Mr	0	Solo
467	468	0.0	1	male	56.0	113792	26.550000	S	Mr	0	Solo
471	472	0.0	3	male	38.0	315089	8.662500	S	Mr	0	Solo
474	475	0.0	3	female	22.0	7553	9.837500	S	Miss	0	Solo
476	477	0.0	2	male	34.0	31027	21.000000	S	Mr	0	Small
477	478	0.0	3	male	29.0	3460	7.045800	S	Mr	0	Small
478	479	0.0	3	male	22.0	350060	7.520800	S	Mr	0	Solo
480	481	0.0	3	male	9.0	CA 2144	46.900000	S	Master	0	Very big
482	483	0.0	3	male	50.0	A/5 3594	8.050000	S	Mr	0	Solo
487	488	0.0	1	male	58.0	11771	29.700000	C	Mr	1	Solo
488	489	0.0	3	male	30.0	A.5. 18509	8.050000	S	Mr	0	Solo
491	492	0.0	3	male	21.0	SOTON/OQ 3101317	7.250000	S	Mr	0	Solo
492	493	0.0	1	male	55.0	113787	30.500000	S	Mr	1	Solo
493	494	0.0	1	male	71.0	PC 17609	49.504200	C	Mr	0	Solo
494	495	0.0	3	male	21.0	A/4 45380	8.050000	S	Mr	0	Solo
498	499	0.0	1	female	25.0	113781	151.550000	S	Mrs	1	Small
499	500	0.0	3	male	24.0	350035	7.795800	S	Mr	0	Solo
500	501	0.0	3	male	17.0	315086	8.662500	S	Mr	0	Solo
501	502	0.0	3	female	21.0	364846	7.750000	Q	Miss	0	Solo
503	504	0.0	3	female	37.0	4135	9.587500	S	Miss	0	Solo
505	506	0.0	1	male	18.0	PC 17758	108.900000	C	Mr	1	Small
508	509	0.0	3	male	28.0	C 4001	22.525000	S	Mr	0	Solo
514	515	0.0	3	male	24.0	349209	7.495800	S	Mr	0	Solo
515	516	0.0	1	male	47.0	36967	34.020800	S	Mr	1	Solo
519	520	0.0	3	male	32.0	349242	7.895800	S	Mr	0	Solo
521	522	0.0	3	male	22.0	349252	7.895800	S	Mr	0	Solo
525	526	0.0	3	male	40.5	367232	7.750000	Q	Mr	0	Solo
528	529	0.0	3	male	39.0	3101296	7.925000	S	Mr	0	Solo
529	530	0.0	2	male	23.0	29104	11.500000	S	Mr	0	Small
532	533	0.0	3	male	17.0	2690	7.229200	C	Mr	0	Small
534	535	0.0	3	female	30.0	315084	8.662500	S	Miss	0	Solo
536	537	0.0	1	male	45.0	113050	26.550000	S	Mr	1	Solo
541	542	0.0	3	female	9.0	347082	31.275000	S	Miss	0	Big
542	543	0.0	3	female	11.0	347082	31.275000	S	Miss	0	Big
544	545	0.0	1	male	50.0	PC 17761	106.425000	C	Mr	1	Small
545	546	0.0	1	male	64.0	693	26.000000	S	Mr	0	Solo
548	549	0.0	3	male	33.0	363291	20.525000	S	Mr	0	Small
551	552	0.0	2	male	27.0	244358	26.000000	S	Mr	0	Solo
555	556	0.0	1	male	62.0	113807	26.550000	S	Mr	0	Solo
561	562	0.0	3	male	40.0	349251	7.895800	S	Mr	0	Solo
562	563	0.0	2	male	28.0	218629	13.500000	S	Mr	0	Solo
565	566	0.0	3	male	24.0	A/4 48871	24.150000	S	Mr	0	Small
566	567	0.0	3	male	19.0	349205	7.895800	S	Mr	0	Solo
567	568	0.0	3	female	29.0	349909	21.075000	S	Mrs	0	Big
574	575	0.0	3	male	16.0	A/4. 20589	8.050000	S	Mr	0	Solo
575	576	0.0	3	male	19.0	358585	14.500000	S	Mr	0	Solo
582	583	0.0	2	male	54.0	28403	26.000000	S	Mr	0	Solo
583	584	0.0	1	male	36.0	13049	40.125000	C	Mr	1	Solo
586	587	0.0	2	male	47.0	237565	15.000000	S	Mr	0	Solo
588	589	0.0	3	male	22.0	14973	8.050000	S	Mr	0	Solo
590	591	0.0	3	male	35.0	STON/O 2. 3101273	7.125000	S	Mr	0	Solo
592	593	0.0	3	male	47.0	A/5 3902	7.250000	S	Mr	0	Solo
594	595	0.0	2	male	37.0	SC/AH 29037	26.000000	S	Mr	0	Small
595	596	0.0	3	male	36.0	345773	24.150000	S	Mr	0	Small
597	598	0.0	3	male	49.0	LINE	19.719431	S	Mr	0	Solo
603	604	0.0	3	male	44.0	364511	8.050000	S	Mr	0	Solo
605	606	0.0	3	male	36.0	349910	15.550000	S	Mr	0	Small
606	607	0.0	3	male	30.0	349246	7.895800	S	Mr	0	Solo
610	611	0.0	3	female	39.0	347082	31.275000	S	Mrs	0	Big
614	615	0.0	3	male	35.0	364512	8.050000	S	Mr	0	Solo
616	617	0.0	3	male	34.0	347080	14.400000	S	Mr	0	Small
617	618	0.0	3	female	26.0	A/5. 3336	16.100000	S	Mrs	0	Small
619	620	0.0	2	male	26.0	31028	10.500000	S	Mr	0	Solo
620	621	0.0	3	male	27.0	2659	14.454200	C	Mr	0	Small
623	624	0.0	3	male	21.0	350029	7.854200	S	Mr	0	Solo
624	625	0.0	3	male	21.0	54636	16.100000	S	Mr	0	Solo
625	626	0.0	1	male	61.0	36963	32.320800	S	Mr	1	Solo
626	627	0.0	2	male	57.0	219533	12.350000	Q	Rev	0	Solo
628	629	0.0	3	male	26.0	349224	7.895800	S	Mr	0	Solo
631	632	0.0	3	male	51.0	347743	7.054200	S	Mr	0	Solo
634	635	0.0	3	female	9.0	347088	27.900000	S	Miss	0	Big
636	637	0.0	3	male	32.0	STON/O 2. 3101292	7.925000	S	Mr	0	Solo
637	638	0.0	2	male	31.0	C.A. 31921	26.250000	S	Mr	0	Small
638	639	0.0	3	female	41.0	3101295	39.687500	S	Mrs	0	Big
640	641	0.0	3	male	20.0	350050	7.854200	S	Mr	0	Solo
642	643	0.0	3	female	2.0	347088	27.900000	S	Miss	0	Big
646	647	0.0	3	male	19.0	349231	7.895800	S	Mr	0	Solo
652	653	0.0	3	male	21.0	8475	8.433300	S	Mr	0	Solo
654	655	0.0	3	female	18.0	365226	6.750000	Q	Miss	0	Solo
655	656	0.0	2	male	24.0	S.O.C. 14879	73.500000	S	Mr	0	Small
657	658	0.0	3	female	32.0	364849	15.500000	Q	Mrs	0	Small
658	659	0.0	2	male	23.0	29751	13.000000	S	Mr	0	Solo
659	660	0.0	1	male	58.0	35273	113.275000	C	Mr	1	Small
661	662	0.0	3	male	40.0	2623	7.225000	C	Mr	0	Solo
662	663	0.0	1	male	47.0	5727	25.587500	S	Mr	1	Solo
663	664	0.0	3	male	36.0	349210	7.495800	S	Mr	0	Solo
665	666	0.0	2	male	32.0	S.O.C. 14879	73.500000	S	Mr	0	Small
666	667	0.0	2	male	25.0	234686	13.000000	S	Mr	0	Solo
668	669	0.0	3	male	43.0	A/5 3536	8.050000	S	Mr	0	Solo
671	672	0.0	1	male	31.0	F.C. 12750	52.000000	S	Mr	1	Small
672	673	0.0	2	male	70.0	C.A. 24580	10.500000	S	Mr	0	Solo
675	676	0.0	3	male	18.0	349912	7.775000	S	Mr	0	Solo
676	677	0.0	3	male	24.5	342826	8.050000	S	Mr	0	Solo
678	679	0.0	3	female	43.0	CA 2144	46.900000	S	Mrs	0	Very big
682	683	0.0	3	male	20.0	6563	9.225000	S	Mr	0	Solo
683	684	0.0	3	male	14.0	CA 2144	46.900000	S	Mr	0	Very big
684	685	0.0	2	male	60.0	29750	39.000000	S	Mr	0	Small
685	686	0.0	2	male	25.0	SC/Paris 2123	41.579200	C	Mr	0	Small
686	687	0.0	3	male	14.0	3101295	39.687500	S	Mr	0	Big
687	688	0.0	3	male	19.0	349228	10.170800	S	Mr	0	Solo
688	689	0.0	3	male	18.0	350036	7.795800	S	Mr	0	Solo
693	694	0.0	3	male	25.0	2672	7.225000	C	Mr	0	Solo
694	695	0.0	1	male	60.0	113800	26.550000	S	Mr	0	Solo
695	696	0.0	2	male	52.0	248731	13.500000	S	Mr	0	Solo
696	697	0.0	3	male	44.0	363592	8.050000	S	Mr	0	Solo
698	699	0.0	1	male	49.0	17421	110.883300	C	Mr	1	Small
699	700	0.0	3	male	42.0	348121	7.650000	S	Mr	1	Solo
702	703	0.0	3	female	18.0	2691	14.454200	C	Miss	0	Small
703	704	0.0	3	male	25.0	36864	7.741700	Q	Mr	0	Solo
704	705	0.0	3	male	26.0	350025	7.854200	S	Mr	0	Small
705	706	0.0	2	male	39.0	250655	26.000000	S	Mr	0	Solo
713	714	0.0	3	male	29.0	7545	9.483300	S	Mr	0	Solo
714	715	0.0	2	male	52.0	250647	13.000000	S	Mr	0	Solo
715	716	0.0	3	male	19.0	348124	7.650000	S	Mr	1	Solo
719	720	0.0	3	male	33.0	347062	7.775000	S	Mr	0	Solo
721	722	0.0	3	male	17.0	350048	7.054200	S	Mr	0	Small
722	723	0.0	2	male	34.0	12233	13.000000	S	Mr	0	Solo
723	724	0.0	2	male	50.0	250643	13.000000	S	Mr	0	Solo
725	726	0.0	3	male	20.0	315094	8.662500	S	Mr	0	Solo
728	729	0.0	2	male	25.0	236853	26.000000	S	Mr	0	Small
729	730	0.0	3	female	25.0	STON/O2. 3101271	7.925000	S	Miss	0	Small
731	732	0.0	3	male	11.0	2699	18.787500	C	Mr	0	Solo
733	734	0.0	2	male	23.0	28425	13.000000	S	Mr	0	Solo
734	735	0.0	2	male	23.0	233639	13.000000	S	Mr	0	Solo
735	736	0.0	3	male	28.5	54636	16.100000	S	Mr	0	Solo
736	737	0.0	3	female	48.0	W./C. 6608	34.375000	S	Mrs	0	Big
741	742	0.0	1	male	36.0	19877	78.850000	S	Mr	1	Small
743	744	0.0	3	male	24.0	376566	16.100000	S	Mr	0	Small
745	746	0.0	1	male	70.0	WE/P 5735	71.000000	S	Mr	1	Small
746	747	0.0	3	male	16.0	C.A. 2673	20.250000	S	Mr	0	Small
748	749	0.0	1	male	19.0	113773	53.100000	S	Mr	1	Small
749	750	0.0	3	male	31.0	335097	7.750000	Q	Mr	0	Solo
752	753	0.0	3	male	33.0	345780	9.500000	S	Mr	0	Solo
753	754	0.0	3	male	23.0	349204	7.895800	S	Mr	0	Solo
756	757	0.0	3	male	28.0	350042	7.795800	S	Mr	0	Solo
757	758	0.0	2	male	18.0	29108	11.500000	S	Mr	0	Solo
758	759	0.0	3	male	34.0	363294	8.050000	S	Mr	0	Solo
761	762	0.0	3	male	41.0	SOTON/O2 3101272	7.125000	S	Mr	0	Solo
764	765	0.0	3	male	16.0	347074	7.775000	S	Mr	0	Solo
767	768	0.0	3	female	30.5	364850	7.750000	Q	Miss	0	Solo
769	770	0.0	3	male	32.0	8471	8.362500	S	Mr	0	Solo
770	771	0.0	3	male	24.0	345781	9.500000	S	Mr	0	Solo
771	772	0.0	3	male	48.0	350047	7.854200	S	Mr	0	Solo
772	773	0.0	2	female	57.0	S.O./P.P. 3	10.500000	S	Mrs	1	Solo
775	776	0.0	3	male	18.0	347078	7.750000	S	Mr	0	Solo
782	783	0.0	1	male	29.0	113501	30.000000	S	Mr	1	Solo
784	785	0.0	3	male	25.0	SOTON/O.Q. 3101312	7.050000	S	Mr	0	Solo
785	786	0.0	3	male	25.0	374887	7.250000	S	Mr	0	Solo
787	788	0.0	3	male	8.0	382652	29.125000	Q	Master	0	Big
789	790	0.0	1	male	46.0	PC 17593	79.200000	C	Mr	1	Solo
791	792	0.0	2	male	16.0	239865	26.000000	S	Mr	0	Solo
794	795	0.0	3	male	25.0	349203	7.895800	S	Mr	0	Solo
795	796	0.0	2	male	39.0	28213	13.000000	S	Mr	0	Solo
798	799	0.0	3	male	30.0	2685	7.229200	C	Mr	0	Solo
799	800	0.0	3	female	30.0	345773	24.150000	S	Mrs	0	Small
800	801	0.0	2	male	34.0	250647	13.000000	S	Mr	0	Solo
805	806	0.0	3	male	31.0	347063	7.775000	S	Mr	0	Solo
806	807	0.0	1	male	39.0	112050	66.747620	S	Mr	1	Solo
807	808	0.0	3	female	18.0	347087	7.775000	S	Miss	0	Solo
808	809	0.0	2	male	39.0	248723	13.000000	S	Mr	0	Solo
810	811	0.0	3	male	26.0	3474	7.887500	S	Mr	0	Solo
811	812	0.0	3	male	39.0	A/4 48871	24.150000	S	Mr	0	Solo
812	813	0.0	2	male	35.0	28206	10.500000	S	Mr	0	Solo
813	814	0.0	3	female	6.0	347082	31.275000	S	Miss	0	Big
814	815	0.0	3	male	30.5	364499	8.050000	S	Mr	0	Solo
816	817	0.0	3	female	23.0	STON/O2. 3101290	7.925000	S	Miss	0	Solo
817	818	0.0	2	male	31.0	S.C./PARIS 2079	37.004200	C	Mr	0	Small
818	819	0.0	3	male	43.0	C 7075	6.450000	S	Mr	0	Solo
819	820	0.0	3	male	10.0	347088	27.900000	S	Master	0	Big
822	823	0.0	1	male	38.0	19972	66.747620	S	Mr	0	Solo
824	825	0.0	3	male	2.0	3101295	39.687500	S	Master	0	Big
833	834	0.0	3	male	23.0	347468	7.854200	S	Mr	0	Solo
834	835	0.0	3	male	18.0	2223	8.300000	S	Mr	0	Solo
836	837	0.0	3	male	21.0	315097	8.662500	S	Mr	0	Solo
840	841	0.0	3	male	20.0	SOTON/O2 3101287	7.925000	S	Mr	0	Solo
841	842	0.0	2	male	16.0	S.O./P.P. 3	10.500000	S	Mr	0	Solo
843	844	0.0	3	male	34.5	2683	6.437500	C	Mr	0	Solo
844	845	0.0	3	male	17.0	315090	8.662500	S	Mr	0	Solo
845	846	0.0	3	male	42.0	C.A. 5547	7.550000	S	Mr	0	Solo
847	848	0.0	3	male	35.0	349213	7.895800	C	Mr	0	Solo
848	849	0.0	2	male	28.0	248727	33.000000	S	Rev	0	Small
850	851	0.0	3	male	4.0	347082	31.275000	S	Master	0	Big
851	852	0.0	3	male	74.0	347060	7.775000	S	Mr	0	Solo
852	853	0.0	3	female	9.0	2678	15.245800	C	Miss	0	Small
854	855	0.0	2	female	44.0	244252	26.000000	S	Mrs	0	Small
860	861	0.0	3	male	41.0	350026	14.108300	S	Mr	0	Small
861	862	0.0	2	male	21.0	28134	11.500000	S	Mr	0	Small
864	865	0.0	2	male	24.0	233866	13.000000	S	Mr	0	Solo
867	868	0.0	1	male	31.0	PC 17590	50.495800	S	Mr	1	Solo
870	871	0.0	3	male	26.0	349248	7.895800	S	Mr	0	Solo
872	873	0.0	1	male	33.0	695	5.000000	S	Mr	1	Solo
873	874	0.0	3	male	47.0	345765	9.000000	S	Mr	0	Solo
876	877	0.0	3	male	20.0	7534	9.845800	S	Mr	0	Solo
877	878	0.0	3	male	19.0	349212	7.895800	S	Mr	0	Solo
881	882	0.0	3	male	33.0	349257	7.895800	S	Mr	0	Solo
882	883	0.0	3	female	22.0	7552	10.516700	S	Miss	0	Solo
883	884	0.0	2	male	28.0	C.A./SOTON 34068	10.500000	S	Mr	0	Solo
884	885	0.0	3	male	25.0	SOTON/OQ 392076	7.050000	S	Mr	0	Solo
885	886	0.0	3	female	39.0	382652	29.125000	Q	Mrs	0	Big
886	887	0.0	2	male	27.0	211536	13.000000	S	Rev	0	Solo
890	891	0.0	3	male	32.0	370376	7.750000	Q	Mr	0	Solo
total_df['Age'] = total_df['Age'].fillna(total_df['Age'].mean())
print(total_df.isnull().sum())
PassengerId            0
Survived             418
Pclass                 0
Sex                    0
Age                    0
Ticket                 0
Fare                   0
Embarked               0
Title                  0
cabin_replace_num      0
Fam_type               0
dtype: int64
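The cell above fills every missing Age with a single global mean taken over the combined train and test rows. As a hedged sketch of one alternative, the snippet below compares that global mean with a per-Title median fill. The toy frame and its values are invented for illustration; only the column names `Title` and `Age` mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; only the column names mirror the notebook.
toy = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Master", "Master", "Miss"],
    "Age": [30.0, 40.0, np.nan, 4.0, np.nan, 20.0],
})

# Global mean fill, the same idea as the notebook cell above.
global_fill = toy["Age"].fillna(toy["Age"].mean())

# Per-Title median fill: each gap receives the median age of rows sharing that title.
group_fill = toy["Age"].fillna(toy.groupby("Title")["Age"].transform("median"))

print(global_fill.tolist())
print(group_fill.tolist())
```

In this toy frame the global mean assigns the same age to the missing "Mr" and the missing "Master", while the grouped median keeps the child-sized age for "Master". Whether that helps on the real data would need to be checked with cross-validation.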
# Copy the labelled rows so later in-place edits do not act on a slice of total_df
train_dataframe = total_df.query('Survived==1 or Survived==0').copy()
train_dataframe.head()
PassengerId	Survived	Pclass	Sex	Age	Ticket	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	1	0.0	3	male	22.0	A/5 21171	7.2500	S	Mr	0	Small
1	2	1.0	1	female	38.0	PC 17599	71.2833	C	Mrs	1	Small
2	3	1.0	3	female	26.0	STON/O2. 3101282	7.9250	S	Miss	0	Solo
3	4	1.0	1	female	35.0	113803	53.1000	S	Mrs	1	Small
4	5	0.0	3	male	35.0	373450	8.0500	S	Mr	0	Solo
train_dataframe.shape
(891, 11)
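# Copy the unlabelled rows (Survived is NaN) so the in-place drop below acts on an independent frame
test_dataframe = total_df.query('Survived.isnull()').copy()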
test_dataframe.shape
(418, 11)
test_dataframe.drop('Survived', axis=1, inplace=True)
test_dataframe.shape
(418, 10)
print(train_dataframe.isnull().sum())
PassengerId          0
Survived             0
Pclass               0
Sex                  0
Age                  0
Ticket               0
Fare                 0
Embarked             0
Title                0
cabin_replace_num    0
Fam_type             0
dtype: int64
# Encode Sex as a binary integer: female -> 1, male -> 0
train_dataframe['Sex'] = train_dataframe['Sex'].map({'female': 1, 'male': 0}).astype(int)
test_dataframe['Sex'] = test_dataframe['Sex'].map({'female': 1, 'male': 0}).astype(int)
# Drop identifier columns that are not used as model features
train_dataframe.drop(['PassengerId', 'Ticket'], axis=1, inplace=True)
test_dataframe.drop(['PassengerId', 'Ticket'], axis=1, inplace=True)
train_dataframe.head()
Survived	Pclass	Sex	Age	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	0.0	3	0	22.0	7.2500	S	Mr	0	Small
1	1.0	1	1	38.0	71.2833	C	Mrs	1	Small
2	1.0	3	1	26.0	7.9250	S	Miss	0	Solo
3	1.0	1	1	35.0	53.1000	S	Mrs	1	Small
4	0.0	3	0	35.0	8.0500	S	Mr	0	Solo
test_dataframe.head()
Pclass	Sex	Age	Fare	Embarked	Title	cabin_replace_num	Fam_type
0	3	0	34.5	7.2500	Q	Mr	0	Solo
1	3	1	47.0	71.2833	S	Mrs	0	Small
2	2	0	62.0	7.9250	Q	Mr	0	Solo
3	3	0	27.0	53.1000	S	Mr	0	Solo
4	3	1	22.0	8.0500	S	Mrs	0	Small
train_dataframe.Title.value_counts()
Title
Mr        525
Miss      188
Mrs       125
Master     40
Dr          7
Rev         6
Name: count, dtype: int64
OneHotEncoding and scaling
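# Imports needed by this cell (harmless if already imported earlier in the notebook)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Define the columns for OneHotEncoding and scaling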
categorical_features = ['Embarked', 'Title', 'Fam_type']
numeric_features = ['Pclass', 'Sex', 'Age', 'Fare', 'cabin_replace_num']

# Create the preprocessor using ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(drop='first'), categorical_features)
    ])

# Apply the transformations
X = preprocessor.fit_transform(train_dataframe.drop('Survived', axis=1))
y = train_dataframe['Survived']
test_dataframe = preprocessor.transform(test_dataframe)
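The preprocessor above is fitted once on all training rows, so the scaler statistics also reflect rows that later serve as validation folds. One way to keep each fold self-contained is to wrap the `ColumnTransformer` and the estimator in a scikit-learn `Pipeline`. The sketch below does that on a small synthetic frame. The data, the target rule, and the choice of `handle_unknown="ignore"` are assumptions made for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; only the column names echo the notebook.
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "Age": rng.uniform(1, 70, n),
    "Fare": rng.permutation(np.linspace(5, 100, n)),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})
y = (toy["Fare"] > 50).astype(int)  # made-up target for the demo

# Same preprocessing idea as above: scale numeric columns, one-hot encode categoricals.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Embarked"]),
])

# Inside a Pipeline the preprocessor is refitted on the training part of every fold.
pipe = Pipeline([
    ("prep", preprocessor),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, toy, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

The same `pipe` object could be passed to `GridSearchCV` in place of a bare model, with parameter names prefixed by the step name (for example `clf__C`).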
Models and their Parameters
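# Imports needed by this cell (harmless if already imported earlier in the notebook)
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Define models and their parameters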
models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'RandomForest': RandomForestClassifier(),
    'GradientBoosting': GradientBoostingClassifier(),
    'XGBoost': XGBClassifier(use_label_encoder=False, eval_metric='logloss'),
    'SVC': SVC(),
    'KNeighbors': KNeighborsClassifier()
}

params = {
    'LogisticRegression': {'C': [0.1, 1, 10, 100]},
    'RandomForest': {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]},
    'GradientBoosting': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2]},
    'XGBoost': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2]},
    'SVC': {'C': [0.1, 1, 10, 100], 'kernel': ['linear', 'rbf']},
    'KNeighbors': {'n_neighbors': [3, 5, 7, 9]}
}

# Initialize StratifiedKFold
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Train models and find best parameters
best_models = {}
best_scores = {}
for name, model in models.items():
    print(f"Training {name}...")
    grid = GridSearchCV(model, params[name], cv=kf, scoring='accuracy')
    grid.fit(X, y)
    best_models[name] = grid.best_estimator_
    best_scores[name] = grid.best_score_
    print(f"Best parameters for {name}: {grid.best_params_}")
    print(f"Best score for {name}: {grid.best_score_}")
Training LogisticRegression...
Best parameters for LogisticRegression: {'C': 1}
Best score for LogisticRegression: 0.8327537505492437
Training RandomForest...
Best parameters for RandomForest: {'max_depth': 10, 'n_estimators': 50}
Best score for RandomForest: 0.8350134957002071
Training GradientBoosting...
Best parameters for GradientBoosting: {'learning_rate': 0.1, 'n_estimators': 200}
Best score for GradientBoosting: 0.8473353838428223
Training XGBoost...
Best parameters for XGBoost: {'learning_rate': 0.2, 'n_estimators': 50}
Best score for XGBoost: 0.8349883874207519
Training SVC...
Best parameters for SVC: {'C': 1, 'kernel': 'rbf'}
Best score for SVC: 0.8338899001945892
Training KNeighbors...
Best parameters for KNeighbors: {'n_neighbors': 7}
Best score for KNeighbors: 0.8383780051471973
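To get a feel for the cost of the search above, the number of model fits can be counted directly from the parameter grids: each combination is fit once per fold, and `GridSearchCV` refits the best combination once on all the data by default. A small sketch using only the grids listed in this notebook (`count_combinations` is an illustrative helper name):

```python
from itertools import product

# Parameter grids copied from the notebook cell above.
params = {
    'LogisticRegression': {'C': [0.1, 1, 10, 100]},
    'RandomForest': {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]},
    'GradientBoosting': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2]},
    'XGBoost': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2]},
    'SVC': {'C': [0.1, 1, 10, 100], 'kernel': ['linear', 'rbf']},
    'KNeighbors': {'n_neighbors': [3, 5, 7, 9]},
}
n_splits = 5  # matches StratifiedKFold(n_splits=5)

def count_combinations(grid):
    """Number of distinct parameter settings in one grid."""
    return len(list(product(*grid.values())))

fits = {name: count_combinations(grid) * n_splits for name, grid in params.items()}
total = sum(fits.values())

for name, n in fits.items():
    print(f"{name}: {n} cross-validation fits")
print(f"Total: {total} cross-validation fits, plus one refit per model")
```

The tree ensembles account for most of the work here, since each has nine settings against four for logistic regression and k-nearest neighbors.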
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

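# Evaluate each model on the hold-out split.
# Note: the hyperparameters were tuned on all of X, so the validation rows were
# already seen during the grid search; treat these scores as optimistic.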
validation_scores = {}
for name, model in best_models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    accuracy = accuracy_score(y_val, y_pred)
    validation_scores[name] = accuracy
    print(f"Evaluation of {name}:")
    print(f"Accuracy: {accuracy}")
    print(f"Confusion Matrix:\n{confusion_matrix(y_val, y_pred)}")
    print(f"Classification Report:\n{classification_report(y_val, y_pred)}\n")
Evaluation of LogisticRegression:
Accuracy: 0.8379888268156425
Confusion Matrix:
[[97 13]
 [16 53]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.86      0.88      0.87       110
         1.0       0.80      0.77      0.79        69

    accuracy                           0.84       179
   macro avg       0.83      0.82      0.83       179
weighted avg       0.84      0.84      0.84       179


Evaluation of RandomForest:
Accuracy: 0.8044692737430168
Confusion Matrix:
[[94 16]
 [19 50]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.83      0.85      0.84       110
         1.0       0.76      0.72      0.74        69

    accuracy                           0.80       179
   macro avg       0.79      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179


Evaluation of GradientBoosting:
Accuracy: 0.8100558659217877
Confusion Matrix:
[[98 12]
 [22 47]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.82      0.89      0.85       110
         1.0       0.80      0.68      0.73        69

    accuracy                           0.81       179
   macro avg       0.81      0.79      0.79       179
weighted avg       0.81      0.81      0.81       179


Evaluation of XGBoost:
Accuracy: 0.8268156424581006
Confusion Matrix:
[[98 12]
 [19 50]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.84      0.89      0.86       110
         1.0       0.81      0.72      0.76        69

    accuracy                           0.83       179
   macro avg       0.82      0.81      0.81       179
weighted avg       0.83      0.83      0.82       179


Evaluation of SVC:
Accuracy: 0.8379888268156425
Confusion Matrix:
[[104   6]
 [ 23  46]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.82      0.95      0.88       110
         1.0       0.88      0.67      0.76        69

    accuracy                           0.84       179
   macro avg       0.85      0.81      0.82       179
weighted avg       0.84      0.84      0.83       179


Evaluation of KNeighbors:
Accuracy: 0.8100558659217877
Confusion Matrix:
[[95 15]
 [19 50]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.83      0.86      0.85       110
         1.0       0.77      0.72      0.75        69

    accuracy                           0.81       179
   macro avg       0.80      0.79      0.80       179
weighted avg       0.81      0.81      0.81       179
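The figures in each classification report follow directly from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. As a worked check, the sketch below recomputes the class-1 (survived) metrics for LogisticRegression and SVC from the matrices printed above. `report_from_confusion` is an illustrative helper, not a scikit-learn function.

```python
def report_from_confusion(matrix):
    """Derive accuracy and positive-class metrics from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Confusion matrices copied from the evaluation output above.
logreg = report_from_confusion([[97, 13], [16, 53]])
svc = report_from_confusion([[104, 6], [23, 46]])

for name, report in [("LogisticRegression", logreg), ("SVC", svc)]:
    summary = ", ".join(f"{key}={value:.2f}" for key, value in report.items())
    print(f"{name}: {summary}")
```

Both models classify 150 of the 179 validation passengers correctly, but they make different errors: SVC has fewer false positives (6 against 13) and more false negatives (23 against 16), which is why its precision for survivors is higher and its recall lower.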


best_model_name = max(validation_scores, key=validation_scores.get)
best_model = best_models[best_model_name]
print(f"The best model is: {best_model_name} with accuracy: {validation_scores[best_model_name]}")
The best model is: LogisticRegression with accuracy: 0.8379888268156425
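Note that LogisticRegression and SVC have exactly the same validation accuracy, so the selection above is decided by a tie-break: Python's `max` returns the first maximal item it encounters, and dictionaries iterate in insertion order. The sketch below reproduces this with the scores printed above.

```python
# Validation accuracies copied from the evaluation output above.
validation_scores = {
    "LogisticRegression": 0.8379888268156425,
    "RandomForest": 0.8044692737430168,
    "GradientBoosting": 0.8100558659217877,
    "XGBoost": 0.8268156424581006,
    "SVC": 0.8379888268156425,
    "KNeighbors": 0.8100558659217877,
}

best_score = max(validation_scores.values())
# Every model that reaches the top score, in insertion order.
tied = [name for name, score in validation_scores.items() if score == best_score]
# max() keeps the first maximal key, so the earliest-inserted tied model wins.
winner = max(validation_scores, key=validation_scores.get)

print(tied)    # ['LogisticRegression', 'SVC']
print(winner)  # LogisticRegression
```

With a tie like this, a secondary criterion such as the cross-validation score, or the precision and recall trade-off, may be a better basis for the choice than dictionary order.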
# Train the best model on the full training data
best_model.fit(X, y)
LogisticRegression(C=1, max_iter=1000)
# Make predictions
predictions = best_model.predict(test_dataframe)
predictions
array([0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 1., 1., 0.,
       0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1.,
       1., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0.,
       0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0.,
       1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0.,
       0., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0.,
       0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1.,
       1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
       1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
       0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1.,
       0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 0., 1.,
       0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1.,
       0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0.,
       0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0., 0.,
       1., 1., 0., 1., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.,
       0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1.,
       0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 1.,
       0., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 0., 0., 1., 0., 0.,
       1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 0., 1.,
       1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
       1., 1., 1., 1., 1., 0., 1., 0., 0., 1.])
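# Build the submission file. test_df is the original (untransformed) test DataFrame;
# test_dataframe now holds the preprocessed feature array.
submission = pd.DataFrame({
    "PassengerId": test_df["PassengerId"],
    "Survived": predictions.astype(int)  # cast the float labels (0., 1.) to integers
})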

submission.to_csv('submission.csv', index=False)
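Before uploading, it can be worth checking the file's shape: a two-column header, one row per test passenger, and integer 0/1 labels. The sketch below does this with the standard library on small made-up CSV strings. `check_submission` and the sample rows are hypothetical; for the real file, `expected_rows` would be the length of the prediction array (418 values in the output above).

```python
import csv
import io

def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["PassengerId", "Survived"]:
        return [f"unexpected header: {reader.fieldnames}"]
    rows = list(reader)
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    # Labels must be the literal strings "0" or "1", not "0.0" or "1.0".
    bad_ids = [row["PassengerId"] for row in rows if row["Survived"] not in ("0", "1")]
    if bad_ids:
        problems.append(f"non-integer labels for PassengerId: {bad_ids}")
    return problems

# Made-up example rows, for illustration only.
good_sample = "PassengerId,Survived\n892,0\n893,1\n894,0\n"
float_sample = "PassengerId,Survived\n892,0.0\n893,1.0\n"

good_problems = check_submission(good_sample, expected_rows=3)
float_problems = check_submission(float_sample, expected_rows=2)
print(good_problems)
print(float_problems)
```

The second sample shows why the integer cast in the submission cell matters: float predictions would be written as `0.0` and `1.0`.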
         


