RANDOM FOREST MODEL(RFM)

Random Forest Model:

The random forest model is a classification model built as a combination of decision trees. The random forest algorithm is a supervised classification algorithm. As the name suggests, it creates a forest with several trees, and in general the higher the number of trees in the forest, the higher the accuracy of the results.

The random forest model follows an ensemble technique: it constructs multiple decision trees at training time and makes its prediction from the mode of the trees' outputs for classification and their mean for regression. This aggregation helps reduce the overfitting that an individual decision tree is prone to.
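As a quick illustration of the two aggregation modes, here is a minimal sketch with scikit-learn's RandomForestClassifier and RandomForestRegressor on synthetic data (note that scikit-learn averages the trees' class probabilities rather than counting hard votes, but the effect is the same majority idea):

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: each tree votes and the mode (majority class) wins
Xc, yc = make_classification(n_samples=300, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
print("Predicted classes:", clf.predict(Xc[:5]))

# Regression: the forest averages the individual trees' predictions
Xr, yr = make_regression(n_samples=300, n_features=8, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print("Predicted values:", reg.predict(Xr[:5]))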

Working of Random Forest Algorithm


We can understand the working of the random forest algorithm with the help of the following steps:

  • Step 1: First, start with the selection of random samples from the given dataset. Each tree gets its own bootstrap sample, drawn with replacement.


Bootstrap sampling means each tree is trained on a random sample of the records, drawn with replacement, so some records repeat and roughly a third are left out of any given sample (these left-out records are the "out-of-bag" cases used later for validation). In addition, at each split the tree considers only a random subset of the features rather than all of them; for example, with 1000 features each split might evaluate only a few dozen randomly chosen ones. The final result combines the outputs of all the trees.

  • Step 2: Next, the algorithm constructs a decision tree for every sample and gets a prediction result from every decision tree.
  • Step 3: In this step, voting is performed over the predicted results:
  • Based on 'n' samples, 'n' trees are built.
  • Each record is classified by each of the n trees.
  • The final class for each record is decided based on voting.

  • Step 4: At last, select the most voted prediction result as the final prediction result. The sketch below condenses these steps into code.
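A minimal hand-rolled version of this bootstrap-and-vote loop, on synthetic data, just to make the steps concrete (scikit-learn's own implementation is more sophisticated):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n_trees = 25
rng = np.random.default_rng(0)
trees = []
for _ in range(n_trees):
    # Step 1: draw a bootstrap sample (random rows, with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: fit one decision tree per sample; max_features="sqrt" adds the
    # per-split feature randomness that distinguishes a forest from plain bagging
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Steps 3-4: every tree votes on every record; the majority vote is the final class
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
final = (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote for 0/1 labels
print("Training accuracy of the hand-rolled forest:", (final == y).mean())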

What is the Out of Bag score in Random Forests?

Out of bag (OOB) score is a way of validating the random forest model. Below is a simple intuition of how it is calculated, followed by a description of how it differs from the validation score and where it is advantageous.

For the description of the OOB score calculation, assume there are five decision trees in the random forest ensemble, labeled 1 to 5, each trained on its own bootstrap sample of a small original training dataset.

OOB Error Rate Computation Steps

  • The samples left out of (out-of-bag for) the Kth tree are classified using the Kth tree.
  • Assume j cases are misclassified.
  • The proportion of cases whose OOB prediction does not equal the true class, averaged over all cases, is the OOB error rate (see the sketch below).
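In scikit-learn the OOB score comes essentially for free once bagging is enabled; a minimal sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# oob_score=True evaluates every record only on the trees that never saw it;
# bootstrap=True (the default) is required for OOB samples to exist
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)  # equals 1 minus the OOB error rate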

Variable Importance of RF:

It states which features are most useful to the random forest model, i.e., the features that let the model reach high accuracy with less error.

  • Random Forest computes two measures of variable importance: Mean Decrease in Accuracy and Mean Decrease in Gini.
  • Mean Decrease in Accuracy is based on permutation:
  • Randomly permute the values of the variable whose importance is to be computed in the OOB sample.
  • Compute the error rate with the permuted values.
  • Compute the increase in OOB error rate (permuted minus not permuted).
  • Average this increase over all the trees.
  • Mean Decrease in Gini is computed as the "total decrease in node impurities from splitting on the variable, averaged over all trees" (see the sketch below).
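Both measures have counterparts in scikit-learn: Gini-based importances are exposed as feature_importances_, and permutation importance lives in sklearn.inspection (there it is computed on any dataset you pass, typically a held-out set, rather than on the OOB sample, but the idea is the same). A brief sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean decrease in impurity (Gini-based), accumulated over all splits and trees
print("Gini importances:", rf.feature_importances_)

# Permutation importance: shuffle one column at a time and measure the score drop
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean)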

Finding the optimal values using grid-search CV:

It searches over candidate values of the model's hyperparameters, such as how many trees to build and how deep each tree may grow, and reports the combination that scores best under cross-validation, as shown in the sketch below.
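A minimal sketch using scikit-learn's GridSearchCV; the grid below is an arbitrary example, not a recommendation:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
param_grid = {
    "n_estimators": [50, 100, 200],  # number of trees in the forest
    "max_depth": [4, 8, None],       # None lets each tree grow fully
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated score:", search.best_score_)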

Measuring RF model performance by Confusion Matrix:

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm by showing, for each true class, how many records were predicted into each class.
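For a binary classifier the 2x2 matrix unpacks into true negatives, false positives, false negatives, and true positives; a tiny sketch of reading it with scikit-learn (the labels here are made up):

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
# rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)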

Random Forest with Python:

Importing the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pydotplus  # used later to render the exported tree graph
from io import StringIO  # sklearn.externals.six was removed from scikit-learn; use the standard library
from IPython.display import Image
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, auc


Read the data from CSV

# bank.csv is semicolon-separated, so pandas reads each row into a single
# column; the real column names are packed into the one header string.
dummy_df = pd.read_csv("bank.csv", na_values=['NA'])
temp = dummy_df.columns.values[0]  # the combined header string
print(dummy_df)

Data Pre-Processing:

# Split the single packed column back into separate named columns
columns_name = temp.split(';')
data = dummy_df.values
print(data)
print(data.shape)

contacts = list()
for element in data:
    contact = element[0].split(';')
    contacts.append(contact)

contact_df = pd.DataFrame(contacts, columns=columns_name)
print(contact_df)

# Label-encode every column so the tree model receives numeric inputs
def preprocessor(df):
    res_df = df.copy()
    le = preprocessing.LabelEncoder()
    for column in res_df.columns:
        res_df[column] = le.fit_transform(res_df[column].astype(str))
    return res_df

encoded_df = preprocessor(contact_df)

# The column names keep the embedded quotes from the CSV header, hence '"y"'
x = encoded_df.drop(['"y"'], axis=1).values
y = encoded_df['"y"'].values

Split the data into Train and Test

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5)

Build the Decision Tree Model

# Decision tree with depth = 2
model_dt_2 = DecisionTreeClassifier(random_state=1, max_depth=2)
model_dt_2.fit(x_train, y_train)
model_dt_2_score_train = model_dt_2.score(x_train, y_train)
print("Training score: ", model_dt_2_score_train)
model_dt_2_score_test = model_dt_2.score(x_test, y_test)
print("Testing score: ", model_dt_2_score_test)

# A deeper decision tree used for the ROC analysis below
model_dt = DecisionTreeClassifier(max_depth=8, criterion="entropy")
model_dt.fit(x_train, y_train)
y_pred_dt = model_dt.predict_proba(x_test)[:, 1]  # probability of the positive class

Graphical Representation of the Tree

dot_data = StringIO()
export_graphviz(model_dt, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Performance Metrics

fpr_dt, tpr_dt, _ = roc_curve(y_test, y_pred_dt)
roc_auc_dt = auc(fpr_dt, tpr_dt)
predictions = model_dt.predict(x_test)

# Model Accuracy
print(model_dt.score(x_test, y_test))

# Collect the true labels of every record the model predicted as class 1 ("yes")
y_actual_result = np.array([y_test[i] for i in range(len(predictions)) if predictions[i] == 1])

Precision

# Of the records predicted "yes", what fraction are truly "yes"?
# This ratio is the precision, P(true yes | predicted yes).
count = 0
for result in y_actual_result:
    if result == 1:
        count = count + 1

print("true yes|predicted yes:")
print(count / float(len(y_actual_result)))

Area Under the Curve

plt.figure(1)
lw = 2
plt.plot(fpr_dt, tpr_dt, color='green',
         lw=lw, label='Decision Tree (AUC = %0.2f)' % roc_auc_dt)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')  # chance line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Area Under Curve')
plt.legend(loc="lower right")
plt.show()


Confusion Matrix

print(confusion_matrix(y_test, predictions))
accuracy_score(y_test, predictions)

import itertools

def plot_confusion_matrix(y_pred, normalize=False):
    # This function prints and plots the confusion matrix for the test set.
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    classes = ["Success", "Default"]
    cmap = plt.cm.Blues
    title = "Confusion Matrix"
    if normalize:
        # scale each row (true class) to proportions
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm = np.around(cm, decimals=3)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

plt.figure(figsize=(6, 6))
plot_confusion_matrix(predictions, normalize=False)
plt.show()



Pruning of the tree

from sklearn.tree._tree import TREE_LEAF

def prune_index(inner_tree, index, threshold):
    if inner_tree.value[index].min() < threshold:
        # turn the node into a leaf by "unlinking" its children
        inner_tree.children_left[index] = TREE_LEAF
        inner_tree.children_right[index] = TREE_LEAF
    # if there are children, visit them as well
    if inner_tree.children_left[index] != TREE_LEAF:
        prune_index(inner_tree, inner_tree.children_left[index], threshold)
        prune_index(inner_tree, inner_tree.children_right[index], threshold)

print(sum(model_dt.tree_.children_left < 0))  # leaf count before pruning

# start pruning from the root
prune_index(model_dt.tree_, 0, 5)
sum(model_dt.tree_.children_left < 0)  # leaf count after pruning

# The code has created 17 new leaf nodes (by practically removing the links
# to their children). Re-plotting shows what the tree looks like now:

dot_data = StringIO()
export_graphviz(model_dt, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
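Editing tree_ directly relies on scikit-learn internals (note the private sklearn.tree._tree import). Since scikit-learn 0.22 the library offers built-in cost-complexity post-pruning through the ccp_alpha parameter, which is the supported route; a brief sketch, with an arbitrary alpha value:

# Larger ccp_alpha prunes more aggressively; 0.001 here is just an example
pruned_dt = DecisionTreeClassifier(max_depth=8, criterion="entropy", ccp_alpha=0.001)
pruned_dt.fit(x_train, y_train)
print("Leaves after cost-complexity pruning:", pruned_dt.get_n_leaves())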


Learnbay provides industry-accredited data science courses in Bangalore. We understand how technologies converge in the field of data science, hence we offer significant courses like Machine Learning, TensorFlow, IBM Watson, Google Cloud Platform, Tableau, Hadoop, time series, R, and Python, along with authentic real-time industry projects. Students benefit from being certified by IBM, and hundreds of students have been placed in promising companies for data science roles. By choosing Learnbay you will reach the most aspiring jobs of the present and future.

The Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, and Deep Learning using TensorFlow. These topics are covered and co-developed with IBM.
