AI in IA (Artificial Intelligence in Internal Audit) - Part 2
Mahendra Khiani
Passionate about building technology-enabled NextGen GRCS solutions | Helping clients build risk quantification/management models | ICAI | IIMK | Director - KPMG
Key Application Objectives
Industry: Oil & Gas
Objective: Identify the factors/variables that may cause a temporary shutdown of an oil pipeline in the event of a tragic accident.
ML applied to:
a. Identify predictor variables: Determine the key factors that indicate whether an incident can result in a temporary pipeline shutdown, so that Internal Audit can recommend the strongest controls around those areas.
b. Predict the losses: Based on the (non-)performance of each factor, predict the severity of each factor and its impact on losses, enabling timely management reporting.
c. Prepare a quantitative model: Represent the probability of shutdown as well as the losses in absolute and percentage terms, providing the underlying basis for a robust internal audit risk assessment plan.
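Objective (a) can be approached in several ways; one common option, sketched below under stated assumptions, is to fit a random forest and rank the candidate predictors by its impurity-based feature importances. The column names (`ignition`, `explosion`, `corrosion`, `underground`) and the data are hypothetical and synthetic, chosen only so the ranking is easy to check; they are not the article's data or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_factors(X: pd.DataFrame, y: pd.Series, random_state: int = 0) -> pd.Series:
    """Rank predictor variables by random-forest impurity importance, highest first."""
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    model.fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)


# Synthetic, illustrative data only: binary incident flags with hypothetical names.
rng = np.random.default_rng(0)
n = 500
ignition = rng.integers(0, 2, n)
X = pd.DataFrame({
    "ignition": ignition,
    "explosion": rng.integers(0, 2, n),
    "corrosion": rng.integers(0, 2, n),
    "underground": rng.integers(0, 2, n),
})
# Hypothetical rule: every ignition leads to shutdown, plus 10% random shutdowns.
noise = rng.random(n) < 0.1
y = pd.Series(((ignition == 1) | noise).astype(int))

importances = rank_factors(X, y)
print(importances)
```

Impurity-based importances can favour features with many distinct values; scikit-learn's `sklearn.inspection.permutation_importance` is an alternative when that bias matters.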
Classification Models for Predicting Shutdown (Logistic Regression and Random Forest):
1. Train-test split: We split the dataset into two parts, one for training our models and one for testing their accuracy, using an 80%–20% split into training and test sets.
We use the training set to build the logistic regression model that will be used for prediction.
The predictors included the operator, the location of the accident, the pipeline site (underground, transition area or above ground), whether the incident resulted in ignition or explosion, the cause (e.g. external or natural force, equipment failure, corrosion), environmental damage, property damage, etc.
Once built, the model is tested on the remaining 20% of the data (the test set) to assess how well it predicts.
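The split-and-fit step above could look roughly like the following scikit-learn sketch. It assumes the incidents are already in a pandas DataFrame with a 0/1 `shutdown` column; the predictor names in `CATEGORICAL` and `BINARY` are hypothetical stand-ins for the variables listed above, and the stratified split and one-hot encoding are our own choices rather than details confirmed by the article.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names standing in for the predictors described in the text.
CATEGORICAL = ["operator", "location_type", "cause"]
BINARY = ["ignition", "explosion"]


def fit_logistic(df: pd.DataFrame, target: str = "shutdown", random_state: int = 42):
    X = df[CATEGORICAL + BINARY]
    y = df[target]
    # 80% training / 20% test; stratify keeps the shutdown rate similar in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    preprocess = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough")  # binary 0/1 flags pass through unchanged
    model = Pipeline([("prep", preprocess),
                      ("clf", LogisticRegression(max_iter=1000))])
    model.fit(X_train, y_train)
    return model, X_test, y_test


# Synthetic demo data (random values, not the Kaggle dataset).
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "operator": rng.choice(["A", "B", "C"], n),
    "location_type": rng.choice(["UNDERGROUND", "ABOVEGROUND", "TRANSITION AREA"], n),
    "cause": rng.choice(["CORROSION", "EQUIPMENT FAILURE", "NATURAL FORCE"], n),
    "ignition": rng.integers(0, 2, n),
    "explosion": rng.integers(0, 2, n),
})
df["shutdown"] = rng.integers(0, 2, n)

model, X_test, y_test = fit_logistic(df)
print(len(X_test), model.score(X_test, y_test))
```

Because the demo labels are random, its accuracy is meaningless; the point is the shape of the workflow: split first, then fit preprocessing and model only on the training rows.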
Testing the Model Accuracy:
Model Predictions: We compute the confusion matrix of the actual test-set responses against the predicted values to check the model's accuracy; the logistic regression model achieves an accuracy of 60%. To improve on this, we then applied a random forest, an ensemble of decision trees that can improve prediction accuracy. The results of the random forest model are shown below.
Confusion Matrix:
As can be seen from the table, the random forest model achieves an accuracy of 74.43% and predicts shutdowns correctly 80% of the time, which is significantly better than both random guessing and the logistic regression model.
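The comparison above can be reproduced in outline as follows: fit both models on the same 80/20 split and read accuracy and shutdown-level figures from a 2x2 confusion matrix. The article does not say exactly how the 80% shutdown figure was computed, so this sketch reports both recall (share of actual shutdowns caught) and precision (share of predicted shutdowns that were real); the function names and the synthetic `make_classification` data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def summarize(y_true, y_pred) -> dict:
    # labels=[0, 1]: rows are actual (no shutdown, shutdown), columns are predicted.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual shutdowns the model caught (recall / sensitivity).
        "shutdown_recall": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of predicted shutdowns that really were shutdowns (precision).
        "shutdown_precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


def compare_models(X, y, random_state: int = 0) -> dict:
    """Fit both models on the same 80/20 split and summarize each on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = summarize(y_test, model.predict(X_test))
    return results


# Synthetic stand-in data; results on the pipeline data will differ.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
for name, metrics in compare_models(X, y).items():
    print(name, metrics)
```

As a quick check of `summarize`: with 4 actual shutdowns of which 3 are caught, and 2 false alarms among 6 non-shutdowns, accuracy is 7/10, recall is 3/4 and precision is 3/5.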
About the data:
We analyzed the Kaggle dataset of oil pipeline leaks and spills reported to the Pipeline and Hazardous Materials Safety Administration (PHMSA), covering the period 2010 to 2017. The data include incident date and time, operator and pipeline, cause of incident, type and quantity of hazardous liquid lost, injuries and fatalities, and associated costs.
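Before modelling, the raw records need a binary shutdown target. A minimal pandas sketch follows, assuming the Kaggle file has an `Accident Year` column and a `Pipeline Shutdown` column holding "YES"/"NO" (both column names and the file name `database.csv` are assumptions to verify against the download). Rows with a missing shutdown flag are dropped rather than guessed.

```python
import pandas as pd


def build_target(df: pd.DataFrame) -> pd.DataFrame:
    """Keep 2010-2017 incidents with a recorded shutdown flag and add a 0/1 target."""
    # Assumed column names: "Accident Year" and "Pipeline Shutdown" (YES/NO).
    in_period = df["Accident Year"].between(2010, 2017)
    known = df["Pipeline Shutdown"].isin(["YES", "NO"])
    out = df.loc[in_period & known].copy()
    out["shutdown"] = (out["Pipeline Shutdown"] == "YES").astype(int)
    return out


# Toy frame for illustration; with the real file one might call
# build_target(pd.read_csv("database.csv"))  (file name assumed).
toy = pd.DataFrame({
    "Accident Year": [2009, 2010, 2015, 2017, 2018],
    "Pipeline Shutdown": ["YES", "YES", "NO", None, "YES"],
})
print(build_target(toy))
```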
References:
Source of data: https://www.kaggle.com/usdot/pipeline-accidents