AI in IA (Artificial Intelligence in Internal Audit) - Part 2
By Mahendra Khiani (image sourced from Google)

Key Application Objectives

Industry: Oil & Gas

Objective: Identify the factors/variables that may cause a temporary shutdown of an oil pipeline after a serious accident.

Machine learning (ML) was applied to:

a. Identify predictor variables: Find the key factors that indicate whether an incident can result in a temporary pipeline shutdown, so that Internal Audit can recommend the strongest controls around those areas.

b. Predict the losses: Based on the (non-)performance of each factor, predict its severity and its impact on losses, enabling timely management reporting.

c. Prepare a quantitative model: Represent the likelihood of a shutdown and the losses in absolute and percentage terms, as an underlying basis for a robust internal audit risk assessment plan.

Classification Model for Predicting Shutdown (Logistic Regression and Random Forest):


1. Train-test split: We split the dataset into two parts, one for training the models and the other for testing their accuracy, using an 80%–20% split into training and test sets.

We use the training set to build the logistic regression model that is used for prediction.

The training data included variables such as the operator, the accident location, the pipeline site (underground, transition area or above ground), whether the accident resulted in ignition or explosion, the cause (e.g. external or natural force, equipment failure, corrosion), environmental damage and property damage.

Once the prediction model is built, it is tested on the remaining 20% of the data (the test set) to assess how well it predicts.
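A minimal sketch of the 80%–20% split in plain Python, assuming each accident record is one element of a list. The function name `split_train_test` and the fixed seed are illustrative choices, not taken from the article.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)                  # copy, so the caller's list is unchanged
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Usage with 100 dummy record IDs: 80 go to training, 20 to testing.
records = list(range(100))
train, test = split_train_test(records)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters because the source data is ordered by date; taking the last 20% unshuffled would test only on the most recent accidents.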

2. Predictor variables and results: Using the logistic regression model, the independent (predictor) variables used to predict a temporary shutdown are:
  • X1 = Cause Category (root cause of the incident that led to the accident)
  • X2 = Net Loss Barrels (total quantity of liquid lost in the accident)
  • X3 = Pipeline Type (where the pipeline runs, e.g. above ground or underground)
  • X4 = Liquid Ignition (whether the incident resulted in the liquid igniting)
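The sketch below shows one way to fit a logistic regression on the four predictors, written in plain Python with batch gradient descent so it runs without extra libraries. Everything beyond the four variables is an assumption, not the article's setup: the record field names, the category labels, one-hot encoding with the first level as the reference category, and log scaling of Net Loss Barrels. In practice a statistics or ML library would normally be used instead.

```python
import math

# Assumed category levels (verify against the real data). The first level of
# each list is the reference category and is encoded as all zeros.
CAUSES = ["CORROSION", "MATERIAL/WELD/EQUIP FAILURE", "EXCAVATION DAMAGE",
          "NATURAL FORCE DAMAGE", "INCORRECT OPERATION",
          "OTHER OUTSIDE FORCE DAMAGE", "ALL OTHER CAUSES"]
PIPELINE_TYPES = ["UNDERGROUND", "ABOVEGROUND", "TANK", "TRANSITION AREA"]


def encode(record):
    """Turn one accident record into a numeric vector: intercept, X1..X4."""
    x = [1.0]                                                        # intercept
    x += [1.0 if record["cause"] == c else 0.0 for c in CAUSES[1:]]  # X1, one-hot
    x.append(math.log1p(record["net_loss_barrels"]))                 # X2, log-scaled (assumption)
    x += [1.0 if record["pipeline_type"] == p else 0.0
          for p in PIPELINE_TYPES[1:]]                               # X3, one-hot
    x.append(1.0 if record["liquid_ignition"] else 0.0)              # X4, yes/no
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.05, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi  # p - y
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_shutdown(w, x, threshold=0.5):
    """Return 1 (shutdown) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0
```

To use it, encode each training record with `encode`, pass the vectors and their 0/1 shutdown labels to `fit_logistic`, then call `predict_shutdown` on each encoded test record.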

Testing the Model Accuracy:

Model Predictions: We then compute a confusion matrix from the actual response values in the test data and the model's predicted values to check the model's accuracy; the logistic regression model reaches an accuracy of 60%. To improve on this, we applied a random forest, an ensemble of decision trees that can improve prediction accuracy. The results after applying the random forest model are shown below.
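A random forest grows many decision trees, each on a bootstrap sample of the training data and each considering only a random subset of features at every split, then combines them by majority vote. The compact plain-Python sketch below illustrates that idea for numeric (for example, one-hot encoded) features. The tree depth, number of trees and square-root rule for the feature subset are assumed defaults, not the settings used in the article.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, n_features, rng, depth=0, max_depth=5):
    """Grow a tree; leaves are class labels, internal nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_features)  # random feature subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_features, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_features, rng, depth + 1, max_depth))


def predict_tree(node, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

Averaging many decorrelated trees is what typically lets a random forest capture interactions (for example between cause and pipeline type) that a single linear logistic model misses.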

Confusion Matrix:

As can be seen from the table, the random forest model's accuracy is 74.43%, with shutdowns predicted correctly 80% of the time, a clear improvement over both random guessing and the logistic regression model (60%).
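The article reports both an overall accuracy (74.43%) and a shutdown-specific figure (80%). The sketch below shows how such figures can be computed from a binary confusion matrix. The article does not state whether the 80% is the share of actual shutdowns that were caught (recall) or the share of predicted shutdowns that were correct (precision), so both are shown; the example labels are made up for illustration and are not the article's data.

```python
def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for 0/1 labels; 1 = shutdown."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


def accuracy(cm):
    """Share of all test cases classified correctly."""
    return (cm[(0, 0)] + cm[(1, 1)]) / sum(cm.values())


def shutdown_recall(cm):
    """Share of actual shutdowns that the model predicted as shutdowns."""
    return cm[(1, 1)] / (cm[(1, 1)] + cm[(1, 0)])


def shutdown_precision(cm):
    """Share of predicted shutdowns that really were shutdowns."""
    return cm[(1, 1)] / (cm[(1, 1)] + cm[(0, 1)])


# Made-up labels for illustration only.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
cm = confusion_matrix(actual, predicted)
print(accuracy(cm), shutdown_recall(cm))  # 0.7 0.8
```

From an audit perspective recall is usually the more important of the two, since a missed shutdown risk is costlier than a false alarm that prompts an extra control review.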

About the data:

We analyzed a database, available on Kaggle, of oil pipeline leaks and spills reported to the Pipeline and Hazardous Materials Safety Administration (PHMSA). The data covers 2010 to 2017 and includes the incident date and time, operator and pipeline, cause of the incident, type of hazardous liquid and quantity lost, injuries and fatalities, and associated costs.
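A sketch of loading the Kaggle CSV with Python's standard `csv` module, keeping only the four predictors and the shutdown outcome. The column names ("Cause Category", "Net Loss (Barrels)", "Pipeline Type", "Liquid Ignition", "Pipeline Shutdown") and the YES/NO coding are assumptions to check against the header of the downloaded file. Treating a blank net loss as 0 and skipping rows with no recorded shutdown value are also assumed choices, and the output field names are hypothetical.

```python
import csv
import io


def field(row, name):
    """Return a stripped string for a column, treating missing values as ''."""
    return (row.get(name) or "").strip()


def load_records(text):
    """Parse CSV text into (features, label) pairs; label 1 = pipeline shutdown."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        shutdown = field(row, "Pipeline Shutdown").upper()  # assumed column name
        if shutdown not in ("YES", "NO"):
            continue  # skip rows where the shutdown outcome is not recorded
        features = {
            "cause": field(row, "Cause Category"),                              # X1
            "net_loss_barrels": float(field(row, "Net Loss (Barrels)") or 0.0), # X2, blank -> 0 (assumption)
            "pipeline_type": field(row, "Pipeline Type"),                       # X3
            "liquid_ignition": field(row, "Liquid Ignition").upper() == "YES",  # X4
        }
        records.append((features, 1 if shutdown == "YES" else 0))
    return records
```

With the downloaded file (the name `database.csv` here is hypothetical), usage would look like `with open("database.csv", newline="") as f: records = load_records(f.read())`.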

References:

Source of data: https://www.kaggle.com/usdot/pipeline-accidents
