Can EBITDA normalization adjustments be automated? Part 2 (with model)
Picture Credit: Offnfopt (Wikipedia)


In my first article on this topic, I explored whether EBITDA normalization adjustments can be automated. In this article, we will try to automate a one-off cost scenario using a simple statistical model and machine learning. If you have not read the first article, you can find it here.

The dataset used in this article is a monthly five year expense account. The dataset has been deliberately kept simple as the purpose of the article is to explore a proof-of-concept model, which can later be improved.

Three models are explored in the article:

1) Simple statistical model – the dataset has been kept simple here, with 6 anomalies in the monthly expenses. The goal is to create a model that automatically finds these anomalies in the dataset.

2) Convolutional autoencoder model 1 – the dataset here has been kept flat for all years, with anomalies introduced in 1 year only. The goal is to identify the anomalous year.

3) Convolutional autoencoder model 2 – this is the same model as in 2), but the dataset now has seasonality. The goal is to identify the anomalous year without treating the seasonality in the business as anomalous.

My autoencoder model is based on the time-series anomaly detection example by Pavithra Vijay in the Keras tutorials/examples.

Simple model – monthly anomalies

Given that the problem was laid out in the first article, I will not delve into the problem in detail again. Let us dive straight into data using Python.

We will just import pandas, NumPy and Matplotlib for the purpose of this model.
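The code screenshots from the original post are not reproduced here; the following is a minimal sketch of the setup using a synthetic stand-in for the expense dataset (the column names, random seed and injected anomaly positions are my own assumptions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build a stand-in for the expense account: 84 monthly observations of a
# roughly flat expense base, with 6 one-off costs injected at known months.
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=84, freq="MS")
values = rng.normal(loc=100_000, scale=2_000, size=84)
values[[7, 19, 33, 47, 60, 75]] += 40_000  # the 6 anomalies

df = pd.DataFrame({"date": dates, "value": values})
print(df.shape)  # (84, 2)
```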


Let us explore our data. We have 84 rows of data and 2 columns – one for date and one for value.

Let us plot the data using good old Matplotlib. Based on the plot, we can clearly see there are 6 anomalies in the dataset. The goal is to create a model that will automatically identify these anomalies.
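A sketch of the plot, with the stand-in data repeated so the snippet runs on its own (the `Agg` backend is only there so the script runs headless; drop it in a notebook):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed interactively
import matplotlib.pyplot as plt

# Rebuild the stand-in series (same construction as the setup snippet).
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=84, freq="MS")
values = rng.normal(100_000, 2_000, 84)
values[[7, 19, 33, 47, 60, 75]] += 40_000
df = pd.DataFrame({"date": dates, "value": values})

# Plot the series; the 6 one-off costs stand out against the flat base.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df["date"], df["value"])
ax.set_xlabel("month")
ax.set_ylabel("expense")
fig.savefig("expenses.png")
```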


Now we get to the fun part of creating the model. As you can see, it is nothing exciting: for simple anomaly detection, all we will do is calculate the mean, standard deviation and z-score.

Now all we have to do is define a threshold for anomaly detection. For the purpose of this analysis, I defined the threshold as 2. For the next 2 models, we will automate the threshold selection.
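A sketch of the z-score model with the hand-picked threshold of 2, run on the stand-in data from above:

```python
import numpy as np
import pandas as pd

# Stand-in data, as in the earlier snippets.
rng = np.random.default_rng(42)
values = rng.normal(100_000, 2_000, 84)
values[[7, 19, 33, 47, 60, 75]] += 40_000
df = pd.DataFrame({"value": values})

# z-score: how many standard deviations each month sits from the mean.
df["zscore"] = (df["value"] - df["value"].mean()) / df["value"].std()

# Flag anything more than 2 standard deviations from the mean as an anomaly.
THRESHOLD = 2
df["anomaly"] = df["zscore"].abs() > THRESHOLD
print(df["anomaly"].sum())  # 6
```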


Based on our threshold definition, the model predicted that there are 6 anomalies in the dataset, and we can see the location of the anomalies in the data. Let us now plot the anomalies on top of the dataset to check whether our model has picked the correct data points as anomalies.
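A sketch of overlaying the flagged months on the series (stand-in data repeated so the snippet runs standalone):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed interactively
import matplotlib.pyplot as plt

# Stand-in data plus the z-score flag from the previous step.
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=84, freq="MS")
values = rng.normal(100_000, 2_000, 84)
values[[7, 19, 33, 47, 60, 75]] += 40_000
df = pd.DataFrame({"date": dates, "value": values})
df["anomaly"] = ((df["value"] - df["value"].mean()) / df["value"].std()).abs() > 2

# Plot the full series, then mark the flagged months in red on top.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df["date"], df["value"], label="expenses")
flagged = df[df["anomaly"]]
ax.scatter(flagged["date"], flagged["value"], color="red", label="anomaly")
ax.legend()
fig.savefig("anomalies.png")
```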



We can also extract the subset of anomalous data from the dataset.
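Boolean indexing gives us that subset directly (a sketch on the stand-in data):

```python
import numpy as np
import pandas as pd

# Stand-in data plus the z-score flag, as before.
rng = np.random.default_rng(42)
values = rng.normal(100_000, 2_000, 84)
values[[7, 19, 33, 47, 60, 75]] += 40_000
df = pd.DataFrame({"value": values})
df["anomaly"] = ((df["value"] - df["value"].mean()) / df["value"].std()).abs() > 2

# Filtering on the boolean column returns only the anomalous rows.
anomalous = df[df["anomaly"]]
print(anomalous.index.tolist())  # [7, 19, 33, 47, 60, 75]
```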

This was a very simple solution, but looking at data points in isolation may not be very useful in a real-world scenario. Let us now make the data a little noisier for 1 year while keeping the other years flat. In this scenario, we need the model to recognize a time series and find anomalies in any 12-month period.

Convolutional autoencoder model 1

Before I dive into the model, I will take a few minutes (paragraphs!) to briefly explain what an artificial neural network is and what an autoencoder model is. If you are familiar with the workings of an autoencoder model, please feel free to skip to the actual model.


Machine learning is a means by which a computer learns to perform tasks based on the analysis of a dataset provided as training data. Neural networks, or deep learning, are a branch of machine learning that utilizes a network of interconnected nodes to learn about the training data and then subsequently predict. (Neural network picture credit: Offnfopt (Wikipedia).)


An autoencoder is a type of neural network that learns a representation for a set of data in an unsupervised manner (i.e. we do not need to provide a labeled dataset). An autoencoder is typically used in computer vision problems for dimensionality reduction and for reducing noise in the data. Simply stated, it learns the key representations/signals in the data that it needs in order to reconstruct that same dataset. The encoder is the part of the network that compresses the input into a latent space representation. The code is the compressed input, which is then fed into the decoder. The decoder tries to reconstruct the input based on the latent space representation. (Autoencoder picture credit: Chervinskii (Wikipedia).)

We will use a convolutional autoencoder model to learn time-series data and then predict the time-series data for the same data points to identify anomalies. We will use Keras, which is an open-source Python library for developing and evaluating deep learning models. It acts as an interface to the TensorFlow library.


Let us now get into the model. We will also need to import Keras in addition to the libraries that we previously imported.


As discussed, the anomaly in the data for this model is introduced in 1 year only, with the aim of creating a model that identifies that year as an anomaly.

We will again calculate the mean, standard deviation and the z-score, but instead of defining a threshold based on the z-score only, we will create a time-series model.
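A sketch of turning the normalized series into overlapping 12-month windows, in the style of the Keras time-series anomaly detection example the article builds on (the function name and window length follow that example; the data is a stand-in):

```python
import numpy as np

def create_sequences(values, time_steps=12):
    """Slide a window of time_steps months over the series and stack the
    windows into a (samples, time_steps, 1) array for the Conv1D layers."""
    windows = [values[i : i + time_steps] for i in range(len(values) - time_steps + 1)]
    return np.expand_dims(np.stack(windows), axis=-1)

# Normalize with the training mean/std, then window the series.
series = np.random.default_rng(0).normal(100_000, 2_000, 84)
normalized = (series - series.mean()) / series.std()
x_train = create_sequences(normalized)
print(x_train.shape)  # (73, 12, 1)
```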


Now that we have created a time-series input, let us now define our autoencoder model.
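A sketch of the convolutional autoencoder, following the layer structure of the Keras example the article is based on (the filter sizes and learning rate are that example's defaults, applied here to 12-step windows):

```python
from tensorflow import keras
from tensorflow.keras import layers

TIME_STEPS = 12  # one year of monthly data per window

# Encoder: two strided Conv1D layers compress each 12-month window into a
# short latent code. Decoder: Conv1DTranspose layers reconstruct the window.
model = keras.Sequential(
    [
        layers.Input(shape=(TIME_STEPS, 1)),
        layers.Conv1D(32, 7, padding="same", strides=2, activation="relu"),
        layers.Dropout(0.2),
        layers.Conv1D(16, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(16, 7, padding="same", strides=2, activation="relu"),
        layers.Dropout(0.2),
        layers.Conv1DTranspose(32, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(1, 7, padding="same"),
    ]
)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
model.summary()
```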


Now that we have defined our model, let us now train the model.
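A sketch of training (the windowed input and model are repeated in compact form so this snippet runs standalone; the epoch count and early-stopping patience are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Rebuild the windowed stand-in input and the autoencoder.
series = np.random.default_rng(0).normal(0, 1, 84)
x_train = np.expand_dims(np.stack([series[i : i + 12] for i in range(73)]), axis=-1)
model = keras.Sequential(
    [
        layers.Input(shape=(12, 1)),
        layers.Conv1D(32, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1D(16, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(16, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(32, 7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(1, 7, padding="same"),
    ]
)
model.compile(optimizer="adam", loss="mse")

# Train the autoencoder to reconstruct its own input; early stopping halts
# training once the validation loss stops improving.
history = model.fit(
    x_train,
    x_train,
    epochs=20,
    batch_size=16,
    validation_split=0.1,
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)],
    verbose=0,
)
```

`history.history["loss"]` and `history.history["val_loss"]` hold the curves for the training/validation loss plot in the next step.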


Let us now plot the training and validation loss history.

Now that we have trained the model, let us predict the values and define a threshold. For the purpose of this exercise, I have defined the threshold as (max(train_error_loss) – mean(train_error_loss))/2.
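A sketch of the thresholding step. With the trained model, the per-window reconstruction error would be `train_error_loss = np.mean(np.abs(model.predict(x_train) - x_train), axis=(1, 2))`; here we stand in synthetic error values so the mechanics run on their own:

```python
import numpy as np

# Stand-in reconstruction errors: 61 well-reconstructed windows with small
# error, and 12 windows overlapping the anomalous year with large error.
rng = np.random.default_rng(1)
train_error_loss = np.concatenate(
    [rng.uniform(0.01, 0.03, 61), rng.uniform(0.3, 0.5, 12)]
)

# The article's rule: derive the threshold from the training error
# distribution instead of hand-picking it.
threshold = (train_error_loss.max() - train_error_loss.mean()) / 2

# A window is anomalous when its reconstruction error exceeds the threshold.
anomalies = train_error_loss > threshold
print(anomalies.sum())  # 12
```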


Using the code we used in the simple model to plot the anomalies on top of the data, we can see that the model was correctly able to identify the anomalous trend.



Convolutional autoencoder model 2


Our model was able to identify the anomalous trend in the previous scenario, but data in the real world may have some seasonality that we would not want our model to flag as anomalous. We will now train the same model on data that has seasonality but also has anomalous data points in 1 year, as depicted in the chart.
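A sketch of a seasonal stand-in dataset of the kind described: a repeating yearly pattern with one-off costs injected into a single year (the amplitude, noise level and choice of anomalous year are my own assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
months = np.arange(84)

# A yearly sine pattern repeated across the series, plus mild noise.
seasonal = 100_000 + 15_000 * np.sin(2 * np.pi * months / 12)
values = seasonal + rng.normal(0, 1_000, 84)

# Distort one year (months 36-47) with one-off costs the model should flag.
values[36:48] += rng.uniform(20_000, 40_000, 12)

df = pd.DataFrame({"month": months, "value": values})
```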


Using exactly the same model, we can see that it was able to learn that the seasonality is not an anomaly, and to flag only the data pattern that does not follow the seasonality as anomalous.


Conclusion

This is a simple example using a very simple dataset, but it shows that a lot can be done to automate the financial due diligence process so that practitioners can focus more on value-added insights and less on simple data analysis. I would again reiterate that if you would like to collaborate on this, please feel free to reach out to me.

--------------------------------------------------------------------------------------

Important Note: All comments and opinions are my own and do not represent those of my employer

Gafar Shittu, MS, CPA, CVA

I help litigation attorneys with business valuation disputes make informed decisions, so they achieve better results for their clients.

3y

EBITDA adjustment normalizations cannot be automated. The EBITDA figure is a function of creativity to project the Target in a particular light. Let’s be frank. It’s subjective, not objective. And it’s only creativity that can unravel it to an Equitable level.

Nathan Chao

Decision Modeling and Analytics - East Region Leader | Global Client Service Partner | Media and Entertainment, Tech, Financial Services | TMT Leader - New York Metro Market | Board Member

4y

Great article! Very informative.

