Confidence Interval in TCA Cost Estimates Explained
When estimating the market impact of an institutional order, most models predict a distribution of outcomes. A prediction takes the form of an expected cost plus or minus a confidence interval. The expected cost is easy to understand: it is the cost that is most likely to occur. The confidence interval is a little more complex. It is the range around the expected cost within which the realized cost is expected to fall with a stated probability. This number is very important because it tells us how much variation from the expected cost to anticipate. Together these two numbers describe a distribution. We call the distribution parametric because we describe its shape with a small set of numbers, in this case two. In many cases, a Gaussian (also called a normal) distribution is used.
A distribution is used because it allows the ‘shape’ of the data to be considered. The shape comes from a histogram, in which the data is first binned and the number of observations falling within each small range is then plotted. A shape provides more information than a single number. In many cases we use many more than two numbers, but the main point is that describing a distribution takes more than one.
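As a rough illustration of the binning described above, the following sketch counts observations per bin and prints a text histogram. The cost values and the bin width are made up for the example and do not come from any real trades.

```python
from collections import Counter
import math

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Keys of the returned dict are the lower edges of the bins.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}

# Hypothetical realized costs (in dollars), invented for illustration only.
costs = [0.05, 0.12, 0.18, 0.22, 0.25, 0.31, -0.04, 0.19, 0.21, 0.45]

hist = histogram(costs, bin_width=0.1)
for lower_edge, count in hist.items():
    # One '#' per observation gives a crude picture of the shape.
    print(f"{lower_edge:+.1f} | {'#' * count}")
```

Even with ten points, the printed bars show where the observations cluster and how far they spread, which a single average would hide.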
When it comes to making market impact cost predictions, the expected cost is always positive. To see this, consider a buy order. When the order, or any small slice of it, is sent to an order book, it can only exert upward pressure on the price. The pressure may be very small if the order is small, but a buy order never pushes the market down. Many other factors that have nothing to do with the order can push the market down (or up). As a result, even large buy orders are sometimes filled at average prices below the arrival price benchmark (i.e. at a ‘negative’ cost).
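The argument above can be sketched with a small simulation: each order pays a fixed positive impact, and unrelated market moves add zero-mean noise on top. This is only a toy model. The impact and noise levels are hypothetical, and treating the noise as Gaussian and independent of the order is an assumption made for the sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

IMPACT_BPS = 5.0   # hypothetical impact of our own buy order, always positive
NOISE_BPS = 12.0   # hypothetical std dev of price moves caused by other traders

def simulate_costs(n_orders):
    """Realized cost = own impact + zero-mean market noise (assumed Gaussian)."""
    return [IMPACT_BPS + random.gauss(0.0, NOISE_BPS) for _ in range(n_orders)]

costs = simulate_costs(100_000)
mean_cost = sum(costs) / len(costs)
share_negative = sum(c < 0 for c in costs) / len(costs)

print(f"mean cost: {mean_cost:.2f} bps")
print(f"share of orders with negative cost: {share_negative:.1%}")
```

The average cost stays close to the positive impact, yet a sizeable share of individual orders still finish below the arrival price.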
The confidence interval provides information about the level of volatility that our order is expected to be exposed to. The volatility comes from the interaction of other market participants (and perhaps our own earlier participation). Because of the volatility, our predictions are distributions of outcomes. When assessing the accuracy of a cost model, both the expected cost and the confidence interval are important. Different stocks have different volatilities, and the same stock can have different volatilities at different times.
Even though we know that a buy order cannot push a stock down, we still account for the likelihood that the activities of other traders may cause the stock price to go down. Our model output can be used to provide this probability. It can also be used to predict the probability that costs exceed the expected cost by a given amount, which makes it possible to optimize trading strategies by limiting exposure to the uncertainty caused by volatility.
To illustrate with a simple example, imagine a cost estimate of $0.20 +/- $0.35. We expect the cost to be $0.20, but understand that with some probability (e.g. about 68%, if the interval is taken to be one standard deviation) it could fall anywhere from -$0.15 to +$0.55. We can plot a Gaussian (normal) distribution using these two numbers. With this prediction, we would expect to outperform (have negative costs) about 28% of the time.
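The figures in this example can be checked with a few lines of Python using the standard library's `statistics.NormalDist`. The sketch assumes that costs are Gaussian and that the 0.35 half-width is one standard deviation; the variable names are illustrative only.

```python
from statistics import NormalDist

expected_cost = 0.20  # expected cost from the example, in dollars
half_width = 0.35     # confidence interval half-width, assumed to be one std dev

# Assumption: realized costs follow a Gaussian with these two parameters.
cost_dist = NormalDist(mu=expected_cost, sigma=half_width)

# Probability of a negative cost, i.e. of outperforming the arrival price.
p_outperform = cost_dist.cdf(0.0)

# Probability of landing inside the quoted range of -0.15 to +0.55.
low, high = expected_cost - half_width, expected_cost + half_width
p_within = cost_dist.cdf(high) - cost_dist.cdf(low)

# Probability of a cost worse than the top of the quoted range.
p_above = 1.0 - cost_dist.cdf(high)

print(f"P(cost < 0)            = {p_outperform:.3f}")
print(f"P({low:+.2f} to {high:+.2f})   = {p_within:.3f}")
print(f"P(cost > {high:.2f})         = {p_above:.3f}")
```

Under these assumptions the chance of a negative cost comes out at roughly 28%, and the quoted range covers roughly 68% of outcomes. The same `cdf` call can be used with any other threshold, for example a cost budget that a trading strategy should not exceed.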
To summarize, when we get an estimate from a market impact model, we usually get an expected value and a range around the expected value, called the confidence interval. The confidence interval arises from the volatility of the stock, which introduces uncertainty into the prediction. This means that we predict both positive and negative costs, even though the expected cost itself is always positive. The confidence interval helps us do things like optimize trading strategies, so it is important that it be properly calibrated.