Converting bits into dollars: Why and how data can generate business value
A post on why and how data can generate business value.
Introduction
Data is important because it encodes information. Information is the quantity that enables understanding (knowledge) and prediction. Prediction differs from knowledge in that it carries a probabilistic, forward-looking temporal component. Information is a quantifiable entity: it can be measured by ‘information entropy’, with units of bits. Different data can encode different information. Let's look at a few examples to illustrate these points.
‘My name is Bob’ in French and ‘my name is Bob’ in Japanese are different data/encodings, but the same information. In a second example, we note that the entropy of French text is relatively low (i.e. it is fairly predictable), while the entropy of Japanese text is higher. Likewise, the entropy of signed financial market returns is much higher than the corresponding measure for traded volume. In a third example, a time series may show cyclic information when decoded at one resolution, and trend information when decoded at another. In a final example, if a system has two users, each incentivized to predict the future system state, then the user with the informational advantage will win; the advantage could come from receiving the same information first, or from receiving more information.
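To make ‘entropy in bits’ concrete, here is a minimal Python sketch (illustrative only; it estimates unigram character entropy, which ignores context and therefore understates the predictability of real language) showing that a repetitive sequence scores lower than a varied one:

```python
import math
from collections import Counter

def shannon_entropy_bits(sequence):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A predictable string scores lower than a varied one.
print(shannon_entropy_bits("aaaaabaaaa"))   # ~0.47 bits: mostly 'a'
print(shannon_entropy_bits("abcdefghij"))   # ~3.32 bits: all symbols distinct
```

Published entropy estimates for natural languages use far richer context models; the point here is only that information content is measurable.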
It may surprise readers to know that there is a well-established duality between information theory and system growth [1]. This can be applied to the growth of a business: for a business that uses information to generate revenues, the sum of the growth rate and the entropy rate is constant. Put differently, information can be converted into revenue growth. [See Appendix A.]
The mechanism for this is simple: a data-generating process (DGP) creates data. The data contains stationary patterns that can be recognized, from which a prediction about the future can be made. By predicting the future, we can generate revenues/profit. Let's take two basic use cases:
For both of these use cases, we have the concept of intrinsic and extrinsic data. Intrinsic data is generated by the system that is to be predicted; extrinsic data is generated outside that system. For example, for a food shop, intrinsic data might be the number of customer queries asking for ice cream, while extrinsic data might be the outside temperature. Both intrinsic and extrinsic data increase the amount of information available to the predictor, and so both can be used to improve prediction accuracy.
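As a hedged illustration of combining intrinsic and extrinsic data, the sketch below fits a linear predictor for the hypothetical food-shop example (all numbers are invented for illustration); adding the extrinsic temperature feature alongside the intrinsic query count increases the information available to the fit:

```python
import numpy as np

# Hypothetical daily observations for the food-shop example.
# Intrinsic: yesterday's ice-cream queries; extrinsic: forecast temperature (deg C).
queries_yesterday = np.array([12, 15, 30, 28, 45, 50, 41])
temperature       = np.array([18, 19, 24, 23, 29, 31, 27])
sales_today       = np.array([20, 24, 48, 44, 70, 78, 62])

def fit_and_score(X, y):
    """Ordinary-least-squares fit; returns in-sample R^2 (illustrative only)."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("intrinsic only:        R^2 =", round(fit_and_score(queries_yesterday, sales_today), 3))
print("intrinsic + extrinsic: R^2 =", round(fit_and_score(
    np.column_stack([queries_yesterday, temperature]), sales_today), 3))
```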
Financial Services Industry Example
Take the example of the trading fund WorldQuant, which details how it uses data-driven prediction to generate revenues [3-6].
Data: “A data group scours the globe for interesting and new data sets, including everything from detailed market pricing data to shipping statistics to footfall in stores captured by apps on smartphones”. “If we could buy, consume, create, web scrape more data than anyone else, we could create more alpha, find more opportunities”. “In 2007 we had two data sets - today [2022] we have more than 1,400.”
Data Processing: A prediction algorithm takes data as input, processes it, and outputs an ‘alpha’ (aka predictor, signal, etc.). The ‘Fundamental Law of Active Management’ states that the Sharpe ratio is proportional to √N, where N is the number of independent alphas [7]. Hence, as long as an alpha is not perfectly correlated with the existing set, there is value in adding it. ‘In 2010 we were producing several thousand alphas per year. By 2016 we had one million alphas. As of 2022 we have multiple millions of alphas’. And there is a stated ambition to get to 100 million alphas. While traditional quant finance mandates an economic rationale behind each alpha, the data-driven approach is led purely by the patterns in the data.
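The √N claim can be made precise under strong simplifying assumptions. The sketch below is standard back-of-envelope reasoning (not a quote from [7]), assuming N uncorrelated alphas of equal strength:

```latex
% Assume N uncorrelated alphas, each contributing a per-period return r_i
% with mean \mu and standard deviation \sigma. The equal-weight combination satisfies
\[
\mathbb{E}\Big[\textstyle\sum_{i=1}^{N} r_i\Big] = N\mu,
\qquad
\operatorname{Var}\Big(\textstyle\sum_{i=1}^{N} r_i\Big) = N\sigma^2,
\]
% so the combined Sharpe ratio is
\[
S_N = \frac{N\mu}{\sigma\sqrt{N}} = \sqrt{N}\,\frac{\mu}{\sigma}.
\]
% Correlation between alphas inflates the variance term and erodes the
% \sqrt{N} scaling -- hence the requirement that a new alpha not be
% perfectly correlated with the existing set.
```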
Combination: Once alphas have been produced, they are intelligently merged together in a time-variant manner. Examples of signal combination methodologies include Mean Variance Optimization and Bayesian Model Averaging. “No one alpha is important. Our edge is putting things together, it’s the implementation.” “The idea is that with so many “alphas,” even weak signals can be useful. If counting cars in parking lots next to big box retailers has only a tiny predictive power for those retailers’ stock prices, it can still be used to enhance a bigger prediction if combined with other weak signals. For example, an uptick in cars at WalMart parking lots—itself a relatively weak signal—could combine with similar trends captured by mobile phone apps and credit-card receipts harvested by companies that scan emails to create a more reliable prediction.”
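As a toy illustration of why many weak signals can combine into a stronger one, the Python sketch below (invented data; precision weighting is used here as a simple stand-in for the mean-variance and Bayesian combination methods named above) builds N noisy predictors of a common target and compares the best single alpha against the weighted combination:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5000, 25                       # periods, number of weak alphas

# Hypothetical target (e.g. next-day return) and N noisy predictors of it.
target = rng.standard_normal(T)
noise_scale = rng.uniform(5.0, 15.0, size=N)      # each alpha is individually weak
alphas = target[:, None] + noise_scale * rng.standard_normal((T, N))

# Precision weighting: trust each alpha inversely to its noise variance
# (a simple special case of mean-variance combination with uncorrelated errors).
weights = 1.0 / noise_scale**2
combined = alphas @ weights / weights.sum()

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print("best single alpha vs target:", max(corr(alphas[:, i], target) for i in range(N)))
print("combined alpha vs target:   ", corr(combined, target))
```

The combined predictor correlates with the target far better than any individual alpha does, which is the intuition behind merging many weak signals.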
The automated process of data ingestion, processing, packaging, combination, and prediction is referred to by WorldQuant as their ‘alpha factory’.
Conclusions
Lessons for the wider business community are:
One possible vision for the future is that all businesses become miners of data patterns and generators of signals, enabling them to optimize the factors that drive their revenues.
References
Appendix A
This links back to Claude Shannon's foundational 1948 capacity theorem [8]. The theorem states that any communication channel can be assigned a property termed "channel capacity", which bounds the rate at which information can be transferred reliably over that channel. In [1], the author shows that the growth rate of such a system equals the capacity of a hypothetical noisy channel, over which the system receives information that distinguishes its probabilities from those of the wider system.
In the context of this article, the 'system' is a business, the 'wider system' is the business's competitors, and the 'probabilities' are the information (data) the business uses to make its decisions.
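A minimal worked form of this duality is the textbook Kelly 'horse race' identity (the standard special case found in information-theory texts; whether it matches the exact formulation in [1] depends on that reference):

```latex
% Kelly horse race with m outcomes, true probabilities p_i, and fair odds
% o_i = m. Betting a fraction b_i of wealth on outcome i gives an expected
% log-wealth growth rate of
\[
W(b, p) = \sum_{i=1}^{m} p_i \log_2\!\big(b_i\, o_i\big),
\]
% which is maximized by proportional betting, b_i = p_i, yielding
\[
W^{*}(p) = \log_2 m - H(p),
\qquad\text{i.e.}\qquad
W^{*}(p) + H(p) = \log_2 m .
\]
% Growth rate plus entropy rate is constant: each bit of information that
% reduces the bettor's uncertainty H(p) converts one-for-one into growth.
```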