Investment Process Re-engineering: An Introduction
Chibuzo Ivenso
Data-driven Investment Management | Quantitative Research | Risk Modeling | Investment Process Re-engineering
The last few years have undoubtedly been momentous for many industries, including finance. In the wake of the global pandemic and its aftermath, geopolitical realignments, supply chain disruptions, new operational models, and more are fueling important emerging trends that continue to reshape the landscape. In particular, quantum leaps in technology, including significant advances in the capabilities of so-called large language models (LLMs), a sub-field of natural language processing, have given rise to anticipation and apprehension in equal measure, fueled, no doubt, by the increasingly frequent appearance of commentary such as this [1, 2, 3, 4]. It is probably safe to say that the many trends we anticipated in a series of articles [1, 2, 3, 4] on this platform a few years ago are well and truly underway. Those articles formed part of our advocacy for 'investment process re-engineering' as a response to the significant challenges and opportunities that nascent technologies pose for financial services providers.
In this article, the goal is to present a conceptual overview of process re-engineering as it applies to the investment industry and, hopefully, to convey a sense of its scope, importance, and potential impact, so as to engender a broader appreciation of its value. We start by situating this conceptualization within its broader underlying framework, aided by (re)defining key terms.
Data, data everywhere!
Across most industries, there are generally two kinds of positive responses to technological change. In the iterative approach, a business seeks to enhance existing processes with upgraded technology, making the individual processes themselves more efficient, but with only incremental benefit to the system as a whole. True transformation and innovation, however, emerge only when businesses completely re-imagine the entire product/service platform in light of the new capabilities offered by technological advancements. Investment process re-engineering seeks to provide the enabling framework for a transformative approach to investment management in particular and financial services in general.
In our context, investment process re-engineering is closely linked to the emerging idea of data-driven investment management. In a certain sense, investment management has always been 'data-driven': all its key methodologies, whether fundamental or technical analysis, active or passive management, rely heavily on their own stylized interpretations of market, economic, and financial data for execution. What is new in this emerging concept is the centrality of data, its modes of utilization, and its governance requirements under the new paradigm. In this regard, it draws heavily on machine learning operations (MLOps), and particularly its sub-fields DataOps and ModelOps, an emerging field concerned with operationalizing the data-to-model-to-product journey in artificial intelligence and machine learning (AI/ML). MLOps is itself a derivative of the more established concept of DevOps in software engineering. As this perspective on the centrality of data is somewhat novel in financial applications, we will attempt to clarify terms and ideas, drawing heavily on concepts from these adjacent fields, where they are relatively well grounded in their own domains of application.
We start with a definition of data-driven investing as "a framework for operationalizing investment management processes as an integration of research, portfolio and risk management, and investment-operations workflows via a pipeline of auditable data operations and execution models to facilitate reliability, efficiency, flexibility, and extensibility". This is a bit of a mouthful that sharper minds can no doubt articulate better, but it will serve for our current intents and purposes.
With this definition in hand, and assuming it helps convey our agenda, the meaning of investment process re-engineering in our context is immediate: it refers to the design and orchestration of component workflows and processes to operationalize a version of the investment management model that incorporates all of the important characteristics above. The idea is to devise processes for investment management (and financial services in general) that closely mimic those of machine learning and software development in order to derive parallel benefits. In keeping with naming tradition, we will refer to this framework as Investment PRocess Operations (IPROps).
Old bottle, new wine
DevOps is basically a set of tools, processes, and best practices for robustly and seamlessly automating and coordinating the entire process of generating digital products and services, bringing them from ideation to production, and maintaining and updating them through their service lifecycle. Over the last couple of decades, the development, evolution, and adoption of DevOps practices have radically transformed the software industry, with dramatic improvements in development cycles, reliability, quality, and diversity of digital products and services. In the articles cited above, we highlighted the need for a similar evolution in the financial services industry, arguing that the required tools were already at hand.
Like DevOps, MLOps essentially embeds a collection of objects (data, models, and products) within a similar framework of tools, processes, and best practices to significantly improve all aspects of the ideation-to-production cycle. These include tools, processes, and principles for data storage and transformation (e.g., cleaning and feature engineering), provenance and lineage tracking (both for data and models), metadata management, process automation and orchestration (to coordinate the interaction of data and models across a series of predetermined steps that form an operational pipeline), deployment to production, performance monitoring, etc.
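To make the pattern concrete, here is a minimal sketch (in Python) of the kind of auditable, orchestrated pipeline described above, reduced to bare essentials: named steps plus lineage metadata recorded as data flows through. All class and function names are hypothetical, not references to any particular MLOps tool.

```python
# A minimal, illustrative pipeline runner: each step is a named, auditable
# transformation, and simple lineage metadata is recorded as data flows
# through. Names are hypothetical, not references to any specific MLOps tool.
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj) -> str:
    """Hash a JSON-serializable payload so lineage records are verifiable."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


class Pipeline:
    def __init__(self, steps):
        self.steps = steps      # list of (name, callable) pairs
        self.lineage = []       # one metadata record per executed step

    def run(self, data):
        for name, step in self.steps:
            data = step(data)
            self.lineage.append({
                "step": name,
                "output_hash": fingerprint(data),
                "run_at": datetime.now(timezone.utc).isoformat(),
            })
        return data


# Usage: clean -> engineer features, with an audit trail as a by-product.
pipe = Pipeline([
    ("clean",    lambda d: [x for x in d if x is not None]),
    ("features", lambda d: [{"ret": x, "abs_ret": abs(x)} for x in d]),
])
result = pipe.run([0.01, None, -0.02, 0.005])
print(json.dumps(pipe.lineage, indent=2))
```

Production orchestrators add scheduling, retries, and persistent stores, but the essential idea, every transformation named, ordered, and auditable, is already visible here.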
As we hope to demonstrate, a bit of reflection reveals that investment processes share very similar requirements and patterns, which establishes the cross-utility and transferability of concepts and processes between MLOps and IPROps. By adopting the MLOps framework for investment operations, we immediately gain access to a set of powerful tools and practices that can immensely enhance investment workflows, making them more efficient, flexible, extensible, and robust. In particular, the capabilities delivered by the extensive tool sets for workflow coordination, monitoring, and scheduling, especially with regard to DataOps, bring incredible new potential to financial services. Indeed, there is growing recognition of data as a major source of value generation for modern business operations. Its 2V requirements (veracity and value), along with increased complexity across its traditional 3V dimensions (volume, velocity, and variety), nevertheless present an ever-expanding challenge for businesses. It is therefore increasingly clear that businesses of every stripe must revisit the role and treatment of data in their operations to keep pace with the competition in the emerging landscape. As a component of MLOps, DataOps is accordingly coming into its own as a critical sub-discipline, with new principles, frameworks, and tools ever on the horizon. Given its alignment with the MLOps framework, IPROps comes with batteries included for the investment industry in this regard.
Incidentally, the scope for various levels of process automation that this framework unlocks is the key to the kinds of mass customization and personalization that we proposed in earlier articles as the holy grail of financial services; these capabilities could hardly be possible otherwise. No doubt, many quantitatively oriented investment firms (especially those that have adopted ML) will already have incorporated elements of MLOps practice extensively, though many have probably yet to explore this approach to its full potential. The agenda here is to demonstrate the applicability of the MLOps paradigm to a much broader set of investment managers as a means of harnessing its immense benefits. In what follows, we propose a framework for doing just that. While we concentrate on the investment industry in this presentation, the principles carry over quite readily to a broader range of financial services providers.
Back to the basics
The key link between the two frameworks (MLOps and IPROps) lies in the similarities between their components (data, models, and products) and, critically, their outputs. In AI/ML, the output is a model that emits a distribution over variables of interest, which enables inference in a wide variety of formats and across a broad range of tasks (e.g., segmentation, regression, forecasting) with varying degrees of sophistication. The outputs of most financial models, though traditionally cast as decisions on point estimates, can likewise be recast as distributions over decision variables of interest, with the end product essentially a simple or complex decision rule on the same. What is left is to show how investment processes can be 're-engineered' via IPROps to align with those of MLOps in order to harness the manifold benefits of the latter.
Given the parallels outlined above, it should come as no surprise that both paradigms share much in common. Indeed, in terms of actual data and model operations, IPROps is little more than a wholesale adoption of the extensive sets of tools and best practices that already exist in MLOps, which is a vast and growing field whose scope of application continues to evolve at a brisk pace. In a subsequent article, we will perhaps delve into a more comprehensive exploration of the most relevant tool sets in MLOps and the capabilities they bring to investment processes. Here, we focus on the unique aspects of IPROps which differentiate it from MLOps.
Shadows in the looking glass
There are three key areas in which IPROps diverges from MLOps, and they arise from important differences in the data-generating processes of the underlying phenomena. In ML, the phenomena of interest are ideally assumed to produce stationary data, meaning the underlying statistical distributions that ML models seek to discover are relatively stable. For instance, an ML application may seek to infer a quantity (regression/forecasting) or assign a category (classification) to a variable of interest using a collection of data surrogates; or it may attempt to generate realistic data samples from the (typically highly complex) underlying distribution inferred from such data using so-called generative models. In all of these use cases, the underlying assumption is that both the surrogate (i.e., training) data and the inference-time data come from the same distribution, and the models will likely fail to faithfully reproduce the required outputs otherwise. In reality, changes in the distribution of the data (a.k.a. data drift) are often a fact of life for ML applications, but drift is considered a bug to be fixed using various mechanisms, on the assumption that such changes are gradual enough to be neutralized by timely adjustments to model parameters. Moreover, the data-generating processes underlying typical ML applications have a low noise-to-signal ratio, meaning the phenomena of interest predominate over less useful artifacts in the data.
In financial applications, however, data drift is a feature rather than a bug and tends to occur at a much higher frequency than in the typical ML application; not that this is desirable, but it is simply an inescapable constraint. Financial data are also typically considered to have very high noise-to-signal ratios, so extracting useful patterns from them often poses the proverbial needle-in-a-haystack problem. Financial applications therefore often need a substantial injection of contextual information and domain knowledge. While some of this can be incorporated from new perspectives on the data itself (a.k.a. feature engineering), other aspects rely on observation and intuition, and all must draw on the interplay between individual analysts' experience and the collective's accumulated knowledge.
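As a concrete illustration of the contrast, the sketch below shows one simple way drift might be monitored: a two-sample Kolmogorov-Smirnov test comparing a recent window of data against the sample a model was trained on. The significance level, window sizes, and synthetic data are all assumptions made for the example.

```python
# An illustrative drift monitor: compare a recent window of data against the
# training sample with a two-sample Kolmogorov-Smirnov test. The significance
# level, window sizes, and synthetic data are assumed for the example.
import numpy as np
from scipy.stats import ks_2samp


def drift_alert(train_sample, recent_sample, alpha=0.05) -> bool:
    """Return True when recent data looks distributionally different."""
    _, p_value = ks_2samp(train_sample, recent_sample)
    return p_value < alpha


rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=1_000)    # regime the model was fit on
recent = rng.normal(0.5, 1.3, size=250)     # shifted, fatter-tailed regime
print(drift_alert(train, recent))           # True: time to adapt the model
```

In an ML setting such an alert triggers occasional retraining; in a financial setting it fires routinely, which is precisely why the treatment of drift must be designed in rather than bolted on.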
Model management practices in IPROps (a.k.a. ModelOps) will also diverge slightly from MLOps. For instance, MLOps models typically follow a linear versioning pattern in which model lineage is determined by performance on a given task across time. Given the more dynamic nature of IPROps, however, we can speak of an amalgam of models, versioned not only temporally, to cover the evolution of a particular model over time, but also along a functional dimension, to reflect diverse model use cases and fitness for specific market regimes. This way, the most appropriate set of models can be dynamically assessed and deployed from a model repository to align with environmental changes.
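A toy sketch of this two-dimensional versioning idea follows: models are keyed both by use case and by the market regime they are fit for, with version history along the temporal axis. The repository API shown is hypothetical, purely for illustration.

```python
# A hypothetical model repository illustrating two-dimensional versioning:
# keyed by (use_case, regime), with version history along the temporal axis.
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass
class ModelRecord:
    version: int
    fitted_at: str
    model: Any                  # the fitted model object itself


class ModelRepository:
    def __init__(self):
        # (use_case, regime) -> list of ModelRecord, newest last
        self._store = defaultdict(list)

    def register(self, use_case: str, regime: str, model: Any, fitted_at: str):
        history = self._store[(use_case, regime)]
        history.append(ModelRecord(len(history) + 1, fitted_at, model))

    def latest(self, use_case: str, regime: str) -> ModelRecord:
        """Fetch the newest model fit for this use case and regime."""
        return self._store[(use_case, regime)][-1]


repo = ModelRepository()
repo.register("return_forecast", "low_vol",  model="ols_v1",   fitted_at="2024-01-31")
repo.register("return_forecast", "high_vol", model="garch_v1", fitted_at="2024-02-29")
# When a regime detector flags stress, deploy the regime-appropriate model:
print(repo.latest("return_forecast", "high_vol").model)
```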
The issues outlined above have significant implications for the execution of data and model pipelines under IPROps. Specifically, the focus of models in financial applications will differ considerably from their ML counterparts, as will certain aspects of their operationalization and the treatment of the data that feeds them. Thus, the divergence of IPROps from MLOps reflects the nature of the models themselves (in response to high levels of uncertainty in financial data), the modeling cycle (in response to rapidly changing data distributions), and the interactions between the analyst's perspectives, the model, and the data across the pipeline. In the following sections, we present our approach to addressing these issues.
The IPROps framework
As with MLOps, the IPROps framework has two important inputs: data and models. Data is exogenously imposed by the environment, though it is also true that the much greater prevalence and importance of feature engineering in IPROps relative to MLOps blurs this reality a tad. Models, on the other hand, are created by the analyst to make sense of or impose some stylized facets of reality on the data for the purpose of inference. Relative to MLOps, models in IPROps must be decorated and extended with additional features to make them amenable to the quirks of financial data and applications. Our approach envisages three components of an effective IPROps framework:
The core:
This is the fundamental module through which the analyst creates a descriptive proxy for real-world financial phenomena via observational rules or mathematical expressions. This corresponds to the typical understanding of a model in traditional usage with a few accoutrements. At this juncture, it is important to emphasize the generality of this concept. In our interactions, many are quick to associate IPROps with complex, opaque, and highly sophisticated models used at high-frequency trading desks, quantitative hedge funds, and other similar purveyors of high finance. This would be a serious misconception! In fact, conceptually, this module can include virtually any sort of financial model. Econometric models for local or global macro or models of financial statements utilized in fundamental analysis? Check! Any of the variety of quantitative and qualitative models used in systematic investing? Check! Stochastic models of varying sophistication used by derivative/volatility traders? Certainly! Oh, by the way, here's a tree model for real options, coupled with a system-dynamics model for corporate finance applications! Not a problem. Ok, how about a full-blown portfolio of sophisticated machine learning models addressing any or all of these use cases? By all means!
The same goes for risk models and other sorts of models used in these and other application areas, and this module will also include all forms of validation practices that will be necessary to ensure the models are fit for purpose.
To reiterate, IPROps is truly general in its scope of admissible models and breadth of applications. That it adopts the MLOps framework by no means implies that it is restricted to using ML models. It will accommodate any sort of model used in any typical financial services workflow, and this is precisely the secret of its power! It allows analysts full latitude to express their views on the key variables that they believe define the states of the environment with respect to the system under study, typically (but not necessarily) based on data. There is only one extra requirement: such models must emit distributions rather than point estimates. In reality, this 'restriction' is hardly worth the name, as virtually any model can be recast as such, if not natively, then by generating scenarios on input variables, as the sketch below illustrates. The reason for this requirement will become clear shortly.
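Here is a minimal sketch of that scenario-generation route: a deliberately simple point-estimate valuation model is made to emit a distribution by sampling its uncertain inputs. The toy Gordon-growth-style model and the input distributions are assumptions made purely for the example.

```python
# An illustrative recasting of a point-estimate model as a distributional one
# by sampling its inputs. The toy valuation model and the input distributions
# are assumptions made purely for the example.
import numpy as np


def fair_value(growth, discount, cash_flow=100.0):
    """A deliberately simple point-estimate valuation model."""
    return cash_flow * (1 + growth) / (discount - growth)


rng = np.random.default_rng(7)
n = 10_000
growth = rng.normal(0.02, 0.005, n)     # uncertain growth assumption
discount = rng.normal(0.08, 0.005, n)   # uncertain discount rate

values = fair_value(growth, discount)   # one output per input scenario
# The same model now emits a distribution rather than a single number:
print(np.percentile(values, [5, 50, 95]))
```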
The integrator:
One of the key requirements for effective modeling under IPROps is acknowledging the centrality of model risk in financial applications, in recognition of the inherent dynamism and uncertainty of financial data-generating processes. One response involves, among other things, identifying data regimes, explicitly incorporating a temporal-dynamic dimension into modeling, and (as in ML) ensembling models to address variation. However, whereas model ensembles in ML focus on the data, IPROps additionally emphasizes ensembling models across operational components (e.g., return, risk, and liquidity) and identifying data regimes.
Conceptually, this means that robust modeling practice will attempt to model different aspects of the data-generating phenomena separately (though jointly in the ideal scenario), with some effort aimed at identifying regime dynamics, so that models can be adjusted and/or deployed dynamically to align with frequent changes in real-world distributions (a toy illustration follows below). Thus, in IPROps, there is an emphasis on diversity and modularity of models, where simplicity (at least in the conceptual sense) and the relevance of individual models to specific aspects of the process being modeled are also of considerable interest. Furthermore, given the innate ambiguity of financial phenomena, such models should ideally incorporate some measure of uncertainty, hence the need for distributional rather than point estimates. Finally, there is a need for a principled approach to injecting prior knowledge from human intuition and experience to augment model outputs where necessary.
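By way of illustration, the toy sketch below tags each period as 'calm' or 'stressed' from rolling realized volatility, the kind of crude regime signal that could drive dynamic model deployment. The window and quantile threshold are arbitrary choices; real regime models (e.g., hidden Markov models) are typically far richer.

```python
# A toy regime tagger: label each period "calm" or "stressed" according to
# rolling realized volatility. The window and quantile threshold are
# arbitrary illustrative choices.
import numpy as np


def tag_regimes(returns, window=21, q=0.75):
    """Label each date by whether rolling vol exceeds its q-quantile."""
    vol = np.array([returns[max(0, t - window):t + 1].std()
                    for t in range(len(returns))])
    threshold = np.quantile(vol[window:], q)    # skip the warm-up period
    return np.where(vol > threshold, "stressed", "calm")


rng = np.random.default_rng(3)
calm = rng.normal(0, 0.01, 250)                 # placid stretch of returns
stressed = rng.normal(0, 0.03, 60)              # volatile stretch
regimes = tag_regimes(np.concatenate([calm, stressed]))
print(regimes[-5:])                             # recent dates flag "stressed"
```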
How do we make sense of an ensemble of models, each of which produces (potentially very different) distributional estimates of a variable of interest, with all of these additional requirements? This is where a pooling framework comes into play. We utilize Meucci's pooling frameworks in their various incarnations. Pooling belongs to a Bayesian family of techniques used to harmonize multiple different distributions induced by different views/opinions on any subset of a group of uncertain variables, based on the principle of minimum distortion from a baseline joint distribution of the whole. In a certain sense, it can be considered a much richer and more general extension of the famed Black-Litterman approach to view-blending in asset allocation. Meucci's pooling framework(s) give analysts the ability to integrate a wide variety of prior and extrinsic knowledge and assumptions into a given primary view (e.g., market consensus) in a principled fashion. From this perspective, pooling is the key to both dynamism and flexibility: it allows the outputs of models parameterized by data to be readily combined and updated as more data becomes available, and it also accommodates analysts' subjective interventions with the data. By enabling the analyst to efficiently generate a wide range of scenarios in a principled manner, based on subjective and/or data-driven views of the uncertain future, pooling techniques not only serve as a unifying scaffold for the entire modeling framework, tying the disparate models together, but are also the key to bringing the scope for automation inherent in MLOps into the IPROps framework.
This highly underrated (IMHO) tool in some ways forms the centerpiece of IPROps, because it is the element that enables the entire system to be truly data-driven! While its implementation is fairly mechanical, it is very powerful machinery for allowing both the analyst and the data to dynamically interact with models to produce enriched (and thus potentially more robust) views as a tangible output.
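For the technically inclined, here is a bare-bones sketch of entropy pooling in the spirit of Meucci's framework, assuming a discrete set of scenarios: we seek posterior probabilities q that satisfy a linear view A q = b while minimizing relative entropy to the prior p, solved via the standard dual. It is an illustration under those assumptions, not a production implementation.

```python
# A bare-bones sketch of entropy pooling over discrete scenarios: find
# posterior probabilities q satisfying the linear view A q = b while
# minimizing relative entropy to the prior p, via the standard dual.
import numpy as np
from scipy.optimize import minimize


def entropy_pool(p, A, b):
    """p: prior scenario probs (J,). A: view matrix (K, J). b: targets (K,)."""
    def neg_dual(lam):
        z = p * np.exp(A.T @ lam)       # exponentially tilted prior
        return np.log(z.sum()) - lam @ b

    lam = minimize(neg_dual, x0=np.zeros(len(b)), method="BFGS").x
    q = p * np.exp(A.T @ lam)
    return q / q.sum()


# Usage: five equally likely return scenarios; a view that the mean is 2%.
scenarios = np.array([-0.04, -0.01, 0.00, 0.02, 0.06])
prior = np.full(5, 0.2)                 # prior mean is only 0.6%
views = scenarios[None, :]              # one linear view: E[r] = 0.02
posterior = entropy_pool(prior, views, np.array([0.02]))
print(posterior, posterior @ scenarios) # reweighted probs; mean ~ 0.02
```

The same machinery extends to inequality views, views on volatilities or correlations, and confidence-weighted blends of multiple opinions, which is what makes it such a flexible scaffold.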
The executor:
This component is the decision-making engine of the financial models in IPROps. It takes as input the (blended) distributions from the integrator and executes rules-based decisions, which may be as simple as thresholding on some distribution aggregates or as involved as an optimization routine over entire distributions or parts thereof. While it is perhaps the most straightforward and mechanical of the three components, this is not to suggest that it is trivial. The sophistication of this aspect of the model could be quite significant, potentially incorporating complex rules that interact across multiple modeling dimensions (e.g., risk, return, liquidity). It also encompasses the feedback loop between the model and the environment, which enables dynamic and interactive decision-making, as we shall subsequently elaborate.
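A toy sketch of such a rules-based executor follows: it thresholds on two aggregates of the blended distribution, the probability-weighted mean and a crude 95% CVaR. The thresholds and the decision rule itself are illustrative assumptions, not recommendations.

```python
# A toy rules-based executor: threshold on two aggregates of the blended
# distribution (probability-weighted mean and a crude 95% CVaR). Thresholds
# and the rule itself are illustrative assumptions, not advice.
import numpy as np


def decide(scenarios, probs, min_mean=0.01, max_cvar=0.05) -> str:
    mean = probs @ scenarios
    order = np.argsort(scenarios)               # worst scenarios first
    cum = np.cumsum(probs[order])
    tail = order[cum <= 0.05]                   # the worst 5% of outcomes
    cvar = -np.average(scenarios[tail], weights=probs[tail]) if len(tail) else 0.0
    return "BUY" if mean >= min_mean and cvar <= max_cvar else "HOLD"


scenarios = np.array([-0.06, -0.02, 0.00, 0.02, 0.07])
blended = np.array([0.05, 0.15, 0.30, 0.30, 0.20])  # probs from the integrator
print(decide(scenarios, blended))               # fails the CVaR test: HOLD
```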
In a follow-up article, we will explore how IPROps fits into investment management practice and the enhancements it can bring to the industry.