AI's Curse of Dimensionality
Mohit Sharma CGMA
Technological Innovation | Artificial Intelligence | Strategy | Enterprise Architecture | Storytelling | Research | Consulting | Keynote Speaker & Panelist | Investor
As artificial intelligence continues to evolve at a breakneck pace, it faces an unexpected challenge: the curse of dimensionality. The term, coined by mathematician Richard E. Bellman, refers to the exponential increase in complexity and computational requirements as the number of dimensions (or features) in a dataset grows.
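To see the curse in action, here is a minimal sketch (my own illustration, using synthetic random data) of distance concentration: as dimensionality grows, the gap between a point's nearest and farthest neighbours shrinks relative to the average distance, undermining the distance and density estimates that many AI algorithms rely on.

```python
import numpy as np

# Illustrative sketch: distance concentration as dimensionality grows.
# With points drawn uniformly at random, the relative gap between the
# nearest and farthest neighbour collapses in high dimensions.
rng = np.random.default_rng(42)

for dims in (2, 10, 100, 1000):
    points = rng.random((500, dims))   # 500 random points
    query = rng.random(dims)           # one query point
    dists = np.linalg.norm(points - query, axis=1)
    relative_gap = (dists.max() - dists.min()) / dists.mean()
    print(f"{dims:>5} dims: relative nearest-to-farthest gap = {relative_gap:.3f}")
```

As the gap collapses toward zero, "nearest neighbour" loses meaning, and models need exponentially more data to populate the space with any useful density.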
The Double-edged Sword of Data Diversity
In an expanding digital universe, with ever-greater access to computing power and devices, it was natural for data volumes to grow. As technology became more industrialized and advanced, data also became more accessible and broad-based. While this wealth of information theoretically provides AI systems with more nuanced insights, it also introduces significant challenges.
In the early days of Generative AI, these challenges were largely responsible for the hallucinations and other risks that emerged, which pushed developers to find alternative frameworks to address them.
Technological sophistication is both a blessing and a curse for AI
Our quest to establish more visibility across our processes and reach a 'single version of truth' compelled us to add more and more dimensions to the data. These additional dimensions were originally intended to help AI algorithms tackle complex problems, but they ultimately ended up contributing to the curse of dimensionality.
For example, in genomic research, researchers usually work with datasets containing thousands of features (genes) but relatively few samples. This high-dimensional nature of genomic data makes it challenging to identify meaningful patterns and relationships between genetic variations and disease outcomes. As a result, many early attempts at developing predictive models for complex diseases from genetic data yielded disappointing results due to overfitting and poor generalization.
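A minimal sketch of this failure mode, using synthetic stand-in data rather than real genomic data (the sample and feature counts are assumptions for illustration): with thousands of "gene" features, only a handful of samples, and labels that are pure noise, a plain model fits the training set perfectly yet scores at chance level on held-out data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic "genomic" setup: many features (genes), few samples,
# and random labels, so there is no real signal to learn.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))   # 100 samples, 5000 features
y = rng.integers(0, 2, size=100)       # random "disease" labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # typically 1.0
print("test accuracy:", model.score(X_test, y_test))     # near 0.5 (chance)
```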
Similarly, in the finance sector, quantitative analysts often work with high-dimensional datasets containing numerous economic indicators, market variables, and company-specific metrics. The curse of dimensionality can lead to overfitting in predictive models for stock prices or risk assessment, resulting in poor performance when applied to real-world market conditions. This challenge has led to the development of sophisticated feature selection and dimensionality reduction techniques specifically tailored for financial data.
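One widely used mitigation is principal component analysis (PCA), which compresses many correlated indicators into a handful of components before modelling. The sketch below is a generic illustration on synthetic, correlated "market" data (the feature counts and the 95% variance threshold are assumptions for the example), not a production financial pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic correlated "market indicators": 50 observed features that
# are all noisy combinations of just 3 underlying factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 3))
loadings = rng.standard_normal((3, 50))
X = factors @ loadings + 0.1 * rng.standard_normal((300, 50))

# Standardize, then keep only the components explaining 95% of variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print("original features:", X.shape[1])            # 50
print("retained components:", X_reduced.shape[1])  # close to 3
```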
As a first-hand example, I was recently asked to assess the AI capabilities embedded within a market-leading product for Accounts Receivable automation. It was a fairly typical workflow, automation, and orchestration platform, woven together through APIs and deriving its value by mirroring high-dimensional datasets from multiple ERP solutions. Simply speaking, if each of four ERPs carries a hundred dimensions, some shared across systems and some unique to each, the platform would be compelled to onboard at least two hundred and fifty dimensions to cover both the common and the unique ones. To top it all, the vendor was telling its customer pursuits that it had AI in areas where the sheer complexity of executing AI would lead to failure. I was helping a client make a realistic assessment. To my surprise, when I spoke to the vendor's product team, what they referred to as AI was within-dimension probability calculation, not prediction across dimensions. To illustrate, their definition of AI included the ability to predict which particular cash receipt should be matched to which particular open invoice, a question that sits within a single dimension. But when I asked whether they could predict which particular invoice would fail to match to a receipt, they said it was not possible, since it depends on so many dimensions.
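To make the arithmetic behind that estimate concrete, here is a toy sketch with hypothetical dimension sets; the split between shared and ERP-specific dimensions is an assumption for illustration, not the vendor's actual schema.

```python
# Hypothetical: four ERPs with 100 dimensions each, of which 50 are
# common to all systems and 50 are unique to each ERP.
common = {f"common_{i}" for i in range(50)}
erps = [common | {f"erp{e}_field_{i}" for i in range(50)} for e in range(4)]

all_dimensions = set().union(*erps)
print("dimensions per ERP:", len(erps[0]))            # 100
print("platform must onboard:", len(all_dimensions))  # 50 + 4*50 = 250
```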
So what are the options to combat the curse?
Thankfully, some sophisticated techniques are available to conscientious development and product teams. I am listing two of them here for reference (more will follow on this topic later).
Feature Engineering
Feature engineering involves creating new features or transforming existing ones to capture the most relevant information while reducing dimensionality. For those still new to the AI and ML world, a feature is a crucial concept that refers to an individual measurable property or characteristic of the phenomenon being observed or analyzed. Features serve as the input variables for machine learning algorithms, providing the essential information that models use to learn patterns, make predictions, or classify data. Key feature engineering techniques include feature selection, feature extraction, and the transformation of raw variables into more compact, informative representations; one of these, univariate feature selection, is sketched below.
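As a minimal sketch of one such technique, the snippet below scores each feature independently with an ANOVA F-test and keeps only the highest-scoring ones; the synthetic dataset and the choice of k are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic dataset: 1000 samples, 100 features, only 10 of which
# actually carry signal about the class label.
X, y = make_classification(
    n_samples=1000, n_features=100, n_informative=10, random_state=0
)

# Univariate feature selection: score each feature independently and
# keep the 10 best, shrinking the dimensionality tenfold.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)
print("before:", X.shape, "after:", X_selected.shape)  # (1000, 100) -> (1000, 10)
```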
Fine-Tuning
Fine-tuning involves optimizing AI models to perform well on specific tasks involving high-dimensional data. Rather than training from scratch, a pre-trained model is adapted on a smaller, task-specific dataset, letting it leverage representations it has already learned instead of relearning them from limited data.
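A minimal PyTorch-style sketch of the idea, assuming a generic pre-trained backbone (the architecture, shapes, and data here are illustrative stand-ins): freeze the pre-trained layers and train only a small task-specific head, so the model adapts to the new task without relearning its high-dimensional representations from scratch.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone; in practice this would be
# loaded from a model hub (e.g. a vision or language model).
backbone = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
)

# Freeze the backbone so pre-trained knowledge is preserved.
for param in backbone.parameters():
    param.requires_grad = False

# Small task-specific head: the only part trained on the new task.
head = nn.Linear(128, 2)  # e.g. binary classification
model = nn.Sequential(backbone, head)

# Optimize only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on random data, just to show the mechanics.
x = torch.randn(32, 512)
labels = torch.randint(0, 2, (32,))
loss = loss_fn(model(x), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("fine-tuning step loss:", loss.item())
```

Because only the head's parameters receive gradients, fine-tuning needs far less data and compute than full training, which is exactly what makes it practical for high-dimensional inputs.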
Conclusion
As we stand at the frontier of artificial intelligence, the curse of dimensionality looms like a digital Everest, challenging our most advanced algorithms and systems. But we can neither ignore it completely nor get bogged down by it. Developers and AI researchers have already done substantial work in this space, and more techniques, including some on the hardware front, are emerging to address the challenge of high dimensionality.
Assuming that data dimensions will continue to increase, the key is to avoid the mistake of projecting within-dimension AI as across-dimension AI, as the Accounts Receivable vendor referenced above did. A balanced approach takes care of the dimensionality that is genuinely required, paired with realism about how easy or difficult it will be to utilize those additional dimensions.
-Mohit Sharma
This piece is written without prejudice towards any individual or company. Any sources referenced have been directly attributed and are owned by the respective third parties. The insights I share are based on my own personal experiences on the journey I have been fortunate to live.