The Challenges of Centralized AI
Jesus Rodriguez
CEO of IntoTheBlock, Co-Founder, President at Faktory, Co-Founder, President NeuralFabric, Founder of The Sequence AI Newsletter, Guest Lecturer at Columbia, Guest Lecturer at Wharton Business School, Investor, Author.
Decentralization is likely to become one of the pillars that influences the next decade of artificial intelligence (AI). The friction between decentralized and centralized models is going to be one of the existential challenges of the next years of AI. Continuing to rely on centralized models is likely to widen the gap between the large companies and countries with the resources to develop AI solutions and the rest of the market. The current centralized nature of AI models introduces a “rich get richer” vicious cycle in which only companies with access to large, labeled datasets and data science talent can benefit from the promises of AI.
Understanding the centralization challenges of AI solutions is far from trivial, as they range from the purely philosophical to practical implementation concerns. If we visualize the traditional lifecycle of an AI solution, we see a cyclical graph connecting stages such as model creation, training, regularization, etc.
My thesis is that all those stages are conceptually decentralized activities that are boxed into centralized processes because of the limitations of today’s technologies. However, we should ask ourselves: software development has traditionally been a centralized activity, so what makes AI so different? The answer is in the lifecycle itself. Each stage of the lifecycle of an AI application is vulnerable to the subjectivity of, and the trust placed in, a single party. From that perspective, the centralization challenges of AI span several dimensions:
The Data Centralization Problem
AI is not only an intelligence problem but a data problem. Today, the large datasets relevant to AI problems are controlled by a small number of large organizations, and there are no good mechanisms for sharing that data with the data science community. Imagine a healthcare AI scenario in which any participant in an experiment could contribute their own data with the right security and privacy guarantees. Decentralizing data ownership is a necessary step in the evolution of AI.
The Model Centralization Problem
Your favorite consulting firm selected a series of AI algorithms for a specific problem, but how do we know they are the best ones for that scenario? Has the firm been keeping up with the constant flow of AI research coming out of universities and research labs? What if a community of data scientists around the world could propose and objectively evaluate different models for your scenario? Wouldn’t that be great? In my opinion, decentralizing the selection of models and algorithms will drastically improve the quality of AI solutions over time.
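To make the idea concrete, here is a minimal, hypothetical sketch of objective model evaluation: several independently submitted candidate models are scored against the same shared held-out dataset, and the best score wins. The candidate models, the synthetic data, and all names here are illustrative assumptions, not a real submission protocol.

```python
import random

rng = random.Random(42)

def sample():
    """One held-out example from a toy task where y = 3x plus noise."""
    x = rng.uniform(0, 1)
    return x, 3 * x + rng.gauss(0, 0.05)

# Shared evaluation data every submission is scored against.
holdout = [sample() for _ in range(200)]

# Toy stand-ins for models submitted by independent data scientists.
candidates = {
    "constant_baseline": lambda x: 1.5,
    "linear_guess": lambda x: 3.0 * x,
    "biased_linear": lambda x: 2.0 * x + 0.2,
}

def mse(model):
    """Objective score: mean squared error on the shared holdout set."""
    return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)

scores = {name: mse(model) for name, model in candidates.items()}
best = min(scores, key=scores.get)  # the objectively best-scoring submission
```

The key design point is that the scoring function and evaluation data are shared, so no single party decides which model is “best” by fiat.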
The Training Centralization Problem
One of the main problems of AI solutions in the real world is that the training of the models is done by the same groups that create the models. Like it or not, that dynamic introduces a tremendous level of bias and frequently causes the models to overfit. What if we could delegate the training of models to a decentralized network of data scientists operating under the right incentives to improve their quality? Training is another aspect of AI solutions that is regularly hurt by centralization.
The Regularization-Optimization Centralization Problem
We deployed our AI model to production, but how do we know it is performing correctly? Is its behavior improving or deteriorating over time? Can hyperparameters be tuned in a different way to improve performance? Paradoxically, we rely on centralized processes for the optimization and regularization of AI models, processes that very often use the same data scientists who created the models. Imagine if we could use a decentralized network of AI experts to find bugs and vulnerabilities and to constantly improve our model. AI regularization and optimization are intrinsically decentralized methods that are forced into centralized processes today.
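Hyperparameter tuning is a natural fit for this kind of decentralization, because independent proposals can be compared with a single objective score. The sketch below simulates a network of independent workers, each proposing its own hyperparameters; the proposal with the lowest validation loss wins. The loss surface, parameter ranges, and worker model are all assumptions standing in for a real training-and-evaluation loop.

```python
import random

def validation_loss(learning_rate: float, dropout: float) -> float:
    """Hypothetical loss surface standing in for a model's validation error.
    In a real system each worker would train and evaluate an actual model."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (dropout - 0.3) ** 2

def worker_proposal(seed: int) -> dict:
    """Each independent worker samples its own hyperparameters."""
    rng = random.Random(seed)
    return {
        "learning_rate": rng.uniform(0.001, 0.1),
        "dropout": rng.uniform(0.0, 0.5),
    }

# Simulate a network of independent workers; the best-scoring proposal
# wins, and in an incentivized network its author would be rewarded.
proposals = [worker_proposal(seed) for seed in range(100)]
best = min(proposals, key=lambda p: validation_loss(**p))
```

Because the evaluation criterion is objective, no central team has to be trusted to tune the model well; anyone who finds a better configuration can prove it.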
As you can see, we shouldn’t be speaking of AI centralization as a single, generic challenge but as many challenges that are colliding to hinder the evolution of AI. The evolution of blockchain technologies, as well as paradigms such as federated learning, is slowly opening the door to more decentralized AI models, and hopefully we will get there by solving not one but many of these problems.
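Federated learning illustrates how several of these problems can be attacked at once: parties keep their data private and share only model updates. Below is a minimal federated-averaging sketch under assumed conditions (three parties, a one-parameter linear model, synthetic data); real deployments add secure aggregation, heterogeneous data, and far larger models.

```python
import random

TRUE_W = 2.0  # ground-truth parameter the parties jointly try to learn
rng = random.Random(0)

def make_private_dataset(n=50):
    """One party's local data for y = TRUE_W * x plus noise.
    In federated learning this data never leaves the party."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [TRUE_W * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

parties = [make_private_dataset() for _ in range(3)]

w = 0.0  # shared global parameter, broadcast to every party each round
for _ in range(50):  # communication rounds
    local_ws = []
    for xs, ys in parties:
        w_local = w
        for _ in range(5):  # a few local gradient-descent steps on MSE
            grad = sum(2 * (w_local * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            w_local -= 0.1 * grad
        local_ws.append(w_local)
    # The coordinator averages parameters, not data (federated averaging).
    w = sum(local_ws) / len(local_ws)

# w should now be close to TRUE_W even though no raw data was shared.
```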
Decentralization Friction Points
Despite the challenges of centralized AI, introducing decentralized AI models comes with its own set of challenges across several dimensions:
· Computational: Decentralized runtimes are not optimized for storing large volumes of data or for performing computations over large GPU clusters.
· Cultural: Without implicit trust in a central provider, how can users assess the validity of predictions, even ones that perform well against historical data?
· Financial: For a decentralized AI model to work, it must provide the right financial incentives for users and data scientists to engage in the creation and operation of AI models.
These challenges make implementing decentralized AI solutions rather impractical today. However, given the speed at which the space is evolving, these challenges are likely to become more manageable soon.