Make Way, "Big Data"!
On the subject of building efficient Artificial Intelligence (AI) systems, the long-held belief that smaller firms barely stand a chance against data behemoths like Google, Walmart and Meta is being put to the test. Can a 'Data-Centric AI' approach temper the need for humongous amounts of data when building these systems, thus making AI accessible to more traditional players such as manufacturers?
Andrew Ng (Landing AI) is attempting to shift the focus within AI. The conventional model-centric approach focused on the code, algorithm, or model development. Here, large amounts of data were collected and fed to models that, in turn, were repeatedly tuned (with largely static data) to improve performance.
The stage is now set for the data component of AI to pave the way for successful Machine Learning models in production environments beyond the tech giants. However, such environments are often characterized by heterogeneous processes, relatively small datasets, and, in many instances, a shortage of requisite technical skills. Simply adding more data to the AI system in the hope that it will retrain and improve its performance (working through the noise in the data) is clearly not an option.
Ng’s ‘Data Centrism’ focuses on the model’s feedstock: data. More specifically, data quality. The approach relies on improved data quality to reduce the amount of training data required. This development also meshes well with today's advanced models, which are mostly well-tuned, ready for real-world use, and open source.
According to Forbes, “as investments in AI projects spread from Internet-based, consumer-facing companies to other industries, the models are typically trained by 10,000 or less examples rather than millions of examples. That is a very good reason to pay greater attention to the quality of the data”.
To improve data quality, Data-Centric AI focuses on systematic data cleaning, methodical error correction, and consistent labeling, among other practices. It is not a one-off step but an iterative one, in which the data is repeatedly revised as part of the model training process. The customization of the AI system in these traditional heterogeneous environments comes in the form of knowledge transfers from in-house domain experts to improve data quality. These transfers, in turn, lower the need for copious amounts of data to feed the models and facilitate AI learning.
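As a rough illustration of what "consistent labeling" can mean in practice, the minimal Python sketch below flags entries that appear more than once in a small dataset but carry different labels. The records, the label names, and the helper functions `normalize` and `find_label_conflicts` are hypothetical and chosen only for illustration; they are not taken from Landing AI or any specific tool.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so near-identical entries compare equal."""
    return " ".join(text.lower().split())


def find_label_conflicts(records):
    """Return {normalized_text: sorted labels} for entries labeled more than one way."""
    labels_by_text = defaultdict(set)
    for text, label in records:
        labels_by_text[normalize(text)].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical inspection notes from a manufacturing line (illustrative only).
records = [
    ("Scratch on housing", "defect"),
    ("scratch on  housing", "ok"),
    ("Dent near hinge", "defect"),
    ("dent near hinge", "defect"),
]

conflicts = find_label_conflicts(records)
for text, labels in conflicts.items():
    print(f"{text!r} has conflicting labels: {labels}")
```

Flagged entries like these would typically go back to an in-house domain expert for a decision, and the corrected labels would feed the next round of training.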
This approach undoubtedly provides more opportunities for traditional firms outside the tech realm to bolster their AI armor and competitiveness. It also demonstrates, once again, the need for AI algorithms to work alongside invaluable business knowledge to maximize the benefits of AI.
Research Sources: Snorkel AI, Fortune, Forbes, Bernard Marr