Are we learning AI the right way?
Most AI courses focus primarily on model types, training techniques, model selection, and hyperparameter tuning. These courses often provide well-curated datasets, such as Iris, MNIST, or dog-cat images, for experimentation. However, I've noticed a significant gap in the curriculum: data preparation. Few courses comprehensively cover the techniques, processes, and tools involved in preparing high-quality data.
In real-world implementations, clean and curated data is rarely available. This hurts organizations' return on investment (ROI) from AI initiatives, largely because few professionals are trained to deal with messy data. It highlights the importance of "data centricity" and the need to shift from traditional "Model-Centric AI" to "Data-Centric AI."
Recently, I came across Curriculum Learning, a training methodology in which a model sees examples in a meaningful order, typically from easy to hard, with harder examples introduced gradually over successive training stages. This approach has shown promising results, but it often relies on human input to grade the difficulty of examples, which underlines the need for human involvement in AI development.
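To make the idea concrete, here is a minimal sketch of the easy-to-hard ordering described above. It is an illustration under stated assumptions, not a reference implementation: the function name `curriculum_stages` is hypothetical, and sentence length is used as a stand-in difficulty score where a real project might use human grading or a model's loss.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int,
) -> Iterator[List[T]]:
    """Yield growing training pools, easiest examples first (hypothetical helper)."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort once by the difficulty score; lower score means easier.
    ordered = sorted(examples, key=difficulty)
    n = len(ordered)
    for stage in range(1, num_stages + 1):
        # Ceiling division, so the final stage always contains every example.
        cutoff = -(-n * stage // num_stages)
        yield ordered[:cutoff]


# Assumption: text length is a crude proxy for difficulty.
sentences = ["a cat", "the quick brown fox jumps", "hi", "dogs bark loudly"]
stages = list(curriculum_stages(sentences, difficulty=len, num_stages=2))
for number, pool in enumerate(stages, start=1):
    # A real training loop would fit the model on `pool` here.
    print(f"stage {number}: {pool}")
```

Each stage reuses the earlier, easier examples and adds harder ones, so the model is never trained only on the hardest data. Whether pools should be cumulative or disjoint is a design choice that varies between curriculum strategies.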
Techniques like Curriculum Learning and Confident Learning (which estimates which labels in a dataset are likely wrong so they can be reviewed, corrected, or removed) sit alongside common data preparation methods, including data augmentation, feature engineering, outlier detection and removal, and data labeling. Notably, even popular models such as ChatGPT and DALL-E depend on human feedback and curation to obtain high-quality training data.
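The label-checking idea behind Confident Learning can be sketched in a few lines. This is a simplified illustration, not the full published algorithm or any library's API: the function name `flag_label_issues` is hypothetical, and the predicted probabilities are assumed to come from a model evaluated on held-out data (for example via cross-validation). An example is flagged when the model is confident in some class, meaning its probability reaches that class's average self-confidence, and that class differs from the given label.

```python
from typing import List, Sequence


def flag_label_issues(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
) -> List[int]:
    """Return indices of examples whose given label looks suspicious (simplified sketch)."""
    num_classes = len(probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that are labeled k.
    thresholds = []
    for k in range(num_classes):
        scores = [p[k] for y, p in zip(labels, probs) if y == k]
        # A class with no labeled examples can never be confidently predicted.
        thresholds.append(sum(scores) / len(scores) if scores else float("inf"))

    flagged = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                flagged.append(i)
    return flagged


# Made-up toy data: example 2 is labeled 0, but the model strongly predicts 1.
given_labels = [0, 0, 0, 1, 1, 1]
predicted = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = flag_label_issues(given_labels, predicted)
print(suspects)
```

Flagged examples are candidates for human review rather than automatic deletion; the model may simply be wrong about them.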
However, this human-intensive approach is unsustainable. We need tools and models that can produce clean, well-curated datasets without extensive human intervention. Fortunately, Data-Centric AI is gaining traction, and we can expect more automated approaches to emerge in the coming years, addressing data quality issues and making AI models more reliable.
As we continue to learn about AI, it's essential to give data management and data engineering the same attention as modeling. By bridging the gap between AI and data, we can move into the realm of "Data-Centric AI."