Ghanaian Food Vision model
Over the past few weeks, I delved into computer vision, exploring the neural network architectures used for tasks like object detection, segmentation, and classification. One paper that particularly caught my attention was "An Image is Worth 16x16 Words," in which Google researchers introduced the Vision Transformer (ViT). The paper showed how computer vision (CV) models can learn from images much as large language models (LLMs) process text: by treating an image as a sequence of patches.
As I learned about these advancements, I became excited about building my own computer vision project. In this project, I built a food classification model that recognizes Ghanaian dishes, using the Devkyle Ghanaian food dataset on Hugging Face. For this, I used ConvNeXt, a convolutional neural network (CNN) architecture that improves on traditional CNNs by borrowing design principles from the Vision Transformer. In this article, I'll walk you through the entire process: why I chose ConvNeXt, the challenges I faced (such as overfitting), and how I overcame them with data augmentation and techniques like early stopping and learning-rate scheduling.
The Inspiration: The Original FoodVision model
Before diving into my food classification model, I want to give a shout-out to the project that inspired me: FoodVision. This project was initially developed to classify just five food categories (sandwich, pizza, pasta, doughnuts, and burger). It was built using the ViT-B/16 Vision Transformer model.
The original FoodVision was impressive for its simplicity and performance, demonstrating how Vision Transformers can be applied to image classification. ViT breaks an image into small patches and treats each patch like a "word." Its self-attention mechanism then lets every patch attend to every other patch. This allowed the model to achieve high accuracy while capturing both local and global patterns in the images.
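The "patches as words" idea can be made concrete with a minimal pure-Python sketch of ViT's patch-splitting step. It assumes a 224 x 224 RGB input and 16 x 16 patches, which is the ViT-B/16 configuration. The `patchify` helper is hypothetical and only illustrative. A real ViT also projects each flattened patch with a learned linear layer and adds position embeddings before the transformer layers.

```python
def patchify(image, patch_size=16):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    # Walk the image in raster order: top-left patch first, then row by row.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    token.extend(pixel)  # append this pixel's channel values
            patches.append(token)
    return patches


# A dummy 224 x 224 RGB image (all zeros), the input size used by ViT-B/16.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = patchify(image)
print(len(tokens), len(tokens[0]))  # 196 768
```

The arithmetic behind the output is straightforward. A 224 x 224 image yields (224 / 16)^2 = 196 patches, and each patch flattens to 16 x 16 x 3 = 768 values. That is the sequence of "words" the transformer sees.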
However, as exciting as FoodVision was, I wanted to take this idea a step further. Rather than just classifying five food items, I aimed to:
This new project builds on the foundation laid by FoodVision but adds scalability and a much broader range of foods while keeping the approach simple. That makes it more useful for real-world applications like menu scanning for tourists or food and calorie tracking apps.
Why the Upgrade?
The motivation behind building a model for African cuisine was simple: to give African cuisine better representation in vision models and to test the boundaries of what could be achieved with more diverse data. While the original project was an excellent starting point, I knew that scaling the model up to more food categories would present unique challenges, such as:
Through these upgrades, my goal was not just to recreate what FoodVision had done, but to enhance it, making the model more robust and scalable to real-world scenarios.
Dataset: Devkyle Ghanaian food dataset with 30 Classes
For this extended model, I worked with the Devkyle Ghanaian food dataset, a small collection of images covering 30 different types of Ghanaian food. It provided a solid foundation for training, with diverse and challenging examples across all 30 categories. This diversity pushed the model to capture fine-grained details. It made the project a great lesson in balancing data diversity against model performance.
Dataset Details: Image Distribution
For this project, I organized my dataset into two subsets to ensure a well-rounded evaluation of the model's performance:
This split ensures that the model is evaluated on images it never saw during training. That gives a more reliable assessment of how well it classifies the 30 food categories.
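The article doesn't state the split ratio or the tooling used, so the following is only a sketch. It is a stratified split in plain Python that keeps every one of the 30 classes represented in both subsets. The 80/20 ratio, the seed, the `stratified_split` helper, and the file names are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (path, label) pairs into train and test lists, keeping each class's share."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Hold out test_fraction of every class, but always at least one image.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical file layout: 30 classes with 10 images each.
samples = [(f"class_{c}/img_{i}.jpg", f"class_{c}") for c in range(30) for i in range(10)]
train, test = stratified_split(samples)
print(len(train), len(test))  # 240 60
```

Stratifying per class matters most for small datasets. A plain random split can leave a rare dish with few or no test images, which makes its accuracy impossible to judge.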
Training the Model
I trained the model for 15 epochs, which took about 20 minutes, and its ability to recognize the food categories improved. However, I ran into a classic problem: overfitting. The model excelled at classifying the training images but struggled when presented with new, unseen ones.
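One simple way to spot this kind of overfitting is to compare training and validation losses epoch by epoch. The sketch below finds the point where validation loss starts rising for a few consecutive epochs while training loss keeps falling. The `find_overfitting_onset` helper, the `patience` value, and the loss numbers are hypothetical, not the author's training logs.

```python
def find_overfitting_onset(train_losses, val_losses, patience=2):
    """Return the 0-based epoch after which validation loss rose for `patience`
    consecutive epochs while training loss kept falling, or None if it never does."""
    streak = 0
    for epoch in range(1, len(val_losses)):
        val_up = val_losses[epoch] > val_losses[epoch - 1]
        train_down = train_losses[epoch] < train_losses[epoch - 1]
        # Count consecutive epochs where the two curves move in opposite directions.
        streak = streak + 1 if (val_up and train_down) else 0
        if streak == patience:
            return epoch - patience  # last epoch before the divergence began
    return None


# Hypothetical loss curves: training keeps improving, validation turns around.
train_losses = [1.2, 0.9, 0.7, 0.5, 0.4, 0.3]
val_losses = [1.3, 1.0, 0.9, 0.95, 1.05, 1.2]
print(find_overfitting_onset(train_losses, val_losses))  # 2
```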
Handling Overfitting: Data Augmentation and Early Stopping
To reduce overfitting and help the model generalize better, I used data augmentation, early stopping, and learning-rate scheduling.
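The article names these techniques without giving their settings, so the following is only a sketch in plain Python. It combines an early-stopping tracker with a cosine-annealing learning-rate schedule. Cosine annealing is one common choice, swapped in here for illustration; the author's actual scheduler, patience, and learning rates aren't stated. The `EarlyStopping` class, `cosine_lr`, and all numbers are hypothetical. PyTorch offers a built-in `torch.optim.lr_scheduler.CosineAnnealingLR` that follows a similar curve.

```python
import math


class EarlyStopping:
    """Signal a stop when validation loss hasn't improved by `min_delta` for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would also save a model checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def cosine_lr(epoch, total_epochs, base_lr=1e-3, min_lr=1e-6):
    """Cosine-annealed learning rate: base_lr at epoch 0, min_lr at the final epoch."""
    progress = epoch / max(1, total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Hypothetical validation losses, not the author's training logs.
val_losses = [1.30, 1.00, 0.90, 0.95, 1.05, 1.20, 1.10]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    lr = cosine_lr(epoch, total_epochs=15)  # a real loop would set this on the optimizer
    print(f"epoch {epoch}: lr={lr:.2e}, val_loss={loss:.2f}")
    if stopper.step(epoch, loss):
        print(f"stopping at epoch {epoch}; best epoch was {stopper.best_epoch}")
        break
```

With these numbers, training stops at epoch 5, three epochs after the best validation loss at epoch 2. The checkpoint from epoch 2 is the one worth keeping.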
Performance Metrics:
Here’s how the model performed after 15 epochs of training:
Although the model isn't perfect (about 83% overall accuracy), it predicts most food items correctly and quickly! This is a significant improvement over my original model, and I plan to keep refining it to push these metrics further.
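An overall accuracy figure like 83% can hide large differences between classes. Per-class accuracy shows which dishes the model handles well and which it confuses. Here is a minimal sketch, assuming predictions are available as lists of class names. The `accuracy_report` helper and the sample labels are hypothetical, not results from this project.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return overall accuracy and a {class: accuracy} dict for each true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(hits.values()) / len(y_true)
    per_class = {label: hits[label] / totals[label] for label in sorted(totals)}
    return overall, per_class


# Hypothetical labels and predictions for three dishes.
y_true = ["jollof", "jollof", "banku", "banku", "waakye", "waakye"]
y_pred = ["jollof", "waakye", "banku", "banku", "waakye", "jollof"]
overall, per_class = accuracy_report(y_true, y_pred)
print(round(overall, 3))  # 0.667
print(per_class)  # {'banku': 1.0, 'jollof': 0.5, 'waakye': 0.5}
```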
How Can the Model Be Improved?
While the model is performing well, there are several ways to enhance it further:
Deployment on Hugging Face
Once the model was trained, I deployed it on Hugging Face Spaces to make it accessible to everyone. Spaces offers a user-friendly interface where anyone can test the model in real time. You can try it out here: Ghanaian Food Vision Hugging Face Space
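The article doesn't show the Space's code. Spaces are commonly built with Gradio, whose `gr.Label` output component accepts a dictionary mapping class names to confidences. A prediction function for such an app therefore usually ends by turning the model's raw scores (logits) into that dictionary. The sketch below shows only this post-processing step in plain Python. It assumes the model outputs one logit per class. `logits_to_label_dict`, the `top_k` value, and the class names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label_dict(logits, class_names, top_k=3):
    """Map the top_k most probable classes to their probabilities (highest first)."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return {name: round(prob, 4) for name, prob in ranked[:top_k]}


class_names = ["banku", "fufu", "jollof", "kenkey", "waakye"]  # hypothetical subset of the 30 classes
print(logits_to_label_dict([1.0, 0.2, 3.5, -1.0, 2.1], class_names))
```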
What's Next?
In the future, I plan to expand the model to classify more African dishes, making it more versatile and useful. I will also explore architectures beyond ConvNeXt to see whether they improve accuracy and performance. This exploration could uncover new techniques for food classification and strengthen the model's practical applications.
Conclusion
It was a rewarding experience to build an African food classification model. It taught me how to tackle common machine learning challenges, such as overfitting, and how to tune hyperparameters for better performance.
If you're working on similar projects, want to collaborate, or just want to learn how I did all of it, reach out! I'd love to connect and hear your thoughts!