R&D report for the last one [and a half] years from the ParallelDots AI team
2020 [and the first half of 2021] was a black swan. Families, societies and companies had to face things they couldn't have conceived of. In this post, I will try to highlight how the ParallelDots AI team has been adapting in this period and building the next generation of our retail AI solutions.
ParallelDots went fully remote in February 2020, and since then the team hasn't met physically for even a day. We had always been a very tightly knit unit before that, so the first few weeks were spent entirely on building a remote working culture. We had to think about better communication and a very different ownership structure. Given that the company was also dealing with a business shock, these weeks were hard. I personally am proud of the way our team handled the pressure and not just adjusted but evolved to remain the top technology churning machine it has always been. Just a few weeks of tweaks and we were back to being awesome.
Challenges for AI team [circa March 2020]
[You might find the 'why' behind the different AI algorithms and systems we are building boring; I know because I would have ;). In case you are just interested in the 'how', that is, all the cool new tech and algorithms, move down to the section 'New Systems and Algorithms'.]
The ParallelDots AI team's role is to solve the different problems that bottleneck our AI training and deployment infrastructure. You can divide these challenges into: A. AI Training and Accuracy Bottlenecks [or Research Bottlenecks] and B. Deployment/Inference Bottlenecks [or MLOPS Bottlenecks, as we call them]. At the beginning of 2020, while our AI technology was already processing over a million images per month, some challenges that we were expected to solve to make it scale up were:
Now that I have bored you with the details of the challenges we were trying to solve, let's come to the interesting part: our new MLOPS platforms and algorithms.
New Systems and Algorithms
Let me introduce you to my new friends: some awesome technology systems and AI algorithms we have developed and deployed over this period to tackle the bottlenecks.
Mobile Product Recognition AI or Mobile Shelf Recognition AI
We have built and deployed not one, but two different types of AI algorithms on mobile devices. You might have seen our extremely viral posts a few days back, where we demoed mobile phone billing and talked about offline shelf audits.
Essentially, these AI models are scaled-down versions of the models we deploy on the cloud. With some loss in accuracy, these models are now small enough to run on a phone GPU [which is much smaller than a server GPU]. We use TensorFlow's new mobile deployment frameworks to deploy these models in our OOGASHOP and ShelfWatch apps respectively.
Autoscaling Cloud AI Inference
When the shops open at around 11 AM [11 AM in each timezone, that is, wherever in the world our clients have their salesforce or merchandizers], our servers face an insane load of merchandizers uploading photos to our cloud to be processed so they can learn their retail execution score. Then after 11 PM, when the retail stores close, we have hardly any AI inference workload. While Lambda-like autoscaling has been introduced by many providers, we wanted a cloud-independent autoscaling technique for our AI inference infrastructure. When there are more images in our processing queue, we need more GPUs crunching them; otherwise just one, or maybe none. To do this, the entire AI inference layer was moved to a Docker, Kubernetes and KEDA based architecture where an arbitrary number of new GPUs can be spawned based on the workload. No more walking a tightrope between meeting the company's SLA and saving $$ on costly-to-run GPU machines.
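To make the queue-driven scaling idea concrete, here is a minimal sketch of how a desired number of GPU workers might be derived from the queue length. It is only an illustration, not the actual ParallelDots setup or KEDA's API: the function name `desired_gpu_workers` and the parameters `images_per_worker`, `min_workers` and `max_workers` are hypothetical.

```python
import math


def desired_gpu_workers(queue_length, images_per_worker=100,
                        min_workers=0, max_workers=20):
    """Return how many GPU workers to run for a given queue length.

    All parameter names and default values are assumptions made for this
    sketch; real thresholds depend on the model and the SLA.
    """
    if queue_length < 0:
        raise ValueError("queue_length cannot be negative")
    # One worker per `images_per_worker` queued images, rounded up.
    wanted = math.ceil(queue_length / images_per_worker)
    # Clamp so we can scale to zero at night and cap the spend at peak.
    return max(min_workers, min(max_workers, wanted))
```

With these assumed defaults, an empty queue needs no GPU at all, 250 queued images ask for three workers, and a very long queue is capped at `max_workers`.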
Bettering the Shelf Object Detection Algorithms
Earlier, we had been using simple Faster R-CNNs trained for shelf object extraction [Simple Object Detection Baseline Paper]. They worked well for many use cases, but we needed more state-of-the-art approaches. In 2020 our team discovered a new method that uses Gaussian Maps to get state-of-the-art results. This work [later published at BMVC, one of the top Computer Vision conferences] helped us get not just satisfactory but the best possible results on shelf object detection.
The trick essentially is to use Gaussian map training as an auxiliary loss for object detection. This makes the boxes for different products much more precise.
Another question we have been trying to address for a long time in shelf object detection is this: now that recognizing products has been moved to a downstream task and the detector only has to draw boxes over all possible products, is there a way to use the noise and distortions contained in a huge unannotated dataset to improve shelf object detection? In a recent work [mentioned at the Retail Vision Workshop at CVPR 2021], we use our humongous repository of unannotated shelf images to improve the accuracy of the shelf object detection task.
Pseudolabel-based student training is a trick that we have used in multiple fields, not just for shelf object detection. Other self learning techniques require large batch sizes to be loaded on GPUs, which makes them hard to try out for a company with limited hardware like ParallelDots, so pseudolabels are what we have adopted as our trick to do single-GPU self learning.
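As a rough illustration of what a Gaussian map auxiliary target could look like, the sketch below renders one Gaussian blob per ground-truth box and adds a mean squared error term to a detection loss. This is an assumed formulation for illustration, not the exact method from the published work: the function names, `sigma_scale`, `aux_weight` and the choice of MSE are all hypothetical.

```python
import numpy as np


def gaussian_map(boxes, height, width, sigma_scale=0.25):
    """Render a target map with one 2D Gaussian per (x1, y1, x2, y2) box.

    `sigma_scale` (an assumed hyperparameter) ties the spread of each
    Gaussian to the width and height of its box.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    target = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        sx = max((x2 - x1) * sigma_scale, 1e-6)
        sy = max((y2 - y1) * sigma_scale, 1e-6)
        blob = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                        + (ys - cy) ** 2 / (2 * sy ** 2)))
        # Overlapping products keep the strongest response at each pixel.
        target = np.maximum(target, blob)
    return target


def total_loss(detection_loss, predicted_map, target_map, aux_weight=1.0):
    """Detection loss plus a weighted MSE between predicted and target maps."""
    aux = float(np.mean((predicted_map - target_map) ** 2))
    return detection_loss + aux_weight * aux
```

In such a setup the detector would get an extra head that predicts the map, and the auxiliary term pushes it to localise product centres and extents more sharply.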
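A pseudolabel pipeline for detection can be sketched in a few lines: a teacher model predicts boxes on unannotated images, only confident boxes are kept as pseudolabels, and the student is trained on the union of human labels and pseudolabels. The sketch below covers just the data side, under assumptions: the function names, the data layout and the `score_threshold` value are hypothetical, and the teacher and student models themselves are left out.

```python
def make_pseudolabels(teacher_predictions, score_threshold=0.9):
    """Keep only confident teacher boxes as pseudolabels.

    `teacher_predictions` maps an image id to a list of (box, score)
    pairs; this layout and the threshold are assumptions of the sketch.
    """
    pseudo = {}
    for image_id, detections in teacher_predictions.items():
        kept = [box for box, score in detections if score >= score_threshold]
        if kept:  # skip images where the teacher was never confident
            pseudo[image_id] = kept
    return pseudo


def build_student_training_set(labeled, pseudo):
    """Merge human annotations with pseudolabels for student training."""
    merged = dict(pseudo)
    merged.update(labeled)  # human annotations win if an image is in both
    return merged
```

Because the teacher only runs inference and the student trains with ordinary batches, nothing here needs the very large batch sizes that other self learning methods rely on.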
Bettering Classification Accuracy
We have used multiple tricks in the past to train highly accurate classifiers [Bag Of Tricks for Classification]. All boxes that the shelf object detector extracts from a shelf image are passed through this classifier to infer the brand of the product.
However, with the frequently changing catalogues of shops, our product classifier needs to evolve to do things a bit differently. Training a classifier is resource intensive, and with products being quickly added to or removed from store catalogues, we need a classifier that can be trained fast and is more accurate, or at least as accurate, as methods that finetune the full backbone. This sounds like having one's cake and eating it too, and that is what self learning techniques have been shown to do in Deep Learning. We have been trying to use concepts of Self Learning to create classifiers which can be trained very lightly.
The trick we use here is to employ the huge repository of retail product images we have [both annotated and unannotated] to train a representation learner, whose output can be fed to a simple Machine Learning classifier for training. Such learnt feature representations work quite well. How cool is it to train a small logistic regression classifier to classify retail images? Unfortunately, although we have over 20 times more images for such tasks, the accuracy we achieve right now is limited by the limited hardware infrastructure we have for such self learning, and still we beat the state of the art on many [not all] datasets.
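To show how light the final training step can be, here is a minimal sketch of a softmax (multiclass logistic) regression trained on feature vectors that are assumed to come from a frozen, pre-trained representation learner. The encoder itself is not shown, and the function names and hyperparameters (`lr`, `steps`) are hypothetical choices for this illustration.

```python
import numpy as np


def train_logistic_regression(features, labels, num_classes,
                              lr=0.1, steps=200):
    """Fit a softmax regression on frozen features with gradient descent.

    `features` is an (n, d) array of embeddings and `labels` an (n,) array
    of integer class ids; only these small weights are trained.
    """
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(features, weights, bias):
    """Return the most likely class id for each feature vector."""
    return np.argmax(features @ weights + bias, axis=1)
```

When a catalogue changes, only this small classifier has to be refit on embeddings of the new products, while the expensive backbone stays untouched.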
Size based inference on Shelf Images
While we have been detecting the brands of different products seen in shelf images, a recent spec that we have tried to solve is to reason about which size variant of a product we have detected. For example, while the Computer Vision pipeline detects a Lays Magic Masala on the shelf and classifies it as Lays Magic Masala, how do we know if it is the 50 Gram, 100 Gram or 200 Gram variant of the product? We thus include a third downstream task to guess the size variant of the product. This pipeline takes the different boxes extracted from the shelf and their brands, and creates features which can be used to guess the size. As is obvious, you cannot use raw bounding box coordinates or areas for such reasoning, as images can be taken from any distance. We use features like aspect ratio and the area ratio between boxes of different groups to infer the size variant.
A lot of feature engineering tricks are used to train the two variants of the reasoning task: using XGBoost over binned features, and using a Neural Network over features derived from a Gaussian mixture model.
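The point about camera distance can be illustrated with a small sketch: ratios between boxes stay the same when the whole image is scaled, while raw coordinates and areas do not. The function names and the choice of the median area of a reference group are assumptions made for this example, not the exact features used in the pipeline.

```python
from statistics import median


def box_width_height(box):
    """Return (width, height) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


def size_features(box, reference_boxes):
    """Compute distance-invariant features for one detected product box.

    `reference_boxes` are boxes of another product group in the same
    image; using their median area as the reference is an assumption.
    """
    w, h = box_width_height(box)
    ref_areas = [rw * rh for rw, rh in map(box_width_height, reference_boxes)]
    return {
        "aspect_ratio": w / h,
        "area_ratio": (w * h) / median(ref_areas),
    }
```

Photographing the same shelf from twice as close doubles every coordinate, yet both features come out unchanged, which is what makes them usable as inputs to a downstream size classifier.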
Reasoning about Point of Sales Materials
When you walk into a retail store, you will notice different Point of Sales Materials [POSM]: shelf strips, cutouts, posters, gondolas and demo racks.
While we have been using Deep Learning based keypoint representation matching to verify the presence of POSM in an image, there was also a task to reason about POSM part by part. For a shelf strip, for example, we might need to check whether the product photograph towards the right of the ideal shelf strip is present in a real world placement or not. We call this "Part" detection after POSM verification.
Essentially, since POSM changes very fast [weekly/monthly], you can never get a lot of data to train algorithms for each POSM. So we need algorithms that are trained on existing datasets in a way that lets them be applied to any dataset. That is our aim with the recent work on a self attention network for POSMs. We use matched keypoints [on the ideal POSM image and the real world image] and their descriptors [from both images] as input for each part separately to determine its exact presence.
A Sentiment Analysis API that works on any domain data
When training a model to be deployed as a sentiment analysis API, you cannot really get data from every domain annotated. For example, the previous sentiment analysis model we had was a large language model finetuned on 10-15k odd tweets we annotated in-house, so the algorithm had hardly seen sentiment expressed in different domains while learning. We tried using Self Learning to make our sentiment classification algorithm robust to domain change. Take 2 Million+ unannotated sentences, run an older version of the classifier to create pseudolabels, train a new classifier to learn these pseudolabels and boom.. you have a sentiment classifier which is much more domain robust, while its accuracy in the initial domain stays the same. Sounds too good to be true? Check out our work.
Making a state of the art method to detect targeted sentiment
For us, in the NLP API business, targeted sentiment means that given the sentence "Apple wasn't that tasty, but orange was good", a classifier returns negative when it gets the input "apple" and positive when it gets the input "orange". Basically, it is the sentiment directed towards an object in a sentence. We have developed a new method that detects targeted sentiment, which will soon be available as an NLP API. The research field corresponds to Aspect Based Sentiment Analysis, and our recent work gets state-of-the-art results on multiple datasets just by finetuning an architecture that compares contextual [BERT] and non-contextual [GloVe] embeddings. The sentiment is hidden in the context somewhere, right?
Onwards and Upwards
Hope you liked the new technology that we have developed over the last year. We are very happy to answer questions if you have any. We continue to develop new and exciting technology and are working on some cool new Machine Learning problems like Graph Neural Networks for Retail Recommendation, Out-Of-Distribution Image Classification and Language Models. We are hiring as well: write to us at [email protected] or apply on our AngelList page to join our AI team. You can apply if you want to be a Machine Learning Engineer, Backend Developer or AI Project Manager.