Pizza Quality Analyser — An AI/ML-based, hardware and software intelligent decision support system — Challenges & Solutions

Technology is evolving rapidly, and food preparation and delivery keep getting faster. Quality, however, should never take a back seat, especially in the food industry, where hygiene and the quality of food are of paramount importance.

Human intelligence can be described as taking apt decisions based on the perception formed by our senses. With advances in both hardware and software, machines are attempting to mimic human intelligence and help in decision making.

Almost a year back, using computer vision techniques, I developed a hardware/software intelligent decision system that visually inspects a prepared pizza through a camera and tells whether the pizza is good to deliver or has quality issues. The system detects the size of the pizza (regular/medium/large), identifies the individual toppings on it (approximately 14 toppings), determines the type of pizza (veg/non-veg), maps the prepared pizza to its POS order number, counts the toppings in each virtual slice, checks whether the toppings match the prescribed food portioning chart, and displays the inference on a monitor so that the person at the pizza cutting table can decide the fate of the pizza: throw it in the bin, or cut it into slices and box it for delivery to the customer.

Building a working solution for a real problem like this is very different from writing code in a Jupyter notebook, developing an e-commerce application, building a mobile app or similar traditional software development. While developing this application, I encountered various practical problems and learnt the solutions the hard way, through many small experiments. Whenever a small experiment succeeded, that piece was merged into the main solution. In this article, I will share my experience, the challenges faced and the practical approach I took in developing a hardware/software system. I won't discuss topics such as object detection in detail, as tons of articles on them are already available on the internet. What I will share here is the practical implementation approach I took, which I think is not discussed that much on the web. Maybe this article will help other enthusiasts who want to build a cost-effective solution like this for a real business problem they have encountered.

I will first discuss the hardware I considered and then describe the software.

Hardware:

  • Deep learning models require a system with good computation power, and if the solution is to be deployed at multiple locations, the cost multiplies accordingly. On the other hand, to avoid latency issues, the detection has to happen on the edge. For an economical solution, I considered the Raspberry Pi, a credit-card-sized computer. It comes at a very affordable price and has a powerful 1.5GHz 64-bit quad-core ARM Cortex-A72 processor with 1GB, 2GB and 4GB RAM variants. Anyone planning to use this device should go for 2GB RAM or above, as 1GB is not sufficient to run TensorFlow Lite models: the GPU reserves a minimum of 128MB for itself, the other applications running on this small PC also need memory, and TensorFlow models are heavy enough to take up around 0.5GB of RAM, which leaves very little for the system. I would suggest 2GB or more to anyone planning to run TensorFlow COCO models on a Raspberry Pi.
  • The Raspberry Pi is a robust machine, but its CPU heats up quickly during inferencing, and it gets worse with continuous inferencing. The CPU temperature goes above 82℃, and the system then throttles to cool the CPU off. To take care of this, I installed a heat sink on the CPU, but the temperature only dropped by 3–5℃. I then had to install an external fan, after which the temperature stayed at around 53–58℃ and the Raspberry Pi performed optimally. In my case, the Pi had to run continuously for around 12–14 hours a day, so keeping the CPU temperature as low as possible was necessary.
  • Selecting the camera was another key decision. Choosing between USB, IP and the other cameras available in the market was a challenge in itself. CCTV cameras are low in resolution, and high-resolution USB cameras are expensive. Pi cameras, in comparison, have much better resolution and come at a very low price compared to standard CCTV or USB cameras. A Raspberry Pi with a Pi v1.3 camera (5MP), installed around 1 metre above the pizza cutting table, was good enough to capture pizza images for inferencing. The Pi v2.3 (8MP) NoIR camera costs around three times as much as the v1.3 (5MP), so if your object is farther away and you need a better picture, go for the higher-resolution camera.
  • Another important thing to be careful about is the power supply to the Raspberry Pi. It should be stable and continuous; if it is not, the CPU speed is throttled, the application will not work properly and inferencing is impacted. If you plan to attach HDMI and the Pi camera, and also source power for a fan from the I/O pins, make sure you use a power adapter that provides 5.1V at 3A, or your Pi won't work well. Like the temperature-related CPU throttling, I also had to go through under-voltage throttling. The Pi was sometimes not inferencing and sometimes very slow, and I could not make out what was wrong until I ran the vcgencmd get_throttled command and learnt the Pi was being throttled due to under-voltage. Replacing the adapter with a good one solved the problem. I would even suggest powering any external fan from a separate power supply and avoiding the Pi I/O pins as a power source.
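The throttle status the Pi reports can also be decoded programmatically. The sketch below parses the output of `vcgencmd get_throttled`; the bit positions follow the official Raspberry Pi documentation. On the device itself you would obtain the string via `subprocess.check_output(["vcgencmd", "get_throttled"])` rather than hard-coding it.

```python
# Decode the hex bitmask printed by `vcgencmd get_throttled` on a Raspberry Pi.
# Bit meanings are taken from the official Raspberry Pi documentation.

THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def decode_throttled(output):
    """Parse a line like 'throttled=0x50005' into human-readable flags."""
    value = int(output.strip().split("=")[1], 16)
    return [msg for bit, msg in THROTTLE_FLAGS.items() if value & (1 << bit)]
```

For example, `throttled=0x50000` indicates that under-voltage and throttling both occurred at some point in the past, which is exactly the situation a bad power adapter produces.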

Software:

At a high level, the software application is divided into a few sub-modules. Before talking about the challenges faced and the approach taken while developing these modules, here is a very brief idea of what each of them does.

  1. Core Application — a deep learning based application developed using Python, OpenCV & TensorFlow, that can:
  • Detect a pizza under the camera and feed the pizza image to the CNN model for inferencing
  • Mark the identified toppings on the image, count the detected toppings, find how they are placed on the pizza, check if any air bubbles have formed, etc.
  • Extract the pizza portioning information from the POS APIs for the identified pizza
  • Perform the quality check by comparing the pizza portion information received from the POS API with the detected topping information

2. User Interface — developed using PyQt5, that

  • Displays the camera view and the detected pizza image with identified toppings and other inferencing results
  • Provides buttons to control the application in manual mode

3. Model Training — using various open source utilities/scripts for image annotation, generating TF records, training the model, extracting the model, etc.

  • Collected thousands of pizza images for annotation & training. Collect as many images as possible, as more training data improves object detection accuracy.
  • Annotated the images using the LabelImg utility
  • Created the test and train datasets
  • Performed numerous tests before selecting the right COCO model to retrain, in my case for pizza topping detection
  • Executed the TensorFlow training utility on the annotated image dataset
  • Exported the baked model and deployed it to the Core Application

4. Web-based application — developed using Spring Boot and Thymeleaf templates. This module was built to:

  • Sync the results (the identified details of each order, along with the image with detected toppings) from the core application on the Raspberry Pi to the central Windows server
  • Show the pizza orders and the inferencing results in a responsive web application for manager reviews and training

The section below discusses the high-level challenges I faced in building these software modules and how they were taken care of at each module level.

Core Application: installed on the Raspberry Pi, it could run in manual as well as automated mode. In manual mode, the user has GUI buttons to start the application, capture a frame and perform inferencing; in automated mode, the challenge was to use the camera to automatically detect a pizza when it comes into view and run inferencing on it. Detecting whether a pizza had been placed under the camera was a bit tricky, and I took the following approach.

  • The camera pointing at the pizza table continuously compares two frames taken at a 30 ms interval.
  • If subsequent frames differ, find the largest contour in the frame and check whether the contour area is approximately the size of a regular pizza or larger, but smaller than a large pizza.
  • Initially I tried the OpenCV HoughCircles function to detect whether there is a circle-like object in the image, but this slowed down the overall process: circle detection ran at roughly 1 frame per second, which was very slow. I would suggest avoiding HoughCircles wherever you need real-time detection on high-resolution images.
  • Instead of the HoughCircles approach, I identified the horizontal and vertical extremes of the contour and calculated the horizontal and vertical Euclidean distances between those points.
  • The assumption was that if the height and width of the identified object are at least those of a regular pizza and at most those of a large pizza, there is a high chance a pizza has been placed under the camera for inferencing. This approach took care of falsely detecting a human hand or any other object coming into the camera view.
  • The application then saves the pizza image and stops the camera from detecting further, after which the topping identification process commences.
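The detection gate described above can be sketched roughly as follows. This is a simplified illustration using plain NumPy in place of OpenCV's absdiff/findContours pipeline, and the pixel scale and pizza diameters are assumed placeholder values, not the ones used in the actual system.

```python
import numpy as np

PX_PER_CM = 5                  # assumed camera scale at ~1 m mounting height
REGULAR_CM, LARGE_CM = 17, 33  # assumed regular/large pizza diameters

def changed_bbox(prev, curr, thresh=25):
    """Return (width, height) of the bounding box of pixels that changed
    between two grayscale frames, or None if nothing moved. This mimics
    frame differencing followed by taking the largest contour."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
    ys, xs = np.nonzero(diff)
    if xs.size == 0:
        return None
    return int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)

def looks_like_pizza(bbox):
    """Size gate: both dimensions must fall between the regular and large
    pizza diameters, which filters out hands and other small objects."""
    if bbox is None:
        return False
    lo, hi = REGULAR_CM * PX_PER_CM, LARGE_CM * PX_PER_CM
    return all(lo <= d <= hi for d in bbox)
```

When `looks_like_pizza` returns True, the frame would be saved and passed on to the topping-identification stage.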
  • From the large 1920*1440 frame, I extracted only the part of the image containing the largest pizza contour, producing a 640*640 image.
  • With this method, other objects in the frame that are not required for inferencing were cut out. (Note that the bottom portion of the pizza, which appears cut in the frame below, is also cut in the original high-resolution frame.)
  • Even in the extracted image there were still onion toppings left on the table. If we send this picture for inferencing, the application will identify onion along with sausage, which we do not want.
  • So the next task was to extract just the pizza image, the largest contour in the newly saved image, so that toppings on the table or any other non-pizza objects are removed before inferencing.
  • There were a couple of approaches to identifying the pizza in the image. I could have used a CNN model, which would be computationally expensive on the Raspberry Pi and slow to detect. The other option was to use OpenCV functions, which are quick at identifying the pizza contour using a colour space such as HSV or L*a*b*, but these methods are not very accurate under different lighting conditions, which affects the pizza contour detection.
Extracted Pizza from the Original Image
  • To take care of the lighting issue, I attached two constant light sources to the box that also houses the Raspberry Pi and the camera, and this entire custom-made box was fixed approximately 1 metre above the pizza cutting table.
  • Using a small utility, I found the upper and lower HSV threshold values at which the entire pizza contour was detected correctly.
  • With the help of the discovered HSV values, I identified and extracted the pizza region from the frame and created, on the fly, a new 640*640 image with a black background and the extracted pizza in the foreground.
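The masking-and-compositing step can be sketched like this. It is a NumPy approximation of what cv2.inRange and cv2.bitwise_and do in the real application; the HSV thresholds shown are placeholders, not the values actually discovered with the utility.

```python
import numpy as np

# Placeholder thresholds: keep pixels whose HSV values fall inside this range.
LOWER_HSV = np.array([5, 50, 50])
UPPER_HSV = np.array([25, 255, 255])

def extract_pizza(hsv_img, bgr_img, size=640):
    """Zero out everything outside the HSV range and centre the surviving
    pixels on a black size x size canvas (inputs assumed no larger than size)."""
    # Boolean mask: True where all three channels lie within the thresholds.
    mask = np.all((hsv_img >= LOWER_HSV) & (hsv_img <= UPPER_HSV), axis=-1)
    fg = np.where(mask[..., None], bgr_img, 0).astype(bgr_img.dtype)
    # Paste the masked region onto a black background, centred.
    canvas = np.zeros((size, size, 3), dtype=bgr_img.dtype)
    h, w = fg.shape[:2]
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = fg
    return canvas
```

In the real pipeline the HSV image comes from cv2.cvtColor(frame, cv2.COLOR_BGR2HSV), and the black background ensures that nothing but the pizza reaches the topping detector.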
  • This image is then passed to the model trained to detect pizza toppings. The model returns the detected topping classes along with the bounding box locations and their probabilities.
  • Based on this information, the TensorFlow visualization function draws bounding boxes on the identified toppings.


  • This image is then stitched back into the original image at the location from which the pizza was extracted. With this technique, the toppings on the pizza were detected while avoiding detection of any undesired objects on the table.
  • Based on the combination of identified toppings, the name and type of the pizza are deduced.
  • The width and height of the pizza help identify whether it is regular, medium or large.
  • The pizza image is divided into 4 virtual slices from the centre, and the toppings in each slice area are counted. This helps identify whether the toppings are evenly spread across the slices.
  • The counts are cross-checked against the portioning chart to verify that the number and quantity of toppings are right.
  • It also checks whether any big air bubbles have formed in the pizza. Similar to the topping detection training, the model is also trained on various pizza images with big air bubbles.
  • After analysing all of the above and a few other points, the result is displayed on the screen, suggesting whether the pizza is good to deliver or the order should be prepared again.
  • The image below is a screenshot of the core application. The top left of the screen shows the camera view; after pizza detection it shows the captured pizza image, and once inferencing is complete it shows the pizza image with detected toppings and the inferencing result. This screenshot was taken after inferencing completed. The right side of the screen displays details such as the POS order number, the detected pizza name and other details of the detected pizza. The bottom part of the screen provides application controls in the form of buttons.
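The virtual-slice check described above can be sketched as follows. The box format (ymin, xmin, ymax, xmax) in normalised coordinates matches what TensorFlow object detection models return; the score threshold and evenness tolerance here are illustrative assumptions, not the values used in the actual system.

```python
def count_per_slice(boxes, scores, min_score=0.5, centre=(0.5, 0.5)):
    """Filter detections by confidence, then count toppings per quadrant
    [top-left, top-right, bottom-left, bottom-right] around the pizza centre,
    assigning each detection by its box centre point."""
    counts = [0, 0, 0, 0]
    cy, cx = centre
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue
        by, bx = (ymin + ymax) / 2, (xmin + xmax) / 2
        idx = (2 if by >= cy else 0) + (1 if bx >= cx else 0)
        counts[idx] += 1
    return counts

def evenly_spread(counts, tolerance=2):
    """Toppings count as evenly spread if the busiest and emptiest slices
    differ by at most `tolerance` toppings."""
    return max(counts) - min(counts) <= tolerance
```

The per-slice counts can then be compared against the portioning chart to decide whether the pizza passes the quality check.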

Model Training: the object detection training process is standard and many articles about it are available on the internet, so I won't describe it here. I will only discuss the TensorFlow model that was chosen and the utilities used for annotating, training, extracting and deploying the model.

  • As there was a need for a lightweight, fast inferencing model, I used SSD MobileNet from the TensorFlow model zoo. Even though it is lightweight, its accuracy is not bad. Since the toppings to be detected are small, I used a model that takes a 640*640 input image. While experimenting, models taking smaller input images failed to detect the toppings, and models taking larger images were too slow at inferencing on the Pi. The model can be downloaded from https://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
  • The config file for the above model can be downloaded from https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config
  • The above model was generated with TensorFlow v1.12.0, and I have verified that it also works with TensorFlow v1.14.0. TensorFlow does not guarantee that it will work with versions other than 1.12.0, so with any other version you will have to check for yourself.
  • Image annotation can be done using LabelImg. Since you have to annotate thousands of images, it is a long and tedious process. Be careful while naming the objects of interest: a naming mistake during annotation will make your detection inaccurate, and finding that mistake across thousands of images is difficult. So be extra careful at this stage.
  • Split the image dataset into train/test sets in an 80/20 proportion and generate the tfrecords for each.
  • Modify the config file with the train and eval record locations, number of examples, number of classes, model file location, etc. before execution.
  • During training, keep checking that the loss is continuously decreasing; if it isn't, the learning rate may be too high. You can play with learning_rate_base in the config file and then carefully watch the loss and accuracy at every epoch.
  • Once the loss stays constant for 10–15 epochs without reducing further, and there is no significant improvement in accuracy, you may stop the training.
  • To confirm, you can also refer to the TensorBoard Scalars section before stopping the training: check the training loss and accuracy charts. Also refer to the TensorBoard Images section to see the detections on test images. If objects are identified in the test images as expected, export the model for inferencing.
  • When this baked TensorFlow model was run directly on the Raspberry Pi, it took 3–4 minutes to show the result. This is obviously not practical, so the model had to be converted to TensorFlow Lite, which is much lighter and runs faster on edge devices like the Raspberry Pi.
  • To run the model on edge devices, export the lite model using the export_tflite_ssd_graph.py utility, available in the TensorFlow research folder. Point the utility at the model checkpoint path and extract the frozen model.
  • Use the toco or tflite_convert converter with optimisation to convert to a tflite model. The TensorFlow 1.9 toco utility is deprecated, but it still works well for models generated with TensorFlow v1.14.0.
  • While using the toco converter, be careful with the parameters. As the input image was 640*640, provide the input_shapes parameter as 1,640,640,3. Also, the inference_type should be FLOAT and not QUANTIZED_UINT8: with QUANTIZED_UINT8, a TFLite model will be generated, but it will not detect the toppings.
  • The TFLite model is much faster: topping detection time was reduced to 4–5 seconds, and the overall process (pizza detection, topping detection and showing the result on screen) completed within 5–6 seconds.
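For reference, the export and conversion steps above might look roughly like this, assuming a TF1 object detection API checkout. All paths, the checkpoint number and the config filename are placeholders; the tensor array names are the standard ones for the SSD TFLite export, but verify them against your own graph.

```shell
# 1. Freeze the trained SSD checkpoint into a TFLite-compatible graph
#    (paths and checkpoint number are placeholders).
python object_detection/export_tflite_ssd_graph.py \
  --pipeline_config_path=training/ssd_mobilenet_v1_fpn_640.config \
  --trained_checkpoint_prefix=training/model.ckpt-50000 \
  --output_directory=tflite_export \
  --add_postprocessing_op=true

# 2. Convert the frozen graph to a float TFLite model with a 640x640 input.
tflite_convert \
  --graph_def_file=tflite_export/tflite_graph.pb \
  --output_file=tflite_export/detect.tflite \
  --input_shapes=1,640,640,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --inference_type=FLOAT \
  --allow_custom_ops
```

Note the inference_type=FLOAT here, matching the point above: the QUANTIZED_UINT8 variant converts without error but fails to detect the toppings.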

Web Based application:

  • Any web-based technology could have been used, as it is a simple site that displays the records saved by the core application. I used Spring Boot to build this application. Spring Boot makes it easy to create stand-alone, production-grade web applications in no time.

  • Django is also a very good Python-based web development framework. In my recent projects I have been using Django, and in the future I will migrate this Spring Boot web application to Django.


Summary:

The integrated software & hardware solution above can help monitor the quality of pizzas. Using a Raspberry Pi as the edge computing device greatly reduces the overall solution cost, though inferencing speed is impacted. Initially I had low confidence in the Pi and thought it would burn out after a few months, but even after a year of running 12 hours every day, the Pi shows no issues.

One can argue: why not a client-server solution, capturing the image locally and sending it to a server in the cloud for inferencing? I think that solution can succeed only if you can ensure good bandwidth with no connectivity downtime. In practice, when you have stores in remote locations with unreliable connections, inferencing on the server may not be reliable.

I know there is further scope to improve the inferencing accuracy and speed. Accuracy is getting better as the model is trained on more and more pizza images, but inferencing speed on the Raspberry Pi is something I will work on in future, along with migration to TF 2.x. In one of my other computer vision projects, I tried the OpenVINO toolkit on an Ubuntu machine and the inferencing speed improved considerably. I am planning to experiment with OpenVINO on Raspbian as well, to check whether I get better inferencing speed than with TFLite. My objective is to reduce the inferencing time from 5–6 seconds to 1–2 seconds.

In this article, I tried to discuss some of the challenges I faced and how I resolved them. I hope it helps anyone attempting to develop a similar computer vision solution on a Raspberry Pi.
