Working at the Frontier of Machine Learning and Computer Vision
Ahmed Alghadani
Co-founder & CEO @byanat | Building the technology that will enable better infrastructure in telecom, energy, and utilities. From cell towers to power grids.
In this article, I would like to share an exceptional experience from my work as a research assistant at the Embedded and Interconnected Vision Systems lab at Sultan Qaboos University from June 2020 to December 2020, where I gained invaluable hands-on experience in machine learning and computer vision through a range of real-world projects. These projects spanned Knowledge Transfer Partnerships (KTPs), governmental projects, private consultancy, and commercial tenders.
Overview
The projects encompassed detecting and estimating the area of oil spills from drone and satellite images, quality control and inspection on production lines for smart factories using industrial cameras, and identifying possible COVID-19 cases from a combination of thermal analysis and analysis of symptoms such as fever, coughing, and fatigue. While working on these projects I learnt a lot about being a team player and about planning strategically to meet tight deadlines during the COVID-19 pandemic. Below are some of the images from the oil spill project, where a simulated image is processed and the spill extracted.
In terms of technical knowledge, this experience was a treasure: camera calibration; correcting radial and tangential distortion using methods such as Zhang's and Bouguet's; developing GUI and visualization tools in Python; and translating coordinates from an RGB image to a thermal image of the same scene (a challenging research area worthy of a PhD in its own right, as with almost every other challenge here). On top of that, I experimented with many image processing techniques such as morphological transformations, adaptive mean thresholding, and adaptive Gaussian thresholding.
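To give a flavour of the math behind the distortion correction mentioned above, here is a minimal NumPy sketch of the radial part of the distortion model used in Zhang-style calibration. The function name and coefficient values are illustrative, not from the actual lab code; real pipelines would use a library such as OpenCV, which also handles the tangential terms.

```python
import numpy as np

def apply_radial_distortion(points, k1, k2):
    """Apply the two-coefficient radial distortion model.

    points: (N, 2) array of normalized image coordinates, with the
    origin at the principal point. k1 and k2 are the radial
    distortion coefficients estimated during calibration.
    """
    r2 = np.sum(points**2, axis=1, keepdims=True)  # squared radius per point
    factor = 1.0 + k1 * r2 + k2 * r2**2            # radial scaling term
    return points * factor

# Points near the optical centre barely move; points far from it are
# displaced more, producing the familiar barrel/pincushion effect.
pts = np.array([[0.01, 0.01], [0.8, 0.6]])
distorted = apply_radial_distortion(pts, k1=-0.2, k2=0.05)
```

Undistorting an image inverts this mapping, which is what the calibration step estimates k1 and k2 for.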
In the image above we see parts of the thermal camera calibration process for the COVID-19 project.
Exponential Learning Curve
The most challenging and rewarding part was the wide range of Artificial Neural Networks (ANNs) I worked with, including YOLOv4 for object and action recognition, 3D ResNet, and the Two-Stream Inflated 3D (I3D) CNN. This was the first time I encountered 3D CNNs and their ability to recognize spatio-temporal features (amazing!). Even better, the Two-Stream I3D CNN takes two inputs: a stream of RGB frames and their corresponding optical flow. It was also my first time dealing with optical flow, which estimates per-pixel motion between consecutive RGB frames, highlighting the pixels where an action happened. That's a lot of math, and it was the only code I wrote in C++; the rest of the machine learning work was all Python!
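The intuition behind the flow stream can be sketched with plain NumPy. True optical flow (e.g. the Farnebäck or TV-L1 algorithms typically used to feed I3D's flow stream) solves for a per-pixel (dx, dy) motion vector; the crude stand-in below only flags *where* motion happened via a temporal difference, which conveys the same core idea. The function name and synthetic frames are illustrative assumptions.

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=0.1):
    """Flag pixels whose intensity changed between consecutive frames.

    A simplified stand-in for optical flow: it marks where motion
    occurred, without estimating the direction or magnitude.
    """
    diff = np.abs(frame_b.astype(float) - frame_a.astype(float))
    return diff > threshold

# Synthetic grayscale frames: a bright 2x2 "object" moves one pixel right.
a = np.zeros((6, 6)); a[2:4, 1:3] = 1.0
b = np.zeros((6, 6)); b[2:4, 2:4] = 1.0
mask = motion_mask(a, b)
# Only the trailing and leading edges of the moving object light up;
# the static background and the unchanged overlap column stay dark.
```

A real flow field would additionally tell the network that the motion was one pixel to the right, which is the extra signal the second I3D stream exploits.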
Below is an example of running I3D CNN inference on RGB video for COVID-19 symptoms.
COVID-19 symptom detection is treated as a Human Activity Recognition (HAR) problem. HAR is a time-series problem, and many researchers rely on analysing sensor data such as accelerometer readings. Be that as it may, attaching a sensor such as an accelerometer to a large group of people is not feasible for large-scale HAR; the alternative, in this case, is visual analysis. A human activity takes time to unfold, hence the temporal dimension in spatio-temporal three-dimensional (3D) kernels. Guess what? 3D CNNs proved to be challenging and computationally demanding. You see the image right above? That is a scene in RGB alongside its optical flow across both the x-axis and the y-axis. Beautiful, isn't it? It highlights only the pixels where activity is present. Math! Seriously, a lot of it in the optical flow alone.
The image above really simplifies the difference between a 2D and a 3D kernel for convolution in 2D and 3D CNNs. The increase in complexity is substantial: a kernel of side k grows from k² to k³ weights, and the convolution additionally slides along the temporal axis.
It Wasn't a Smooth Road!
You can see the calibration board on the laptop screen and the drone in the background; you might need to squint to spot the drone, as it blends in well.
Died for science!
When you need your tools the most :) If I remember correctly, this happened after I messed something up while installing Ubuntu in dual-boot mode alongside Windows 10. Alas, my laptop died while training YOLOv4 for more than 40 hours continuously, when the GPU burned out. RIP 2013-2020; it was a beast of a laptop.
Comes a new workstation to the rescue
My personal laptop's death marked the transition to a desktop PC, which was a nice change. This machine carried almost all of the experiments mentioned above. Given the diverse selection of ANNs, it was important to choose an ecosystem that could house the training and inference of these different architectures along with the corresponding frameworks and libraries. Everything ran on Ubuntu Linux, with the hardware and software ecosystem detailed in the table below. This configuration was selected based on the compatibility matrices published by the TensorFlow developers for GPU configuration and by the NVIDIA developers for the CUDA framework and its toolkits.
Reflection
Being a research assistant is a one-of-a-kind experience: you are at the frontier, pioneering R&D. You get the chance to experiment, figure out what works and what doesn't, document your results, and network with other researchers working on the same problem at international conferences. The best part? Your work gets to become a solution for someone else's problem!