Embedded AI
Updated: 28 April 2021
Embedded AI: how to use computer vision with deep learning in IoT devices. Running machine learning inference at the edge requires some extra steps.
I have tested several hardware platforms, such as the Raspberry Pi 3, Raspberry Pi 4, Intel Neural Compute Stick 2, OpenCV AI Kit, Google Coral, NVIDIA Jetson Nano, etc.
Tasks: anomaly detection, object detection, semantic segmentation, instance segmentation, object tracking.
Keywords:
digital twins, image quality, PSNR, SSIM, distortion, automatic image brightness adjustment, generic sharpening kernels to remove blurriness, high-FPS processing, OCR, hardware evaluation, ...
Introduction:
Edge means local processing, as opposed to the cloud.
Advantages:
- No need for Internet connection
- Real-time processing
- Data could be sensitive, so it does not need to be sent to the cloud
- Optimized edge AI models achieve high efficiency
Examples of deploying AI at the edge:
- handling input streams
- processing model outputs: convert a model to an Intermediate Representation (IR), then use the IR with the Inference Engine
- lightweight MQTT architecture
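For the MQTT part, here is a minimal sketch of publishing model outputs from an edge device with the paho-mqtt client (the broker address, topic name, and payload format are illustrative assumptions):

    # Minimal MQTT publisher sketch for sending model outputs from an edge
    # device (paho-mqtt 1.x style; broker address and topic are assumptions).
    import json
    import paho.mqtt.client as mqtt

    client = mqtt.Client()
    client.connect("localhost", 1883)

    detection = {"label": "person", "confidence": 0.92}  # example model output
    client.publish("edge/detections", json.dumps(detection))
    client.disconnect()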
Special frameworks and libraries for edge devices:
- NVIDIA TensorRT
- TensorFlow Lite: including TensorFlow Lite for Microcontrollers, e.g. gesture recognition with OpenMV/TensorFlow (see studio.edgeimpulse.com); a minimal inference sketch follows this list
- TensorFlow.js
- PyTorch Lightning
- PyTorch Mobile
- Intel Distribution of OpenVINO Toolkit
- CoreML
- ML kit
- FRITZ
- MediaPipe
- Apache TVM
- TinyML: enabling ultra-low-power machine learning at the edge, e.g. tiny machine learning with Arduino
- Parallel programming
Moreover, consider your target hardware when choosing a deep learning model at the first stage.
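As a concrete example for the frameworks above, a minimal TensorFlow Lite inference sketch (the model file name is a placeholder and the input is dummy data):

    # Minimal TensorFlow Lite inference sketch; "model.tflite" is a placeholder.
    import numpy as np
    import tflite_runtime.interpreter as tflite  # on a desktop: tf.lite.Interpreter

    interpreter = tflite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Dummy input with the model's expected shape and dtype
    data = np.zeros(inp["shape"], dtype=inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]).shape)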
OS for Edge devices:
- Ubuntu
- ROS
- Raspberry Pi OS
- real-time operating system (RTOS)
- NASA cFS (core Flight System)
- Real-Time Executive for Multiprocessor Systems (RTEMS)
In some cases you need to optimize the model for inference. There are many techniques, such as:
- Pruning
- Quantization
- Distillation Techniques
- Binarized Neural Networks (BNNs)
- Apache TVM (incubating): a compiler stack for deep learning systems
- Distributed machine learning and load balancing strategy
- Low rank matrix factorization (LRMF)
- Compact convolutional filters (Video/CNN)
- Knowledge distillation
- Neural Networks Compression Framework (NNCF)
- Parallel programming
How these techniques work:
Pruning
- Model pruning: reduce redundant parameters that are not sensitive to performance. Aim: remove all connections with absolute weights below a threshold.
- Going for a bigger network with many layers and then pruning it works much better and faster (see the sketch below).
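A minimal magnitude-based pruning sketch in NumPy (the threshold value is an arbitrary assumption; real frameworks such as the TensorFlow Model Optimization Toolkit handle this per layer during training):

    # Magnitude-based pruning: zero out all connections whose absolute
    # weight is below a threshold.
    import numpy as np

    def prune_weights(weights: np.ndarray, threshold: float) -> np.ndarray:
        mask = np.abs(weights) >= threshold
        return weights * mask

    w = np.random.randn(64, 128)
    w_pruned = prune_weights(w, threshold=0.1)
    sparsity = 1.0 - np.count_nonzero(w_pruned) / w_pruned.size
    print(f"sparsity after pruning: {sparsity:.2%}")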
Quantization
- compresses the model by reducing the number of bits used to represent the weights
- quantization effectively constrains the number of different weight values we can use inside our kernels
- per-channel quantization of weights improves performance through model compression and latency reduction
- Google's libraries (e.g. the TensorFlow Model Optimization Toolkit) support the most comprehensive set of methods
- FP32 -> INT8
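A post-training quantization sketch using the TensorFlow Lite converter (the SavedModel path is a placeholder; Optimize.DEFAULT applies dynamic-range quantization, storing weights as INT8):

    # Post-training dynamic-range quantization with the TFLite converter.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_quant_model = converter.convert()

    with open("model_int8.tflite", "wb") as f:
        f.write(tflite_quant_model)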
Distillation Techniques
See, for example, "Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms".
Binarized Neural Networks (BNNs)
BNNs are not supported by GPU hardware such as the Jetson Nano; implementations are mostly CPU-based.
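The core idea, sketched in NumPy: constrain the weights to {-1, +1} with the sign function (deterministic binarization, one common variant):

    # Deterministic weight binarization at the heart of a BNN:
    # every weight is mapped to -1 or +1.
    import numpy as np

    def binarize(w: np.ndarray) -> np.ndarray:
        return np.where(w >= 0.0, 1.0, -1.0)

    w = np.random.randn(4, 4)
    print(binarize(w))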
Apache TVM (incubating) is a compiler stack for deep learning systems
- challenges with large-scale models:
- deep neural networks are computationally expensive and memory intensive
- this hinders their deployment on devices with low memory resources and in applications with strict latency requirements
- other issues include data security (models tend to memorize everything, including PII) and bias (e.g. profanity when trained on large-scale public data)
- self-discovering: instead of manually configuring conversational flows, automatically discover them from your data
- self-training: let your system train itself with new examples
- self-managing: let your system optimize itself, e.g. via knowledge distillation
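A hedged sketch of compiling a model with TVM (the exact API differs across TVM releases; the ONNX file name, input name, and shape are assumptions):

    # Compile an ONNX model with Apache TVM's Relay front end.
    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("model.onnx")          # placeholder model file
    shape_dict = {"input": (1, 3, 224, 224)}      # assumed input name/shape
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    target = "llvm"  # CPU; use "cuda" for NVIDIA GPUs
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)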
Distributed machine learning and load balancing strategy
- run models using all available processing power (CPU, GPU, DSP, AI chip) together to enhance inference performance
- dynamic pruning of kernels, which aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture
- partitioning techniques with convolution-layer fusion, to dynamically select the optimal partition according to the availability of computational resources and network conditions
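A toy sketch of spreading inference work across CPU workers with multiprocessing (a stand-in for real CPU/GPU/DSP load balancing; run_inference is a hypothetical per-frame function):

    # Distribute per-frame inference across worker processes.
    from multiprocessing import Pool

    def run_inference(frame_id: int) -> int:
        # Placeholder: pretend to run a model on one frame.
        return frame_id * 2

    if __name__ == "__main__":
        frames = list(range(16))
        with Pool(processes=4) as pool:
            results = pool.map(run_inference, frames)
        print(results)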
Low rank matrix factorization (LRMF)
- latent structures exist in the data; by uncovering them we can obtain a compressed representation
- LRMF factorizes the original matrix into lower-rank matrices while preserving latent structures and addressing sparseness
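A minimal NumPy sketch of low-rank factorization via truncated SVD (the rank k is a free parameter):

    # Rank-k approximation of a weight matrix with truncated SVD.
    import numpy as np

    W = np.random.randn(256, 512)           # original weight matrix
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = 32                                   # target rank
    W_approx = (U[:, :k] * S[:k]) @ Vt[:k]   # rank-k reconstruction

    # Parameter count drops from 256*512 to k*(256+512).
    print(W.size, k * (W.shape[0] + W.shape[1]))
    print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))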
Compact convolutional filters (Video/CNN)
- design special structural convolutional filters to save parameters
- replace over-parameterized filters with compact filters to achieve an overall speedup while maintaining comparable accuracy
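A well-known instance is the depthwise-separable convolution; this Keras sketch compares parameter counts (the shapes are illustrative):

    # Depthwise-separable convolution as a compact replacement for a
    # standard convolution.
    import tensorflow as tf

    standard = tf.keras.layers.Conv2D(64, 3, padding="same")
    compact = tf.keras.layers.SeparableConv2D(64, 3, padding="same")

    x = tf.zeros((1, 32, 32, 32))  # dummy input to build the layers
    standard(x), compact(x)
    print(standard.count_params(), compact.count_params())  # ~18k vs ~2.4k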
Knowledge distillation
- train a compact neural network with the distilled knowledge of a large model
- distillation (knowledge transfer) from an ensemble of big networks into a much smaller network that learns directly from the cumbersome model's outputs and is lighter to deploy
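A sketch of the usual distillation loss with Hinton-style soft targets (temperature T and weight alpha are assumed hyper-parameters; y_true holds integer class labels):

    # Distillation loss: match softened teacher outputs plus the hard-label loss.
    import tensorflow as tf

    def distillation_loss(y_true, student_logits, teacher_logits, T=4.0, alpha=0.1):
        soft_teacher = tf.nn.softmax(teacher_logits / T)
        log_soft_student = tf.nn.log_softmax(student_logits / T)
        kd = -tf.reduce_mean(
            tf.reduce_sum(soft_teacher * log_soft_student, axis=-1)) * T * T
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, student_logits, from_logits=True)
        return alpha * tf.reduce_mean(ce) + (1.0 - alpha) * kd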
Neural Networks Compression Framework (NNCF)
If the objects of interest are large and small anchors are not needed:
- in MobileNet we can remove the small part of the network related to small objects
- in YOLO we can reduce the number of anchors
- decreasing the input image size also helps, but reduces accuracy
Parallel programming, clean code, and design patterns
- Libraries: FFmpeg, GStreamer, Celery
- GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy (see the sketch below)
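As a small example of the GPU libraries above, CuPy can act as a near drop-in NumPy replacement (requires a CUDA-capable GPU):

    # Run an array computation on the GPU with CuPy.
    import cupy as cp

    x = cp.random.randn(1024, 1024, dtype=cp.float32)
    y = cp.matmul(x, x.T)      # executes on the GPU
    result = cp.asnumpy(y)     # copy back to host memory
    print(result.shape)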
Camera on Edge:
What you need to consider about the camera:
- camera calibration is important
- Quantum efficiency [%] (spectral response)
- Sensor size [inches or mm] and pixel size [micrometers]
- Dynamic Range [dB]
- Image noise and signal-to-noise ratio (SNR), PSNR, SSIM: a greater SNR yields better contrast and clarity, as well as improved low-light performance
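For reference, PSNR between two images takes only a few lines of NumPy (8-bit images assumed):

    # Peak signal-to-noise ratio between two images.
    import numpy as np

    def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
        mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)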
ROS
Source: https://www.youtube.com/watch?v=0BxVPCInS3M
The recording of this course is part of the Programming for Robotics (ROS) Lecture at ETH Zurich
Lecturers: Péter Fankhauser, Dominic Jud, Martin Wermelinger
This course gives an introduction to the Robot Operating System (ROS), including many of the tools commonly used in robotics. With the help of different examples, the course provides a good starting point for students to work with robots: they learn how to create software (including simulation), how to interface sensors and actuators, and how to integrate control algorithms.
Data set and under-fitting and over-fitting:
- A large gap between training and validation loss => unrepresentative training set (too few examples; it does not provide sufficient information to learn)
- A validation loss with noisy movements => unrepresentative validation set (too few examples; it does not provide sufficient information to evaluate)
- A validation loss lower than the training loss => the validation dataset may be easier for the model to predict
Under-fitting:
- the loss remains flat, with noisy values at a relatively high level
- the loss is still decreasing at the end of training (training stopped too early)
Over-fitting:
- the training loss continues to decrease, but the validation loss starts to increase after some point
How to handle under-fitting and over-fitting:
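A minimal sketch of one common remedy for over-fitting, early stopping on the validation loss (assumes an existing Keras model and datasets; for under-fitting, the usual moves are a larger model or longer training):

    # Stop training when the validation loss stops improving.
    import tensorflow as tf

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)
    # model.fit(train_ds, validation_data=val_ds, epochs=100,
    #           callbacks=[early_stop])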
AI Edge: how to run deep learning inference on edge/IoT devices; enabling efficient, high-performance deep learning through accelerators and optimization. Smart AI IoT, robotics, 3D SLAM, AR, VR.
Embedded IoT: devices, solutions architecture, security, safety.
Learn More: https://www.tiziran.com/
#computervision #AI #objectdetection #objecttracking #ml #research #CNN #gans #convolutionalneuralnetworks #ai #vr #reinforcementlearning #mlops #aiforbusiness #science #researcher #phd #cameracalibration #opticalflow #videostablization #humanoidrobot #localization #3dSLAM #reconstruction #pointcloud #mixedreality #edgecomputing #raspberrypi #intelstick #googlecoral #jetsonnano #nvidiavgpu #tensorflowjs #pytorch #opencv #aikit #caffee #DIGITS #python #ubuntu #farshidpirahansiah #tiziran #farshid #pirahansiah #robotics #MultiCameraMultiClassMultiObjectTracking #deeplearning #machinelearning #artificialintelligence #tensorflow #robotics #3dvision #sterovision #depthmap #RCNN #machinevision #imageprocessing #patternrecognition #compiler #RISC #RNN #fullStackDeepLearning #productinnovation #patents #TensorRT #ApacheTVM #TFLite #PyTorchmobile #dockers #gRPC #RESTAPIs #GRPC #GraphQL
Reference: