Hardware Evaluation and Real-Time Computer Vision + ML on the Edge


Hardware evaluation for high-FPS video processing:

Compare Semantic Segmentation on Jetson Nano and Xavier




How to increase the speed of deep learning models for computer vision applications

Machine learning on edge devices requires tailored hardware/software solutions.

Several methods for increasing speed and reducing memory usage are: ensemble methods, distributed machine learning with a load-balancing strategy, low-rank matrix factorization (LRMF), compact convolutional filters (video/CNN), pruning, quantization, knowledge distillation, the Neural Network Compression Framework (NNCF), binarized neural networks (BNNs), Apache TVM, etc.

Hardware:

RISC-V, FPGA, CORE-V, eFPGA, Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, NVIDIA Jetson Nano, NVIDIA Jetson AGX Xavier.


Categories of AI processing on the edge

  1. Edge sensing with cloud AI: data may be filtered, compressed, or pre-processed at the edge, with inference running in the cloud
  2. Edge AI with cloud data upstream: AI inference runs at the edge for efficiency, latency, cost, or scale, and the results are sent to the cloud
  3. Real-time interactive edge AI: everything runs on the edge

Benefits of ML + IoT: low latency, low bandwidth, low power consumption, high availability, and data privacy and security

  • Edge + AI accelerators
  • Edge + embedded AI chips

ML/AI Pipeline for Edge:

  • Data acquisition
  • Data processing
  • Feature extraction
  • Model training
  • Model conversion (see the sketch after this list)
  • Model deployment
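As an illustration of the model-conversion step, here is a minimal TensorFlow Lite conversion sketch (the MobileNetV2 placeholder and file name are assumptions, not part of the original pipeline):

```python
import tensorflow as tf

# Placeholder model; in practice, use your own trained tf.keras model.
model = tf.keras.applications.MobileNetV2(weights=None)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Default optimizations shrink the model (e.g. weight quantization).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the FlatBuffer to disk for deployment on an edge device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```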

Pruning

Compressing models by reducing the number of parameters is important in order to reduce memory, battery, and hardware consumption without sacrificing accuracy, to deploy lightweight models on device, and to guarantee privacy through private on-device computation. Pruning is also used to investigate the differences in learning dynamics between over-parametrized and under-parametrized networks, to study the role of lucky sparse subnetworks and initializations ("lottery tickets"), and as a destructive neural architecture search technique. [1]
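As a minimal sketch, PyTorch's built-in pruning utilities can zero out low-magnitude weights (the toy network and the 30% sparsity level below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy network; in practice this is your trained model.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero the 30% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        # Make the pruning permanent by removing the re-parametrization.
        prune.remove(module, "weight")

w = model[0].weight
print(f"sparsity: {float((w == 0).sum()) / w.numel():.2%}")
```

Note that unstructured pruning yields sparse tensors, not smaller dense ones; actual speedups depend on sparse-aware kernels or on structured pruning.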

Tip: go for a bigger network with many layers; pruning then works much better and faster.

Quantization

Quantization leverages 8-bit integer (int8) instructions to reduce model size and run inference faster (reduced latency). It can be the difference between a model achieving its quality-of-service goals, or even fitting into the resources available on a mobile device. Even when resources aren't so constrained, it may enable you to deploy a larger and more accurate model. [2]
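A minimal post-training dynamic-quantization sketch in PyTorch (the toy model is an assumption; whether int8 kernels actually run faster depends on the target hardware, see the tip below):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Weights of Linear layers become int8; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(quantized(x).shape)  # same interface, smaller model, faster CPU inference
```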

Tip: check your hardware/accelerator first, because some hardware does not support int8 inference.

Ensemble methods: 

Feature-based, compact representation; easy to reduce model size; model size can be reduced post-training.

Bagging is a type of ensemble method where multiple models are trained in parallel on subsampled datasets (reducing error due to variance), and their outputs are combined into a single classification. Each model sees a different subsample and may overfit it, but when the outputs are combined the per-model noise cancels out, so the ensemble keeps the same accuracy.

Boosting is a type of ensemble method where multiple models are trained in sequence to improve upon the errors of the previous model (reducing error due to bias).
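A quick scikit-learn illustration of both (the synthetic dataset and hyperparameters are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: trees trained in parallel on bootstrap subsamples (variance reduction).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50).fit(X_tr, y_tr)

# Boosting: trees trained in sequence, each correcting the last (bias reduction).
boost = GradientBoostingClassifier(n_estimators=50).fit(X_tr, y_tr)

print("bagging:", bag.score(X_te, y_te), "boosting:", boost.score(X_te, y_te))
```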

Distillation Techniques

Knowledge distillation is a method used to reduce the size of a model without losing too much of its predictive power.
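A minimal sketch of the standard distillation loss in PyTorch (the temperature T and weighting alpha are illustrative assumptions; `student_logits` and `teacher_logits` come from models with the same output dimension):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

During training, the teacher runs in eval mode with gradients disabled, and only the student's parameters are updated.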

Binarized Neural Networks (BNNs)

  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks: https://arxiv.org/abs/1603.05279
  • Binarized Neural Networks: https://papers.nips.cc/paper/6573-binarized-neural-networks
  • DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients: https://arxiv.org/abs/1606.06160

Note that BNNs are not supported by GPU hardware such as the Jetson Nano.

Apache TVM: An End-to-End Machine Learning Compiler Framework for CPUs, GPUs, and Accelerators

Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
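A minimal sketch of compiling an ONNX model with TVM's Relay frontend (the file name, input name/shape, and target string are assumptions, and the exact API differs slightly between TVM releases):

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load a trained model exported to ONNX (path is illustrative).
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Import into Relay and compile for the chosen hardware backend.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
target = "llvm"  # e.g. "cuda" for NVIDIA GPUs such as the Jetson family
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run inference with the graph executor.
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
```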

Compact convolutional filters (Video/CNN)

Designing special structural convolutional filters saves parameters: replacing over-parameterized filters with compact ones achieves an overall speedup while maintaining comparable accuracy.
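One common instance is replacing a standard convolution with a depthwise-separable pair, as popularized by MobileNet; a PyTorch sketch with illustrative channel counts:

```python
import torch.nn as nn

# Standard 3x3 convolution: 128 * 64 * 3 * 3 = 73,728 weights.
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Depthwise-separable replacement: 64*3*3 + 128*64 = 8,768 weights (~8x fewer).
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise
)
```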

Distributed machine learning and load balancing strategy

Run models so that all available processors (CPU, GPU, DSP, AI chip) are used together to enhance inference performance.
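A deliberately simplified load-balancing sketch: round-robin batches between replicas of the same PyTorch model on the available devices (real edge systems would also dispatch to DSPs or NPUs through vendor runtimes; this is only a schematic assumption):

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

import torch
import torch.nn as nn

model = nn.Linear(128, 10).eval()
devices = ["cuda", "cpu"] if torch.cuda.is_available() else ["cpu"]
# Keep one replica of the model per available device.
replicas = {d: copy.deepcopy(model).to(d) for d in devices}

def infer(batch, device):
    with torch.no_grad():
        return replicas[device](batch.to(device)).cpu()

batches = [torch.randn(32, 128) for _ in range(8)]
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    futures = [pool.submit(infer, b, d) for b, d in zip(batches, cycle(devices))]
    results = [f.result() for f in futures]

print(len(results), results[0].shape)
```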

Frameworks / Libraries for Edge/IoT

  • Apache TVM
  • Intel® Distribution of OpenVINO Toolkit
  • Core ML
  • ML Kit
  • FRITZ
  • MediaPipe
  • Docker for embedded
  • tinyML
  • NVIDIA TensorRT
  • TensorFlow Lite
  • TensorFlow.js
  • PyTorch Lightning
  • PyTorch Mobile
  • Code size compiler optimizations and techniques for embedded systems
  • Useful libraries for programming: FFmpeg, GStreamer, Celery

GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy
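For example, CuPy mirrors the NumPy API on the GPU (this assumes a CUDA-capable device is present):

```python
import cupy as cp

# Arrays live on the GPU; the API mirrors NumPy.
a = cp.random.rand(1024, 1024, dtype=cp.float32)
b = cp.random.rand(1024, 1024, dtype=cp.float32)
c = a @ b                 # matrix multiply runs on the GPU
print(float(c.sum()))     # only the scalar result is copied back to the host
```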

Other techniques

Low-rank matrix factorization (LRMF): approximate large weight matrices as products of smaller ones, as in the sketch below.
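A minimal sketch: approximate a Linear layer's weight matrix with a truncated SVD and replace it with two thinner layers (the rank of 64 is an illustrative assumption; in practice it is tuned against accuracy):

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 512, bias=False)  # 512 * 512 = 262,144 weights
rank = 64

# Truncated SVD of the weight matrix: W ~= (U_r diag(S_r)) V_r^T.
U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
W1 = Vh[:rank, :]               # (rank, 512)
W2 = U[:, :rank] * S[:rank]     # (512, rank), columns scaled by singular values

# Two thin layers: 512*64 + 64*512 = 65,536 weights (4x fewer).
factored = nn.Sequential(nn.Linear(512, rank, bias=False),
                         nn.Linear(rank, 512, bias=False))
factored[0].weight.data = W1
factored[1].weight.data = W2

x = torch.randn(1, 512)
print((layer(x) - factored(x)).abs().max())  # approximation error
```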

Knowledge distillation

Neural Networks Compression Framework (NNCF) [3]


