Hardware Evaluation and Real-Time Computer Vision + ML on the Edge
Hardware evaluation for fast-FPS video processing: comparing semantic segmentation on the Jetson Nano and Xavier
How to increase the speed of deep learning models for computer vision applications
Machine learning on edge devices requires tailored hardware/software solutions.
Several methods for increasing speed and reducing memory usage are: ensemble methods, distributed machine learning with a load-balancing strategy, low-rank matrix factorization (LRMF), compact convolutional filters (video/CNN), pruning, quantization, knowledge distillation, the Neural Networks Compression Framework (NNCF), Binarized Neural Networks (BNNs), Apache TVM, etc.
Hardware:
RISC-V, FPGA, CORE-V, eFPGA, Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, NVIDIA Jetson Nano, NVIDIA Jetson AGX Xavier.
Categories of processing AI on Edge:
- Edge sensing with cloud AI: data may be filtered, compressed, or pre-processed at the edge
- Edge AI with cloud data upstream: AI inference runs at the edge for efficiency, latency, cost, or scale, and the results are sent to the cloud
- Edge AI real-time interactive systems: everything runs at the edge
- Edge + AI accelerators
- Edge + embedded AI chips
Benefits of ML + IoT = low latency, low bandwidth, low power consumption, high availability, and data privacy and security.
ML/AI Pipeline for Edge (a model-conversion sketch follows this list):
- Data acquisition
- Data processing
- Feature extraction
- Model training
- Model conversion
- Model deployment
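For example, the model conversion step often targets a lightweight edge runtime such as TensorFlow Lite. A minimal sketch, assuming a trained Keras SavedModel at a hypothetical path:

```python
import tensorflow as tf

# Convert a trained SavedModel (hypothetical "saved_model/" directory)
# into a TensorFlow Lite flatbuffer for deployment on an edge device.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
# Default optimizations also apply post-training quantization (see below).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```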
Pruning
Compressing models by reducing their number of parameters is important in order to reduce memory, battery, and hardware consumption without sacrificing accuracy, to deploy lightweight models on device, and to guarantee privacy with private on-device computation. Pruning is also used to investigate the differences in learning dynamics between over-parameterized and under-parameterized networks, and to study the role of lucky sparse subnetworks and initializations ("lottery tickets") as a destructive neural architecture search technique. [1]
Tip: go for a bigger network with many layers and then prune it; this works much better and faster than starting small.
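A minimal sketch of magnitude pruning with PyTorch's built-in utilities; the toy model and the 30% unstructured L1 pruning ratio are illustrative assumptions, not a recipe from this article:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Zero out the 30% of weights with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Sparsity of first layer: {sparsity:.0%}")
```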
Quantization
Quantization leverages 8-bit integer (int8) instructions to reduce model size and run inference faster (reduced latency), and it can be the difference between a model achieving quality-of-service goals or even fitting into the resources available on a mobile device. Even when resources are not so constrained, it may enable you to deploy a larger and more accurate model. [2]
Tip: check your hardware/accelerator first, because some hardware does not support int8 inference.
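As a sketch, PyTorch's post-training dynamic quantization converts Linear-layer weights to int8 with a single call; the model below is a stand-in, and dynamic quantization is only one of several quantization modes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Linear-layer weights are stored as int8; activations are quantized
# dynamically at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(quantized(x).shape)  # same interface, smaller int8 weights
```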
Ensemble methods:
Ensembles offer a compact, feature-based representation, which makes model size easy to reduce, including post-training.
Bagging is a type of ensemble method where multiple models are trained in parallel on subsampled datasets (reducing error due to variance), and the outputs of the many models are combined into a single classification. Each model sees only its own subsample of the dataset and may overfit it, but the combination still reaches good accuracy because each individual model's noise is cancelled out.
Boosting is a type of ensemble method where multiple models are trained in sequence, each improving upon the errors of the previous model (reducing error due to bias).
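A small bagging illustration with scikit-learn; the synthetic dataset and the choice of 25 estimators are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each base tree is trained on a bootstrap subsample of the training data;
# predictions are combined by majority vote.
bagging = BaggingClassifier(n_estimators=25, random_state=0)
bagging.fit(X_train, y_train)
print(f"Test accuracy: {bagging.score(X_test, y_test):.3f}")
```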
Distillation Techniques
Knowledge distillation is a method used to reduce the size of a model without losing too much of its predictive power: a small "student" model is trained to mimic a larger, already-trained "teacher" model.
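A minimal sketch of the standard distillation loss; the temperature T and weighting alpha are illustrative hyperparameters, and the teacher's logits are assumed to come from the larger trained model:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```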
Binarized Neural Networks (BNNs)
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks: https://arxiv.org/abs/1603.05279
- Binarized Neural Networks: https://papers.nips.cc/paper/6573-binarized-neural-networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients: https://arxiv.org/abs/1606.06160
Note that BNNs are not supported by GPU hardware such as the NVIDIA Jetson Nano.
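As a toy illustration (not the exact recipe of any paper above), BNN-style training binarizes weights with sign() in the forward pass and uses a straight-through estimator in the backward pass:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # weights constrained to {-1, +1} (0 maps to 0)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients only where |w| <= 1.
        return grad_output * (w.abs() <= 1).float()

w = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
print(w.grad)
```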
Apache TVM: An End-to-End Machine Learning Compiler Framework for CPUs, GPUs, and Accelerators
Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
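A hedged sketch of TVM's classic Relay workflow; API details vary between TVM versions, and the ONNX model file and input shape below are assumptions:

```python
import onnx
import tvm
from tvm import relay

# Import a model (hypothetical file) into TVM's Relay IR.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"input": (1, 3, 224, 224)}
)

# Compile for a CPU target; swap "llvm" for e.g. "cuda" on a GPU backend.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```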
Compact convolutional filters (Video/CNN)
This means designing special structural convolutional filters to save parameters: replacing over-parameterized filters with compact filters achieves an overall speedup while maintaining comparable accuracy.
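One concrete example of a compact filter design is the depthwise-separable convolution used by MobileNet-style networks; the channel counts below are arbitrary:

```python
import torch.nn as nn

# Standard 3x3 convolution: 64*128*3*3 + 128 ≈ 74k parameters.
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Depthwise-separable replacement: ≈ 9k parameters for comparable capacity.
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise 1x1
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(standard), n_params(separable))
```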
Distributed machine learning and load balancing strategy
Run models across all available processing units (CPU, GPU, DSP, and AI chips together) to enhance inference performance.
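For instance, TensorFlow Lite can split work between the CPU and an accelerator through delegates; the Edge TPU delegate library named below is platform-specific and shown only as an example:

```python
import tflite_runtime.interpreter as tflite

# Ops supported by the delegate run on the accelerator (here, a Coral
# Edge TPU); unsupported ops fall back to the CPU.
interpreter = tflite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
```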
Frameworks / Libraries for Edge/IoT
- Apache TVM
- Intel® Distribution of OpenVINO Toolkit
- CoreML
- ML Kit
- FRITZ
- MediaPipe
- Docker for embedded
- tinyML
- NVIDIA TensorRT
- TensorFlow Lite
- TensorFlow.js
- PyTorch Lightning
- PyTorch Mobile
- Code size compiler optimizations and techniques for embedded systems
- Useful libraries for programming: FFmpeg, GStreamer, Celery
- GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy
Other techniques
Low rank matrix factorization (LRMF)
Knowledge distillation
Neural Networks Compression Framework (NNCF) [3]