Embedded AI

Dr. Farshid PirahanSiah

Senior C++ Computer Vision Research Engineer

Published: April 27, 2021

Updated: April 28, 2021

Embedded AI: How to use computer vision with deep learning in IoT devices. Running machine learning inference on the edge requires some extra steps.

I tested several hardware platforms, such as the Raspberry Pi 3, Raspberry Pi 4, Intel® Neural Compute Stick 2, OpenCV AI Kit, Google Coral, and NVIDIA Jetson Nano.

Tasks include anomaly detection, object detection, segmentation (semantic and instance segmentation), and object tracking.

Keywords:

Digital twins, image quality, PSNR, SSIM, distortion, automatic image brightness adjustment, sharpening with a generic kernel to remove blur, high-FPS processing, OCR, hardware evaluation, ...

Introduction:

Edge means local processing, as opposed to the cloud (edge ≠ cloud).

Advantages:

No Internet connection required
Real-time processing
Sensitive data does not need to be sent to the cloud
Optimization makes edge AI models efficient

Examples of deploying AI at the edge:

Handling input streams
Processing model outputs
Converting a model to an Intermediate Representation (IR) and using the IR with the Inference Engine
A lightweight MQTT architecture
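A minimal sketch of the output-processing step feeding a lightweight MQTT setup, assuming detections arrive as `(label, confidence, (x, y, w, h))` tuples. It drops low-confidence results and packs the rest into a compact JSON message. The topic name, the 0.5 threshold, and the function name are hypothetical; the actual publish call (for example `client.publish(topic, payload)` in the paho-mqtt client) appears only as a comment, so the sketch runs without a broker.

```python
import json

# Hypothetical topic name; adapt it to your broker's topic hierarchy.
TOPIC = "edge/camera0/detections"

def detections_to_payload(frame_id, detections, min_conf=0.5):
    """Illustrative helper: keep confident detections and serialize them compactly.

    detections: iterable of (label, confidence, (x, y, w, h)) tuples.
    Returns UTF-8 encoded JSON bytes, ready to be published.
    """
    kept = [
        {"label": label, "conf": round(conf, 3), "box": list(box)}
        for label, conf, box in detections
        if conf >= min_conf
    ]
    message = {"frame": frame_id, "count": len(kept), "objects": kept}
    # separators=(",", ":") removes whitespace to keep each MQTT message small.
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

if __name__ == "__main__":
    dets = [("person", 0.91, (10, 20, 50, 80)), ("dog", 0.32, (5, 5, 20, 20))]
    payload = detections_to_payload(7, dets)
    print(payload)
    # With an MQTT client such as paho-mqtt this would be sent with:
    # client.publish(TOPIC, payload)
```

Sending only filtered, compact results, rather than frames, is what keeps the network side lightweight.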

Special frameworks and libraries for edge devices:

NVIDIA TensorRT
TensorFlow Lite: TensorFlow Lite for Microcontrollers (gesture recognition), OpenMV, Edge Impulse (studio.edgeimpulse.com)
TensorFlow.js
PyTorch Lightning
PyTorch Mobile
Intel® Distribution of OpenVINO Toolkit
Core ML
ML Kit
FRITZ
MediaPipe
Apache TVM
TinyML: enabling ultra-low-power machine learning at the edge; tiny machine learning with Arduino
Parallel programming

Moreover, choose the deep learning model with your specific target hardware in mind from the very first stage.

Operating systems for edge devices:

Ubuntu
ROS
Raspberry Pi OS
Real-time operating system (RTOS)
NASA cFS (core Flight System)
Real-Time Executive for Multiprocessor Systems (RTEMS)

In some cases you need to optimize the model for inference. Common techniques include:

Pruning
Quantization
Distillation Techniques
Binarized Neural Networks (BNNs)
Apache TVM (incubating) is a compiler stack for deep learning systems
Distributed machine learning and load balancing strategy
Low rank matrix factorization (LRMF)
Compact convolutional filters (Video/CNN)
Knowledge distillation
Neural Networks Compression Framework (NNCF)
Parallel programming

How each technique works:

Pruning

Model pruning removes redundant parameters to which the model's performance is not sensitive. Aim: remove all connections whose absolute weight is below a threshold.
Start with a bigger network with many layers and then prune it; this works much better and faster.
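The threshold rule can be sketched in plain Python. This is a minimal, framework-free illustration of magnitude pruning on a flat list of weights; real toolkits apply the same idea to tensors layer by layer and usually fine-tune afterwards. The helper names and the 50% sparsity target are illustrative assumptions, not any library's API.

```python
def prune_by_threshold(weights, threshold):
    """Illustrative helper: zero every weight whose |w| is below `threshold`."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def threshold_for_sparsity(weights, sparsity):
    """Illustrative helper: magnitude threshold that prunes about `sparsity` of the weights."""
    magnitudes = sorted(abs(w) for w in weights)
    k = int(len(magnitudes) * sparsity)  # number of weights to remove
    if k == 0:
        return 0.0                       # nothing is strictly below 0
    if k >= len(magnitudes):
        return float("inf")              # prune everything
    return magnitudes[k]                 # weights strictly below this are pruned

def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.2, 0.07]
    t = threshold_for_sparsity(w, 0.5)
    pruned = prune_by_threshold(w, t)
    print(t, pruned, sparsity_of(pruned))
```

With ties in magnitude the achieved sparsity can be slightly below the target, which is why the helper says "about".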

Quantization

Compresses the model by reducing the number of bits used to represent the weights.
Quantization effectively constrains the number of distinct weight values we can use inside our kernels.
Per-channel quantization of weights improves performance through model compression and reduced latency.
The best option is Google's library, which supports the most comprehensive set of methods.
FP32 → INT8
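To make FP32 → INT8 concrete, here is a minimal sketch of symmetric, per-channel weight quantization. Each output channel gets its own scale, so a channel with small weights is not forced onto the coarse grid of a channel with large weights. This is one common scheme assumed for illustration; it is not the exact algorithm of any particular library, which may also use zero points, calibration data, or quantization-aware training.

```python
def quantize_per_channel(channels, num_bits=8):
    """Symmetric per-channel quantization (illustrative sketch).

    channels: list of channels, each a list of float weights.
    Returns (quantized integers per channel, scale per channel).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8, using the range [-127, 127]
    q_channels, scales = [], []
    for ch in channels:
        max_abs = max(abs(w) for w in ch)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # round() to the nearest integer, then clamp to the INT8 range.
        q = [max(-qmax, min(qmax, round(w / scale))) for w in ch]
        q_channels.append(q)
        scales.append(scale)
    return q_channels, scales

def dequantize_per_channel(q_channels, scales):
    """Map the integers back to approximate float weights."""
    return [[v * s for v in q] for q, s in zip(q_channels, scales)]

if __name__ == "__main__":
    weights = [[0.5, -1.27, 0.01], [0.002, -0.003, 0.001]]  # two channels
    q, s = quantize_per_channel(weights)
    print(q)
    print(dequantize_per_channel(q, s))
```

The second channel keeps fine resolution because its scale is computed from its own small weights; with a single per-tensor scale its values would all round to zero.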

Distillation Techniques

Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms

Binarized Neural Networks (BNNs)

BNNs are not supported by GPU hardware such as the Jetson Nano; they mostly run on the CPU.
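Why BNNs map well to CPU bit operations can be seen in a small sketch. Once weights and activations are binarized to +1/-1 and packed into bits, a dot product reduces to XNOR plus a population count. The packing convention (bit 1 for +1, bit 0 for -1) and the helper names are assumptions for illustration.

```python
def binarize(values):
    """Sign binarization: x >= 0 -> +1, x < 0 -> -1."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack +1/-1 values into an int: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # each match adds +1, each mismatch -1

if __name__ == "__main__":
    x = binarize([0.3, -1.2, 0.7, -0.1, 2.0])
    w = binarize([-0.5, -0.4, 0.9, 0.2, 1.1])
    reference = sum(xi * wi for xi, wi in zip(x, w))
    fast = xnor_popcount_dot(pack_bits(x), pack_bits(w), len(x))
    print(reference, fast)
```

One XNOR and one popcount replace n multiply-accumulates, which is the kind of integer bit work general-purpose CPUs handle well.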

Apache TVM (incubating) is a compiler stack for deep learning systems

Challenges with large-scale models:
Deep neural networks are computationally expensive and memory intensive.
This hinders their deployment on devices with limited memory and in applications with strict latency requirements.
Other issues: data security (models tend to memorize everything, including personally identifiable information, PII) and bias, e.g. profanity (models are trained on large-scale public data).
Self-discovering: instead of manually configuring conversational flows, automatically discover them from your data.
Self-training: let your system train itself with new examples.
Self-managing: let your system optimize itself, e.g. through knowledge distillation.

Distributed machine learning and load balancing strategy

Run models that use all available processing units (CPU, GPU, DSP, AI chip) together to improve inference performance.
Dynamic pruning of kernels, which aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture.
Partitioning techniques that use convolution-layer fusion to dynamically select the optimal partition according to the available computational resources and network conditions.
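The partitioning idea can be sketched as a search over split points. Layers before the split run on the device, one intermediate tensor is sent over the network, and the remaining layers run on another machine or accelerator. The cost model, the per-layer timings, the tensor sizes, and the bandwidth values below are hypothetical placeholders that a real system would measure at run time; layer fusion itself is not modeled.

```python
def best_split(device_ms, remote_ms, out_bytes, input_bytes, bandwidth_bytes_per_s):
    """Pick the split point that minimizes estimated end-to-end latency.

    Hypothetical cost model (not from a specific framework): split k runs
    layers [0, k) on the device, sends one tensor, and runs layers [k, n)
    remotely. Returns (k, latency_ms).
    """
    n = len(device_ms)
    best_k, best_ms = None, float("inf")
    for k in range(n + 1):
        local_ms = sum(device_ms[:k])
        remote_part_ms = sum(remote_ms[k:])
        if k == n:
            transfer_ms = 0.0  # fully local: nothing is sent
        else:
            sent = input_bytes if k == 0 else out_bytes[k - 1]
            transfer_ms = 1000.0 * sent / bandwidth_bytes_per_s
        total = local_ms + transfer_ms + remote_part_ms
        if total < best_ms:
            best_k, best_ms = k, total
    return best_k, best_ms

if __name__ == "__main__":
    # Made-up per-layer profile of a 4-layer model.
    device_ms = [5, 20, 40, 10]
    remote_ms = [1, 2, 4, 1]
    out_bytes = [400_000, 100_000, 20_000, 1_000]
    for bw in (1e9, 1e7, 1e5):  # bytes per second
        print(bw, best_split(device_ms, remote_ms, out_bytes, 600_000, bw))
```

With these made-up numbers, a fast link favors offloading everything, a medium link favors splitting after layer 2, and a slow link favors running fully on the device, which is why the partition has to be re-selected as network conditions change.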

Low rank matrix factorization (LRMF)

Latent structures exist in the data; by uncovering them we can obtain a compressed representation of the data.
LRMF factorizes the original matrix into lower-rank matrices while preserving latent structures and addressing the issue of sparseness.
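As a small illustration of the compression side of LRMF, the sketch below computes a rank-k factorization of a dense matrix by power iteration with deflation (a truncated-SVD approach). An m×n matrix is stored as m×k and k×n factors, i.e. k(m+n) numbers instead of mn. This is one way to obtain a low-rank factorization; it does not handle missing entries, which sparse-data LRMF methods address with other solvers. All helper names are illustrative.

```python
import math
import random

def _matvec(A, v):
    """A @ v for a list-of-rows matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def _mat_t_vec(A, u):
    """A^T @ u for a list-of-rows matrix A."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def _normalize(v):
    """Return (v / ||v||, ||v||); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return v, 0.0
    return [x / norm for x in v], norm

def low_rank_factorize(A, k, iters=200, seed=0):
    """Rank-k factorization A ~= U @ V via power iteration with deflation.

    U is m x r and V is r x n, with r <= k (r < k if A has lower rank).
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]       # residual, deflated after each component
    rng = random.Random(seed)       # random start avoids an orthogonal guess
    u_cols, v_rows = [], []
    for _ in range(k):
        v, _ = _normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(iters):
            u, _ = _normalize(_matvec(R, v))
            v, _ = _normalize(_mat_t_vec(R, u))
        u, sigma = _normalize(_matvec(R, v))  # largest singular value of R
        if sigma < 1e-12:
            break                   # residual is numerically zero
        u_cols.append([sigma * x for x in u])  # fold sigma into the left factor
        v_rows.append(v)
        for i in range(m):          # deflate: R -= sigma * u v^T
            for j in range(n):
                R[i][j] -= sigma * u[i] * v[j]
    U = [[col[i] for col in u_cols] for i in range(m)]
    return U, v_rows

def reconstruct(U, V):
    """Multiply the factors back: (m x r) @ (r x n)."""
    n = len(V[0])
    return [[sum(U[i][c] * V[c][j] for c in range(len(V))) for j in range(n)]
            for i in range(len(U))]

def param_counts(m, n, k):
    """Numbers stored for the full matrix vs. its rank-k factors."""
    return m * n, k * (m + n)
```

For example, a 1024×1024 layer approximated at rank 64 needs 131,072 numbers instead of 1,048,576; whether the accuracy loss is acceptable depends on how much of the matrix's energy the top singular values capture.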

Compact convolutional filters (Video/CNN)

Design specially structured convolutional filters to save parameters.
Replace over-parameterized filters with compact filters to achieve an overall speedup while maintaining comparable accuracy.
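A quick parameter count shows why compact filters pay off. The sketch compares a standard k×k convolution with a depthwise separable one (one k×k depthwise filter per input channel followed by a 1×1 pointwise convolution), a well-known compact design used in MobileNet. It is one example of the idea, not the only one. Biases are ignored and the layer sizes are arbitrary examples.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    for c_in, c_out in [(32, 64), (256, 256)]:
        std = standard_conv_params(3, c_in, c_out)
        sep = depthwise_separable_params(3, c_in, c_out)
        print(f"{c_in}->{c_out}: standard={std}, separable={sep}, ratio={std / sep:.1f}x")
```

For 3×3 kernels the saving approaches roughly 8 to 9 times as the channel count grows, because the pointwise term dominates and the k×k = 9 factor is paid only once per input channel.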

Knowledge distillation

Train a compact neural network with the distilled knowledge of a large model.
Distillation (knowledge transfer) from an ensemble of big networks into a much smaller network, which learns directly from the cumbersome model's outputs and is lighter to deploy.
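A common way for the small network to learn from the large model's outputs is to match its temperature-softened probabilities. Below is a minimal, framework-free sketch of a distillation loss in the style of Hinton et al.: a weighted sum of the KL divergence between the softened teacher and student distributions (scaled by T²) and the usual cross-entropy on the hard label. The temperature T=4 and weight alpha=0.7 are arbitrary example values.

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions (p is the target)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    soft_teacher = softmax(teacher_logits, T)
    soft_student = softmax(student_logits, T)
    # T^2 keeps the soft-target gradients on the same scale as the hard term.
    soft_term = kl_divergence(soft_teacher, soft_student) * T * T
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term

if __name__ == "__main__":
    teacher = [8.0, 2.0, 1.0]
    print(distillation_loss([7.5, 2.2, 0.9], teacher, label=0))  # close to teacher
    print(distillation_loss([1.0, 6.0, 2.0], teacher, label=0))  # far from teacher
```

A higher temperature exposes more of the teacher's "dark knowledge" (how it ranks the wrong classes), which is the extra signal the compact student learns from.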

Neural Networks Compression Framework (NNCF)

If the objects are large and we do not need small anchors:

In MobileNet, we can remove the part of the network related to small objects.
In YOLO, reduce the number of anchors.
Decrease the input image size (note that this reduces accuracy).
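The effect of fewer anchors and a smaller input can be estimated by counting raw box predictions. The sketch assumes a YOLOv3-style detector with three output scales at strides 8, 16 and 32 and a fixed number of anchors per grid cell. Those strides and the helper name are assumptions. The count only reflects the post-processing load; the backbone cost also shrinks roughly in proportion to the number of input pixels.

```python
def yolo_num_predictions(input_size, anchors_per_cell=3, strides=(8, 16, 32)):
    """Raw boxes predicted for a square input (assumed YOLOv3-style heads)."""
    total = 0
    for s in strides:
        grid = input_size // s  # grid cells per side at this output scale
        total += grid * grid * anchors_per_cell
    return total

if __name__ == "__main__":
    for size, anchors in [(416, 3), (416, 2), (320, 3)]:
        n = yolo_num_predictions(size, anchors)
        print(f"input {size}x{size}, {anchors} anchors/cell -> {n} boxes, {size * size} pixels")
```

Going from 3 to 2 anchors per cell cuts the boxes to decode and pass through non-maximum suppression by a third, while shrinking the input from 416 to 320 reduces both the box count and the pixel count by about 40%.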

Parallel programming, clean code, and design patterns

Libraries: FFmpeg, GStreamer, Celery
GPU libraries for Python: PyCUDA, NumbaPro, PyOpenCL, CuPy
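A common parallel pattern on edge devices is to decouple frame capture from inference with a bounded queue (producer/consumer). The sketch uses only Python's standard `threading` and `queue` modules. `infer` stands in for a real model call, and the queue size of 4 is an arbitrary assumption.

```python
import queue
import threading

_STOP = object()  # unique sentinel, so any frame value (even None) is allowed

def run_pipeline(frames, infer, maxsize=4):
    """Capture frames in one thread and run `infer` on them in another.

    The bounded queue applies backpressure: if inference falls behind,
    the capture thread blocks instead of buffering frames without limit.
    Results come back in frame order because there is a single consumer.
    """
    q = queue.Queue(maxsize=maxsize)
    results = []

    def capture():
        for frame in frames:
            q.put(frame)   # blocks while the queue is full
        q.put(_STOP)       # signal that there is no more input

    def inference():
        while True:
            frame = q.get()
            if frame is _STOP:
                break
            results.append(infer(frame))

    producer = threading.Thread(target=capture)
    consumer = threading.Thread(target=inference)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results

if __name__ == "__main__":
    # Stand-in "model" that squares a number; replace with a real inference call.
    print(run_pipeline(range(10), lambda frame: frame * frame))
```

Error handling is omitted for brevity: if `infer` raised an exception, the consumer would stop and the producer could block on a full queue.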

Cameras on the edge:

What you need to consider about the camera:

Camera calibration is important
Quantum efficiency [%] (spectral response)
Sensor size [inches or mm] and pixel size [µm]
Dynamic range [dB]
Image noise and signal-to-noise ratio (SNR), PSNR, SSIM: a greater SNR yields better contrast and clarity, as well as improved low-light performance
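PSNR and dynamic range are simple logarithmic ratios, so they are easy to compute when evaluating a camera or an image-enhancement step. The sketch works on flat lists of 8-bit pixel values (peak 255) and uses the common definitions PSNR = 10·log10(MAX²/MSE) and dynamic range = 20·log10(max signal / noise floor). The sample values are made up. SSIM needs local windowed statistics and is better taken from a library such as scikit-image.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / err)

def dynamic_range_db(max_signal, noise_floor):
    """Sensor dynamic range in dB, e.g. full-well capacity over read noise."""
    return 20.0 * math.log10(max_signal / noise_floor)

if __name__ == "__main__":
    reference = [0, 0, 0, 0]
    degraded = [0, 0, 0, 255]
    print(psnr(reference, degraded))      # one fully wrong pixel out of four
    print(dynamic_range_db(10000, 5))     # made-up electrons / read noise
```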

Dataset:

Large gap between training and validation loss => unrepresentative training dataset (e.g. too few examples): it does not provide sufficient information to learn the problem.

Validation loss shows noisy movements => unrepresentative validation dataset (e.g. too few examples): it does not provide sufficient information to evaluate the model.

Validation loss lower than the training loss => the validation dataset may be easier for the model to predict than the training dataset.

ROS

Source: https://www.youtube.com/watch?v=0BxVPCInS3M

The recording of this course is part of the Programming for Robotics (ROS) lecture at ETH Zurich.

Lecturers: Péter Fankhauser, Dominic Jud, Martin Wermelinger

This course gives an introduction to the Robot Operating System (ROS) including many of the available tools that are commonly used in robotics. With the help of different examples, the course should provide a good starting point for students to work with robots. They learn how to create software including simulation, to interface sensors and actuators, and to integrate control algorithms.

Dataset, under-fitting, and over-fitting:

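Under-fitting:

Training loss remains flat, or shows noisy values of relatively high loss.
Training loss continues to decrease until the end of training (the model could still learn more).

Over-fitting:

Training loss continues to decrease, but validation loss starts to increase after some point.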
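The over-fitting pattern is what early stopping watches for. The sketch scans per-epoch loss histories and returns the epoch with the lowest validation loss once validation loss has failed to improve for `patience` epochs while training loss kept falling. The patience value and the example curves are illustrative assumptions, not measured results.

```python
def find_overfit_epoch(train_loss, val_loss, patience=3):
    """Return the best (lowest validation loss) epoch if over-fitting is detected.

    Over-fitting is flagged when validation loss has not improved for
    `patience` epochs while training loss kept decreasing. Returns None otherwise.
    """
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            if train_loss[epoch] < train_loss[best_epoch]:
                return best_epoch  # restore the checkpoint from this epoch
    return None

if __name__ == "__main__":
    # Made-up curves: validation loss bottoms out at epoch 3, then rises.
    train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2]
    val = [1.1, 0.9, 0.75, 0.7, 0.72, 0.76, 0.8, 0.85]
    print(find_overfit_epoch(train, val))
```

Flat, high losses on both curves are not flagged here; that pattern points to under-fitting, where more capacity or longer training is the usual remedy.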

How to handle under-fitting and over-fitting:

AI Edge: how to run deep learning inference on edge/IoT devices; enabling efficient, high-performance inference; accelerators and optimization for deep learning; smart AI IoT, robotics, 3D SLAM, AR, VR

Embedded IoT: devices, Solutions Architect, security, safety

Learn More: https://www.tiziran.com/

#computervision #AI #objectdetection #objecttracking #ml #research #CNN #gans #convolutionalneuralnetworks #ai #vr #reinforcementlearning #mlops #aiforbusiness #science #researcher #phd #cameracalibration #opticalflow #videostablization #humanoidrobot #localization #3dSLAM #reconstruction #pointcloud #mixedreality #edgecomputing #raspberrypi #intelstick #googlecoral #jetsonnano #nvidiavgpu #tensorflowjs #pytorch #opencv #aikit #caffee #DIGITS #python #ubuntu #farshidpirahansiah #tiziran #farshid #pirahansiah #robotics #MultiCameraMultiClassMultiObjectTracking #deeplearning #machinelearning #artificialintelligence #tensorflow #robotics #3dvision #sterovision #depthmap #RCNN #machinevision #imageprocessing #patternrecognition #compiler #RISC #RNN #fullStackDeepLearning #productinnovation #patents #TensorRT #ApacheTVM #TFLite #PyTorchmobile #dockers #gRPC #RESTAPIs #GRPC #GraphQL
