Edge AI: Deploying AI/ML on Edge Devices

Introduction

AI/ML use-cases are pervasive. Enterprise use-cases can be broadly categorized based on the three core technical capabilities enabling them: Predictive Analytics, Computer Vision (CV) and Natural Language Processing (NLP). The Enterprise AI story has so far been focused on the Cloud, and the general perception is that running AI applications takes large amounts of data and powerful machines, e.g., Graphics Processing Units (GPUs).

Edge AI, also known as TinyML, aims to bring all the goodness of AI to the device. The idea is to bring the processing as close as possible to the devices generating the data.

In its simplest form, the device is able to process the data locally and instantaneously, without any dependency on the Cloud. Edge AI enables Visual, Location and Analytical solutions at the edge for diverse industries, such as Healthcare, Automotive, Manufacturing, Retail and Energy. According to a report by MarketsandMarkets, “the global Edge AI software market size is expected to grow to USD 1,835 million by 2026”.

The key Edge AI benefits are as follows:

  • Low Latency / Offline execution: Running AI models on the Cloud means a round-trip latency of at least a few milliseconds, which can potentially go up to a few seconds depending on network connectivity. This is not sufficient for real-time decision making, e.g., automated cars moving at high speed, robots monitoring elderly people, or robots working alongside people on factory assembly lines. While network connectivity is often taken for granted in an enterprise setting, the same cannot be said for factories in remote areas or drones flying at high altitudes over unmapped territories. Deploying the underlying models on the edge ensures that they can run offline in (near) real-time.
  • Privacy: Processing data locally, on the device itself, implies that we do not need to send it back to the Cloud for processing. This becomes increasingly relevant as smart devices (e.g., cameras, speakers) start getting deployed in shops, hospitals, offices, factories, etc., coupled with growing user distrust pertaining to how enterprises are storing and processing their personal data, including images, audio, video, location and shopping history. In addition, storing data in any form always raises the risk of potential hacks and cyberthefts.
  • Reduced Costs: Real-time processing at the edge not only enables low latency and privacy protection, but also acts as a ‘filter’ ensuring that only relevant data gets transmitted to the Cloud for further processing, saving bandwidth. Less data transferred to the Cloud also implies lower storage and processing costs on the Cloud. Processing data on the Cloud can be quite expensive, especially when it is in the order of gigabytes (link) or petabytes (link) per day.

Internals: Training vs. Deployment

Most of today’s ML models are supervised, applied to prediction or classification tasks. Given a training (labeled) dataset, the Data Scientist has to go through a laborious process called feature extraction, and the model’s accuracy depends entirely upon the Data Scientist’s ability to pick the right feature set. For simplicity, each feature can be considered a column of a dataset provided as a CSV file. The advantage of Deep Learning (DL) is that the program selects the feature set by itself without supervision, i.e., feature extraction is automated. This is achieved by training large-scale neural networks, referred to as Deep Neural Nets (DNNs), over large labeled datasets. Training a DNN occurs over multiple iterations (epochs). Each forward run is coupled with a feedback loop, where the classification errors identified at the end of a run with respect to the ground truth (training dataset) are fed back to the previous (hidden) layers to adapt their parameter weights: ‘backpropagation’.
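
To make this concrete, the snippet below sketches one such training loop in PyTorch (one of the frameworks discussed later). The network architecture, data and hyperparameters are purely illustrative placeholders:

```python
import torch
import torch.nn as nn

# A tiny feed-forward network; layer sizes are illustrative only.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy labeled data standing in for the training dataset.
X = torch.randn(64, 10)            # 64 samples, 10 features each
y = torch.randint(0, 2, (64,))     # ground-truth labels

for epoch in range(5):             # each pass over the data is one epoch
    logits = model(X)              # forward run
    loss = loss_fn(logits, y)      # classification error vs. ground truth
    optimizer.zero_grad()
    loss.backward()                # backpropagation to the hidden layers
    optimizer.step()               # adapt the parameter weights
```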

It is important to understand that the training and deployment processes in a DL lifecycle are completely decoupled. During training, a large amount of data is used to calculate the model parameters (coefficients, weights and biases), leading to the need for more resources. Once trained, the computed parameters can be persisted to storage (file) or program memory (RAM), and deployed as an API. The deployed models are monitored for drift, and retrained as necessary.

Trained ML/DL models can thus be deployed as APIs that completely decouple consumption from the training process. This also allows a trained ML/DL model to be embedded in devices with limited memory and computational resources, enabling their execution in an offline fashion.
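
As a minimal sketch of this decoupling, again in PyTorch: the training side persists the computed parameters to a file, and the deployment side reloads them and serves predictions locally. The file name and architecture are illustrative:

```python
import torch
import torch.nn as nn

# Same illustrative architecture as in the training sketch above.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Training side: persist the computed parameters (weights and biases).
torch.save(model.state_dict(), "model.pt")

# Deployment side (e.g., on the edge device): rebuild the architecture,
# reload the weights, and run inference locally; no Cloud round-trip needed.
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 10))  # one (dummy) input sample
```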

Edge Devices

While ML models have traditionally been embedded in cameras, mobiles, drones, self-driven cars, etc., the growing adoption of Edge AI has led to the development of specialized devices capable of performing AI inferencing efficiently, e.g., Nvidia Jetson Nano, Google Coral and AWS DeepLens. Benchmarking results show a 30x performance gain when running a Computer Vision model (MobileNet) on a specialized Nvidia Jetson vs. a generic Raspberry Pi. Nvidia’s Jetson AGX Xavier and Jetson AGX Orin Developer Kits provide the tools and libraries needed to develop Edge AI applications.

Major cloud vendors have collaborated with semiconductor companies to deliver AI chips.

Microsoft collaborated with Qualcomm to develop the Vision AI DevKit. It uses Azure ML (with support for frameworks like TensorFlow and Caffe) to develop the models, and Azure IoT Edge to deploy them to the kit as containerized Azure services. The Qualcomm Neural Processing SDK helps in optimizing the models, further reducing latency and improving application performance efficiency.

In collaboration with Intel, Amazon developed DeepLens, a wireless camera with AI inferencing capabilities. It is integrated with Amazon SageMaker, Amazon’s primary Data Science platform. This allows ML models to be trained on SageMaker, with support for different ML/DL frameworks, e.g., Scikit-learn, TensorFlow, PyTorch and Amazon’s own MXNet, and then deployed on DeepLens in an integrated fashion.

Google has also entered the field with its Coral range of products. The underlying hardware is Google’s Edge Tensor Processing Unit (TPU). Coral provides the full toolkit to train TensorFlow models and deploy them on different platforms using the Coral USB Accelerator. Coral’s key differentiator, which can also be considered its main drawback, is its tight integration with Google’s cognitive ecosystem: its Edge TPU-powered hardware only works with Google’s ML/DL framework, TensorFlow.
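
For a flavour of what deployment on such hardware looks like, below is a minimal inference sketch using the tflite_runtime package with the Edge TPU delegate (the standard route for Coral devices). The model file name is a placeholder for a TensorFlow Lite model compiled for the Edge TPU, and the input is a dummy tensor:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load a TFLite model and offload supported ops to the Edge TPU.
interpreter = tflite.Interpreter(
    model_path="mobilenet_edgetpu.tflite",  # placeholder file name
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy uint8 image tensor shaped to the model's expected input.
frame = np.zeros(input_details[0]["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()                         # runs on-device, offline
scores = interpreter.get_tensor(output_details[0]["index"])
```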

Model Optimization and Compression for the Edge

Depending on the task, both classical ML algorithms, e.g., K-Nearest Neighbours (K-NN), Support Vector Machines (SVMs) and tree-based algorithms, and Deep Neural Networks, e.g., Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), can be considered for deployment on edge devices.

The choice of ML algorithms often depends on the algorithm internals, e.g., how much model compression can be achieved, as well as the characteristics of the underlying hardware device and its supported platform(s).

In particular, the following factors need to be considered while designing an Edge AI solution:

  • Model design: The goal is to reduce the model’s inference time on the device. Deep Neural Networks (DNNs) often require storing and accessing a large number of parameters that describe the model architecture. We thus need to design DNN architectures with a reduced number of parameters. SqueezeNet is a good example of an efficient DNN architecture optimized for Computer Vision use-cases. Neural Architecture Search (NAS) can also be used to discover edge-efficient architectures.
  • Model compression: Edge devices have limitations not only in terms of computational resources, but also memory. There are mainly two ways to perform NN compression: lowering precision (quantization) and using fewer weights (pruning). By default, model parameters are float32 type variables, which leads to large model sizes and slower execution times. Post-training quantization tools, e.g., TensorFlow Lite, can be used to reduce the model parameters from float32 to uint8, at the expense of (slightly) lower precision; see the sketch after this list. Pruning works by eliminating the network connections that are not useful to the NN, leading to a reduction in both memory and computational overhead.
  • Device considerations: ML/DL algorithms are characterized by extensive linear algebra, matrix and vector data operations. Traditional processor architectures are not optimized for such workloads, and hence, specialized processing architectures are necessary to meet the low latency requirements of running complex ML algorithm operations. As such, factors to be considered while choosing the edge device include balancing the model architecture (accuracy, size, operation type) requirements with device programmability, throughput, power consumption and cost.
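
As referenced above, here is a minimal post-training quantization sketch using the TensorFlow Lite converter. The saved-model directory, input shape and calibration data are illustrative placeholders:

```python
import numpy as np
import tensorflow as tf

# "saved_model_dir" is a placeholder for a trained TensorFlow model.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# A small representative dataset lets the converter calibrate the
# float32 -> int8/uint8 value ranges for full integer quantization.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()          # quantized, much smaller model
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```

Since each parameter shrinks from 4 bytes to 1, the resulting .tflite file is typically around 4x smaller than the float32 original, and it is the format consumed by edge runtimes such as the one sketched in the Coral example above.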

Edge AI Use-case: Body Pose Estimation

In a previous paper (Vuletić et al., 2021), we provided details of developing an Edge AI App to monitor patients in their beds and raise an alarm if a patient falls off the bed, a big problem for residents of hospitals and old-age homes. The development included the selection of state-of-the-art object detection algorithms for face-landmark and body-pose detection, namely RetinaFace (Deng et al., 2019) and OpenPifPaf (Kreiss et al., 2021), and porting them to the Nvidia Jetson NX platform.

To normalize the models, we applied the methodology presented in the previous section, proceeding through several development phases: initial evaluation; checking pre- and post-processing compatibility with our libraries; identifying missing layers; implementing the missing layers; cross-validating inference, pre-processing and post-processing; and finally, benchmarking.

The figure below illustrates the algorithms’ execution on sample videos from a possible field of application (e.g., detecting an elderly person’s state and position). For the purpose of the experiments, we packaged the models into deployable applications, both with custom-developed application layers.


Fig: Body-pose and human state detection (Source: Vuletić et al., 2021)

References

  1. Dianlei Xu, et al. Edge Intelligence: Architectures, Challenges, and Applications. (link)
  2. M. Terzi, et al. Edge Intelligence: Challenges and Opportunities of Near-Sensor Machine Learning Applications. (link)
  3. M. Merenda, et al. Edge Machine Learning for AI-Enabled IoT Devices: A Review. (link)
  4. D. Moelker. When to Bring AI to the Edge. (link)
  5. What Is Edge AI And Why Should Enterprises Care? (link)
  6. B. Wilson. 5 Ways Edge AI Will Change Enterprises in 2021. (link)
  7. What is Edge AI, and Why Enterprises Should Care About It? (link)
  8. Y. Khan. How AI at the Edge Can Generate Enterprise-Wide Savings. (link)
  9. Vuletić, M., Mujagić, V., Milojevic, N., Biswas, D. Edge AI Framework for Healthcare Applications. In 4th IJCAI Workshop on AI for Ageing, Rehabilitation and Intelligent Assisted Living (ARIAL), 2021. https://www.researchgate.net/publication/351915142_Edge_AI_Framework_for_Healthcare_Applications
  10. Deng, J., Guo, J., Zhou, Y., Yu, J., Kotsia, I., and Zafeiriou, S. RetinaFace: Single-stage Dense Face Localisation in the Wild. arXiv, abs/1905.00641, 2019.
  11. Kreiss, S., Bertoni, L., and Alahi, A. OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association. arXiv, abs/2103.02440, 2021.
