AI at the Edge
For a recent Meetup in Milan, I was asked to imagine what deploying AI at the Edge of the Internet would mean. I only had a few days to build an effective story, specifically on how the advertising market can benefit from extreme personalization of content.
As I always tend to approach a topic from its architecture, I started studying the subject by looking at Deep Learning and LLM architecture diagrams.
I won’t go into details about what DL and LLMs are, but while researching the topic I noticed that most AI technologies have one thing in common: they rely on massively parallel computation to produce their results. GPUs became the preferred vehicle for AI models because of their ability to run nearly identical operations simultaneously on many data samples. With the growth in the size of training data sets, the massive parallelism available in GPUs proved indispensable. You can find many good resources on this online.
For example, for GPT-3, studies indicate that training took 34 days on the equivalent of 1,024 GPUs, at an approximate cost of $4.6M in compute alone.
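To make the parallelism argument concrete, here is a minimal sketch (my own illustration, not taken from those studies) that times the same large matrix multiplication, the core operation behind DL and LLM workloads, on a CPU and, if one is available, on a GPU with PyTorch:

```python
import time
import torch

# A single large matrix multiplication, the workhorse of neural networks
x = torch.randn(4096, 4096)
w = torch.randn(4096, 4096)

# Run it on the CPU
t0 = time.time()
y_cpu = x @ w
print(f"CPU matmul: {time.time() - t0:.3f}s")

# Run the same operation on a GPU, if present
if torch.cuda.is_available():
    xg, wg = x.cuda(), w.cuda()
    torch.cuda.synchronize()  # wait for transfers before timing
    t0 = time.time()
    y_gpu = xg @ wg
    torch.cuda.synchronize()  # GPU calls are async; wait for completion
    print(f"GPU matmul: {time.time() - t0:.3f}s (first call includes warm-up)")
```

The GPU wins because the thousands of independent multiply-accumulate operations in the matmul map directly onto its thousands of cores.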
Given the importance of the GPU, Akamai recently announced its plan to roll out cloud infrastructure and services powered by NVIDIA, based on the NVIDIA RTX 4000 Ada Generation GPU. Beyond video processing, this infrastructure can be used for generative AI and machine learning.
I then started reading up on the concept of Edge AI, and I found this interesting article from Nvidia. They state that “Edge AI is the deployment of AI applications in devices throughout the physical world. It’s called Edge AI because the AI computation is done near the user at the edge of the network, close to where the data is located, rather than centrally in a cloud computing facility or private data center.”
Their article is interesting because it also contains some examples of Edge AI use cases. Of course, the way NVIDIA defines the Edge is tied to the deployment of their GPUs/systems in specific locations or devices, generally closer to where the users are.
In the article we also start to see the difference between training and inference: training is the process of learning and optimizing a model from data; when you present a trained model with a problem and it gives you an answer, that’s called inference.
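To make the distinction concrete, here is a minimal sketch (my own illustration, using scikit-learn and synthetic data) showing training followed by inference:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training: learn model parameters from labelled data (expensive, done once)
X = np.random.rand(1000, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic labels
model = LogisticRegression().fit(X, y)

# Inference: present the trained model with a new sample and get an answer
# (cheap, done once per request)
sample = np.array([[0.9, 0.4, 0.1, 0.7]])
print(model.predict(sample))        # predicted class
print(model.predict_proba(sample))  # class probabilities
```

Training and inference have very different cost profiles, which is exactly why the question of where to run each one matters.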
Good, we introduced some concepts, and now we should ask ourselves: Where should I place AI computing? Should I place it in centralized systems or at the edge on specific devices? It depends.
AI in the centralized cloud (or data center) is a good fit for very heavy tasks like training AI models on GPUs. However, it can be expensive, and it is not ideal for inference tasks, because we would face added latency (the round trip from client to server) and minimal offload benefit.
Pure inference, on the other hand, can also run at the Edge directly on end-user devices, but this brings its own challenges: inconsistent and underpowered hardware, over-downloading of model bytes to every device, and, in most cases, the need to control the specific hardware.
But what if we introduce the Akamai Edge, placed in many locations around the world, closer to the concentrations of users?
We can achieve the best of both approaches: inference that runs close to users (low latency), on consistent hardware that we control, without pushing model bytes down to every device.
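To put rough numbers on the intuition (the figures below are illustrative assumptions, not measurements), compare the latency budget of a centralized deployment with an edge PoP close to the user:

```python
# Back-of-envelope latency budget (illustrative numbers, not measurements).
# Assumes ~50 ms of model compute; RTTs are rough public-internet figures.
rtt_central_ms = 120   # client -> centralized cloud region -> client
rtt_edge_ms = 15       # client -> nearby edge PoP -> client
inference_ms = 50      # model forward pass, same hardware either way

print(f"Central cloud: {rtt_central_ms + inference_ms} ms per request")
print(f"Edge PoP:      {rtt_edge_ms + inference_ms} ms per request")
```

When the network round trip dominates the compute time, moving inference close to the user cuts end-to-end latency by more than half.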
Recently, Akamai announced plans to embed cloud computing capabilities into its massive edge network. Akamai’s Generalized Edge Compute initiative (Gecko) aims to embed compute, with support for virtual machines, into 100 cities by the end of the year.
"Akamai is delivering on the promise it made when it acquired?Linode?by quickly integrating compute into its security and delivery mix," said?Dave McCarthy,?IDC, Research?Vice President, Cloud and Edge Services. "What they're now doing with Gecko is an example of the more distributed cloud world we're heading toward, driven by demands to put compute and data closer to the edge."
How can AI inference workloads benefit from this distributed architecture?
Since the massive Akamai network is mostly CPU-based, it’s important to accelerate AI workloads using automated model sparsification technologies, delivered as a CPU inference engine. Akamai and Neural Magic announced a strategic partnership intended to supercharge deep learning capabilities on Akamai’s distributed computing infrastructure.
Neural Magic’s solution (the company has since been acquired by Red Hat) enables deep learning models to run on cost-efficient CPU-based servers rather than on expensive GPU resources. This allows the two companies to deploy these capabilities across Akamai’s globally distributed computing infrastructure, offering organizations lower latency and improved performance for data-intensive AI applications.
My colleague Alesandro Slepčević tried the Neural Magic engine DeepSparse, a sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere. You can find his tests here.
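To give a flavor of what this looks like in practice, here is a minimal sketch based on DeepSparse’s documented Pipeline API (the SparseZoo model stub below is illustrative; check Neural Magic’s docs for current stubs):

```python
# pip install deepsparse
from deepsparse import Pipeline

# Create a sentiment-analysis pipeline backed by a sparsified, quantized
# model from the SparseZoo (stub shown is illustrative, not guaranteed current)
pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# Inference runs on commodity CPU cores; no GPU required
print(pipeline("Edge inference on CPUs looks promising"))
```

The key idea is that pruning and quantization shrink the model enough that CPU caches and vector instructions can keep up, which is what makes CPU-only edge nodes viable for inference.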
Now, let’s try to imagine some use cases. Is it possible today to deploy AI at the Edge?
My colleague Joseph Glover recently developed a simple POC, showcasing that CPUs, with their parallel processing capabilities, offer a viable and accessible option for AI inference tasks.
You can try it yourself at this address: https://moviemind.info/
Moviemind is a personal movie recommendation application powered by a blend of Open-Source Innovation and Self-Sufficiency. You can read more here.
I also deployed the code (here is the GitHub link) to a machine on the Akamai Edge in Milan, and put the Akamai Edge solution ION on top of it to accelerate it. You can see the results: since the cloud and the edge live on the same network, you get similar latency.
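For intuition on why this kind of recommender runs comfortably on CPUs, here is a hypothetical sketch (not the actual Moviemind implementation) of content-based recommendation via cosine similarity over precomputed embeddings:

```python
import numpy as np

# Hypothetical precomputed movie embeddings (title -> vector); in a real app
# these would come from an embedding model and be stored at the edge node.
movies = {
    "Heat": np.array([0.9, 0.1, 0.3]),
    "Toy Story": np.array([0.1, 0.9, 0.2]),
    "Ronin": np.array([0.8, 0.2, 0.4]),
}

def recommend(liked_title, k=2):
    query = movies[liked_title]
    scores = {}
    for title, vec in movies.items():
        if title == liked_title:
            continue
        # Cosine similarity: a handful of vector ops per candidate, cheap on CPU
        scores[title] = float(
            query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(recommend("Heat"))  # e.g. [('Ronin', ...), ('Toy Story', ...)]
```

Serving lookups like this per request is far lighter than training, which is exactly the workload split discussed above.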
OK, the first steps are done! Then, since it was a #videotech Meetup, I had to imagine some advertising (ADV) use cases.
I thought I would list some of my characteristics, such as age, gender, and the city where I live, and give a generative image AI tool a text prompt to create an image ADV:
“Advertisement with 39-year-old boy drinking a Cola on a yellow Fiat Panda in the city of Turin with hot weather”
I first tried Midjourney and then also Artflow (with my character). See the results!
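Programmatically, assembling such a prompt from a user profile is straightforward. Here is a hypothetical sketch (Midjourney has no public API, so OpenAI’s image API is used purely as a stand-in; the profile fields are my assumptions):

```python
from openai import OpenAI  # stand-in image API; Midjourney has no public API

# Hypothetical per-user attributes, gathered with consent
profile = {
    "age": 39, "gender": "man", "city": "Turin",
    "car": "yellow Fiat Panda", "weather": "hot",
}

# Build the personalized ADV prompt from the profile
prompt = (
    f"Advertisement with a {profile['age']}-year-old {profile['gender']} "
    f"drinking a Cola on a {profile['car']} in the city of {profile['city']} "
    f"with {profile['weather']} weather"
)

client = OpenAI()  # requires OPENAI_API_KEY in the environment
image = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
print(image.data[0].url)
```

The personalization logic itself is trivial; the interesting part is where the generation runs and how fast it can be made.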
Imagine driving personalized ADV at the Edge, with low-latency computing close to the users. Today, AI image generation is not yet fast enough, but think about the opportunities of running this workflow in real time.
Of course, there are also other use cases: think about e-commerce applications where customers benefit from AI tools that improve their app experience. Or, as discussed with Alan Evans, think about an LLM-powered chat architecture distributed at the edge, enabling real-time conversations with your users and customers (a minimal sketch follows below). For example, see here what our partner Macrometa is doing.
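As a thought experiment, an edge node could terminate the chat connection and talk to a co-located model, keeping the whole round trip inside the PoP. A minimal sketch (hypothetical; this is not Macrometa’s architecture, and it assumes an OpenAI-compatible LLM runtime listening locally on the edge node):

```python
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumed co-located, OpenAI-compatible LLM runtime on the same edge node
LLM_URL = "http://localhost:8080/v1/chat/completions"

@app.post("/chat")
def chat():
    user_message = request.json["message"]
    # Forward to the local model: the inference round trip never leaves the PoP
    resp = requests.post(LLM_URL, json={
        "model": "local-model",
        "messages": [{"role": "user", "content": user_message}],
    })
    return jsonify(resp.json())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The client only pays the short hop to the nearest PoP, which is what makes real-time conversational experiences plausible at scale.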
Because we were at the #VideoTech Meetup, I also tried some tools to create AI videos, like Creatify and Synthesia. I used the previously generated images (plus others) with an avatar reading a text generated with AI.
Again, at the moment these tools take some time to render the videos, but think about the possible applications of having this technology in real time, even interacting with the users.
Finally, you can also watch this great interview with Jay Jenkins, in which he talks about Gecko and its possible use cases, including AI. Many great insights!
Conclusions
Exciting times at Akamai Technologies! With the rollout of the new NVIDIA GPUs and with Edge Computing (Gecko) coming to 100 cities by the end of the year, Akamai customers can choose the best architecture for their AI workloads. Together with its partners, Akamai can support AI inferencing at the Edge, powered by its massive network of 4,100 points of presence around the globe.