Edge AI: The Network may be less important than you think
(This is an amended and edited version of a post first published on my client Deeplite's blog here. It also incorporates learnings from a webinar I moderated for them on July 13th, with speakers from ARM, H1 and PJC Ventures - a recording is here)
Introduction
A recurring theme for my work on Edge Computing is "orders of magnitude". Depending on the company and individual involved, edge discussions span 10 or more orders of magnitude of power, distance and latency: milliwatts to megawatts, millimetres to 100s of km, femtoseconds to days. I've written various times on this, such as this post on latency.
Recently, I've been looking at the "small" end of the edge space - what is happening in terms of compute on end-devices or nearby gateways. In particular, I've looked into how AI-based applications can improve their inferencing efficiency, to the extent they may not need low-latency "realtime" network access, or cloud support, for tasks such as image/video recognition, or audio analytics and speech processing.
(Note: many people in the mobile/telecom and datacentre industries don't realise the difference between training and inference for AI, especially for deep neural-network models. Training a model needs lots of data and processing power, but is not time-critical and is only done once or a few times. Inference is the actual *use* of that model to do stuff - recognise images, interpret speech and so on.)
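For readers outside the AI world, here is a minimal PyTorch-style sketch of that split. The tiny model and synthetic data are placeholders purely to show where the heavy lifting sits - a hedged illustration, not a real workload.

```python
# Minimal sketch of the training-vs-inference split (illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

# --- Training: data- and compute-heavy, done once or occasionally, usually in the cloud ---
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                      # many passes over (here, synthetic) data
    x, y = torch.randn(256, 64), torch.randint(0, 2, (256,))
    loss = loss_fn(model(x), y)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# --- Inference: the trained model is used repeatedly, and can run on-device ---
model.eval()
with torch.no_grad():                     # no gradients needed; far cheaper per call
    prediction = model(torch.randn(1, 64)).argmax(dim=1)
```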
That has implications not just for the aspirations of cloud/edge providers, but also for 5G (and fixed) network traffic overall. A sizeable proportion of expected mobile data (especially uplink) is imagined to be from cameras, sensors and other sources uploading or streaming bulk content to cloud-based AI and "big data" platforms. But if most of it gets handled locally and doesn't transit the network at all, that's a meaningful shift. (This isn't new - almost 5 years ago I wrote this article on the same broad topic).
For example, it's common for 5G discussions and events to cite self-driving cars "generating 4TB of data per hour" or similar stats, as justification for roadway coverage/capacity and edge compute. Yet if 99.99% of the data stays on the vehicle, and most quick decisions are taken by "self-sufficient" models locally, then that has huge ramifications on everything from connected vehicle revenue projections, to 5G radio spectrum needs.
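A quick back-of-envelope calculation shows the scale of that shift, using the quoted 4TB/hour figure and the hypothetical 99.99% local share from the example above.

```python
# Back-of-envelope: what actually crosses the network if 99.99% of the
# vehicle's data stays local (figures from the example above, not measurements).
data_per_hour_tb = 4.0                 # quoted "4TB of data per hour"
fraction_uploaded = 0.0001             # i.e. 99.99% handled on the vehicle

uploaded_bytes = data_per_hour_tb * 1e12 * fraction_uploaded
avg_uplink_mbps = uploaded_bytes * 8 / 3600 / 1e6

print(f"Uploaded per hour: {uploaded_bytes / 1e6:.0f} MB")   # ~400 MB/hour
print(f"Average uplink:    {avg_uplink_mbps:.2f} Mbit/s")    # under 1 Mbit/s on average
```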
This also has implications for monetising low-latency capabilities, if some of the most demanding future use-cases don't need the network. Perhaps more importantly in the longer term, optimised on-device AI may require far less power and therefore be greener - a central theme of the webinar I mentioned at the start.
Speaking the same language
The semantics of Edge are partly to blame here. Words can have multiple meanings. As a result, people involved with adjacent areas of the technology industry often misunderstand each other, even when using the same terms. Each group has its own frame of reference, history and technical domain expertise.
In particular, areas of technology cross-overs and convergence are often fraught with category errors, flawed assumptions – or just poor communications. There is a significant risk that this is occurring in the area of Edge AI. At least four different groups interpret that term in very distinct ways.
For example, many professionals in the cloud and network world have no idea about what can be achieved with optimized Edge AI on devices, either currently or what is likely soon – and what that implies for their own visions of the future. The telecoms industry, in particular, appears to be at risk of missing an important "disruption from adjacency".
This article is aimed at helping these people talk to each other, better understand each other's needs and expectations - and avoid poor decisions caused by a lack of awareness of broader tech trends.
If you asked representatives of the following industries to play "word association" with the phrase "Edge AI", they might suggest very different explanations:
Deep Neural Network (DNN) specialist
Someone involved in image detection or speech analysis might mention the trends towards "model compression" or "AI optimization", with heavy, resource-consuming or slow cloud-based inferencing shrunk down to work more efficiently on a CPU, GPU or microcontroller on a device – for instance a camera or smartphone. This is "AI at the edge" for them. It may solve multiple problems, from lower latency to reduced energy consumption (and better economics) for AI.
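To make the "model compression" idea concrete, here is a minimal sketch using post-training dynamic quantization in PyTorch on a placeholder model. It is just one of several techniques (alongside pruning, distillation and neural architecture search), shown as a hedged illustration rather than any vendor's specific method.

```python
# One common compression step: convert 32-bit float weights to 8-bit integers.
import os
import torch
import torch.nn as nn

fp32_model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Post-training dynamic quantization of the Linear layers' weights.
int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(model, path="tmp_model.pt"):
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"FP32 model weights: {size_mb(fp32_model):.2f} MB")
print(f"INT8 model weights: {size_mb(int8_model):.2f} MB")   # roughly 4x smaller
```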
IoT System developer
Someone involved with building a connected vehicle, or the quality-control for a smart factory, might think about a local compute platform capable of combining feeds from multiple cameras and other sensors, perhaps linked to autonomous driving or closed-loop automation control. Their "edge AI" resides on an onboard server, or perhaps an IoT gateway unit of some sort.
Mobile network operator (MNO)
A telecom service provider building a 5G network may think of Edge AI both for internal use (to run the radio gear more efficiently, for instance) and as an external customer-facing platform exploiting low-latency connections. The "mobile edge" might be targeted at a connected road junction, video-rendering for an AR game, or a smart city's security camera grid. Here, "Edge AI" is entwined with the network itself – its core functions, "network slicing" capabilities, and maybe physically located at a cell-site or aggregation office. It is seen as a service rather than an in-built capability of the system.
Datacentre & cloud providers
For companies hosting large-scale compute facilities, AI is seen as a huge source of current and future cloud demand. However, the infrastructure providers often don't grasp the differences between training and inferencing, or indeed the finer details of their customers' application and compute needs. "Edge" may just mean a datacentre site in a tier-3 city, or perhaps a "mini datacentre" serving users within a 10-100km radius.
These separate visions and definitions of "Edge AI" may span as much as 10 orders of magnitude in terms of scale and power – from milliwatts to megawatts. So, unsurprisingly, the conversations would be very different – and each group would probably fail to recognize each other's "edge" as relevant to their goals.
These are not the only categories. Others include chip and module vendors, server suppliers, automation and integration specialists, cloud/edge platforms and federation enablers and so forth. Added to these are a broad array of additional "edge stakeholders" – from investors to government policymakers.
Why does this matter? Because AI applications ultimately fit into broader ecosystems, transformation projects, consumer and business products or even government policy and regulatory regimes. In most cases, all of these groups will need to organize themselves into a value chain, or at least depend on each other.
The developer perspective
Often, edge-AI market participants focus - understandably - on what they perceive as their unique capabilities, whether that is their preferred models, their physical premises, network/system speeds, or their existing customer relationships. Internally, they are looking for new revenue opportunities and use-cases to help justify their investments, as well as to gain more "customer ownership".
But the questions which don't get asked often enough are "What does the developer – and the final end-user – really value? What are their constraints? And how will that drive their decision choices, now or in the future?"
For instance, consider an application developer working on an AI-powered object recognition tool. At the moment, their product has a few problems to resolve. In particular, the response times are laggy, which reduces the effectiveness and market opportunity of the overall solution. Given the round-trip time of video images to and from the cloud, plus the significant processing load and inference time, they can only get one reliable response per second, and the implied cost means it's only suitable for certain high-value tasks.
That's fine for monitoring crowds and lost property in a railway station – or detecting a particular parasitic beetle on a crop-leaf – but isn't useful for spotting defects on a fast production-line conveyor or to react to a deer jumping in front of an autonomous vehicle.
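A rough latency-budget sketch illustrates why the cloud round-trip caps throughput at roughly one response per second. Every millisecond figure below is an assumption for illustration, not a measurement.

```python
# Rough latency budget for the cloud round-trip scenario described above.
frame_encode_ms    = 50     # compress/package the image on the device
uplink_ms          = 150    # send frame to the cloud (size- and network-dependent)
cloud_inference_ms = 500    # queueing + large-model inference in the cloud
downlink_ms        = 100    # return the result
app_overhead_ms    = 100    # auth, serialisation, application logic

total_ms = (frame_encode_ms + uplink_ms + cloud_inference_ms
            + downlink_ms + app_overhead_ms)
print(f"End-to-end: {total_ms} ms -> ~{1000 / total_ms:.1f} reliable responses/second")
# By contrast, a compressed on-device model at, say, 30-50 ms per inference
# could sustain 20-30+ responses/second with no network dependency.
```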
They may also need to adapt the model. For instance, a security camera picks up "false positives" because it's not just shoplifters who spend a long time in one aisle - shelf-stackers do too. But staff all have a trolley or an orange uniform, so the model can be retrained to ignore them.
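As a toy illustration of handling such false positives, here is a hypothetical post-processing filter. In practice the developer might instead retrain or fine-tune the model on labelled staff examples; the detection fields below (attributes, dwell time) are invented for the sketch and do not correspond to any real product's API.

```python
# Hypothetical filter: suppress "loitering" alerts for people who look like staff.
def should_alert(detection: dict) -> bool:
    staff_markers = {"trolley", "orange_uniform"}
    if staff_markers & set(detection.get("attributes", [])):
        return False                          # likely a shelf-stacker, not a shoplifter
    return detection.get("dwell_time_s", 0) > 120

detections = [
    {"attributes": ["orange_uniform"], "dwell_time_s": 300},   # staff member
    {"attributes": [], "dwell_time_s": 240},                   # unknown person
]
alerts = [d for d in detections if should_alert(d)]
print(len(alerts))   # -> 1: only the second detection triggers an alert
```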
For the next version of their product (or a model update), they have a range of different improvement and optimization paths they could pursue - for example, compressing the model so inference can run on the device or a nearby gateway, paying for more (or faster) cloud compute, or relying on lower-latency connectivity to the cloud.
However, latency is not the only criterion to optimize for. In this example scenario, the developer's cloud-compute costs are escalating, and they are facing ever more questions from investors and customers about privacy and CO2 footprint. These bring additional trade-offs to the decision process. (The actual CO2 footprint of anything is horribly complex to estimate - you need to factor in the sources of power as well as the demand for it. Bear in mind too that battery manufacture has its own CO2 cost, so ambient energy "harvesting" may be better still for local on-device compute.)
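To show the shape of that energy trade-off, here is a deliberately crude back-of-envelope comparison. Every number is an assumption; real figures vary enormously with hardware, radio conditions, datacentre efficiency and the electricity mix at each end.

```python
# Illustrative energy comparison per image analysed (all numbers assumed).
ON_DEVICE_J_PER_INFERENCE = 0.05      # compressed model on an efficient NPU/MCU
RADIO_J_PER_MB            = 1.0       # energy to push data over a cellular uplink
IMAGE_SIZE_MB             = 0.5
CLOUD_J_PER_INFERENCE     = 2.0       # large model, incl. datacentre overhead (PUE)

cloud_path_j  = IMAGE_SIZE_MB * RADIO_J_PER_MB + CLOUD_J_PER_INFERENCE
device_path_j = ON_DEVICE_J_PER_INFERENCE

print(f"Cloud path : {cloud_path_j:.2f} J/image")
print(f"On-device  : {device_path_j:.2f} J/image")
print(f"Ratio      : {cloud_path_j / device_path_j:.0f}x")
# The comparison can flip once you account for battery charging losses, battery
# manufacture and how green the grid is at each end - hence "horribly complex".
```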
Indeed, at a high level, there are numerous technical and practical constraints involved, such as:
- latency and response-time requirements of the application
- compute, memory and power available on the device or gateway
- connectivity cost, coverage and reliability
- cloud-compute and data-transport costs
- privacy, security and data-sovereignty requirements
- model accuracy, and the cost of errors or false positives
- energy consumption and CO2 footprint
Looking through this list – and also considering all the other AI-related tasks, from audio/speech analysis to big-data trend analysis for digital twins – there is no singular "answer" to the best approach to Edge AI. Instead, it will be heavily use-case-dependent.
Also, clearly not all Edge/cloud/wireless applications are about AI either - many may relate to legal requirements for data collection, closed-loop automation, or device-to-device communications and analysis.
The implications of on-device AI and model compression
There are numerous approaches to optimizing AI models, both for server-side compute and for on-device deployment. From the previous discussion, it can be seen that if localized inferencing becomes more feasible, it will likely expand to many use-cases - especially those that can run independently on single, standalone devices. This has potentially significant benefits for AI system developers - but also less favorable implications for cloud and low-latency network providers.
Consider something intensely private, such as a bedside audio analyzer that detects sleep apnoea, excessive snoring and other breathing disorders. The market for such a product could expand considerably if it came with a guarantee that personal data stayed on-device rather than being analyzed in the cloud. The model could be trained in the cloud, but inferencing performed at the edge. If appropriate, it could share results with medical professionals and upload raw data later if the user permitted it - but local processing would be a strong selling point initially.
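Here is a minimal sketch of that "private by default" pattern. The capture, classification and upload functions are hypothetical stand-ins invented for illustration, not a real SDK.

```python
import random

# --- hypothetical stand-ins for a real audio pipeline and on-device model ---
def capture_audio(window_seconds=30, windows=4):
    for _ in range(windows):
        yield [random.random() for _ in range(16000 * window_seconds // 100)]

def classify_breathing(window):
    # placeholder for a compressed model's on-device inference call
    return random.choice(["normal", "heavy_snoring", "apnoea"])

def upload(endpoint, payload):
    print(f"Uploading {len(payload)} event summaries to {endpoint}")

def monitor_sleep(user_consents_to_upload=False):
    nightly_events = []
    for window in capture_audio():                    # raw audio stays in device RAM
        label = classify_breathing(window)            # inference happens locally
        if label in {"apnoea", "heavy_snoring"}:
            nightly_events.append({"label": label})   # keep only event metadata
    if user_consents_to_upload:                       # results leave only with consent
        upload("clinician_endpoint", nightly_events)
    return nightly_events

events = monitor_sleep(user_consents_to_upload=False)
```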
Yet when I regularly speak to representatives of the datacentre and telecoms worlds, especially in connection with new network types such as 5G, there is very little awareness or understanding of the role of on-device compute or AI – or how rapidly it is evolving, with improvements in processor hardware or neural network optimization.
Even in more camera-centric telecoms sectors such as videoconferencing, there seems to be little awareness of a shift back from the cloud to the edge (or of exactly where that edge is). There has been some recent recognition of the conflicts between end-to-end encryption and AI-driven tasks such as background blurring and live audio-captioning - but that is just one of the trade-offs that may be shifting.
Conclusions
The shift to Edge AI has huge possible benefits for developers and IoT providers. But it may have some negatives for 5G, edge-cloud and other connectivity-oriented specialists, at least for some of their target use-cases. I think we'll see distinctions between use-cases where inference runs entirely on the device, those that need a local gateway or on-premise server, and those that genuinely benefit from network-edge or centralised cloud resources.
The medium-term issues that seem to be underestimated are around energy budgets and privacy. If model compression and on-device Edge AI can prove not just "greener" in terms of implied CO2 footprint, but also reduce the invasiveness of mass data-collection in the cloud, then it may be embraced rapidly by many end-user groups. It may also catch the attention of policymakers and regulators, who currently have a very telecom/cloud-centric view of edge computing.
Despite this shift, it is important not to exaggerate the impact on the wider cloud and network market. This changes the calculus for some use-cases (especially real-time analysis of image, video and similar data flows) – but it does not invalidate many of the broader assumptions about future data traffic and value of high-performance networks, either wireless or wired.
But again, there's a "semantics" issue to resolve here. Poor communication is often at the root of poor assumptions.
When all participants in the market understand each other's language and technology trajectories, we should see fewer poor assumptions and less unrealistic hype. There are huge advances occurring across the board - from semiconductors to DNN optimization to network performance. But no single one is an all-purpose hammer - they are tools in a developer's toolkit.
#edgeAI #edgecomputing #cloud #5G #neuralnetworks #machinevision #deeplearning #IoT #imagerecognition #voiceanalytics #video #camera #AI
________________________________________________________________________
Interested in the topic and want to learn more? I specialise in this type of cross-silo, big-picture view of technology trends, especially where they intersect with wireless connectivity in some fashion. Please get in touch with me, either for internal advisory / brainstorming work, or external communications such as events, webinars and publications. (The sponsor of the original blog and webinar is Deeplite AI - drop 'em a line about Edge AI in particular, and please mention I sent you)
A really good article! These misunderstandings are kind of amusing (but sad). Just want to make two points: 1) Normal solution providers want to depend on as little as possible. Being dependent on a "hidden" network is just a big risk. That points towards placing solutions in devices rather than networks. 2) BUT if the devices are battery powered, the effective PUE is around 6-7. It comes from inefficient charging, due to the priority on energy density and charging time. This means you can consume 3-5 times more energy running the inference in a more efficient DC and still be on par (not counting the energy consumed sending the data).
This is a brilliant article Dean. These walled gardens exist everywhere and always lead to a massive waste of money, time and focus. When working for 3GPP/WiFi companies, I noticed high walls between RAN, core and OSS/BSS. Within each area, say core, you find more walls, with people who have spent 20-30 years working only with SS7, PCRF or AAA (but who can't understand a traced call flow and have no clue what impacts the user experience e2e). I also worked with integration & verification of RAN features for a while. None of the testers were using the customer OSS tools - everyone used a CLI tool created by a RAN guy. Just one of many examples. Then, when working with IoT, you realize the exact same problem exists in cities, buildings, companies, hospitals and enterprises. Walled gardens. A lot of people are insanely good at narrow tasks; very few are broad and understand the full picture. I believe this is a main reason why there's so much bad IT out there. I often recommend younger students to aim for the training/education part of companies when looking for jobs. That's probably the best place to start if you want to build deep and broad know-how.
Great read Dean Bubley, and good work in building bridges between domains. You mention AR there briefly. Still some years down the road for sure... but wouldn't many more advanced AR applications (e.g. gaming) be multi-party, multi-device, bidirectional and low-latency in a way that isn't suitable for on-device execution only, nor for centralized cloud execution? Not only an AI use case (though that could be part of it). Not really a today use case except for niche uses - but potentially a very big one eventually?
Power consumption and large computing power at the edge can be addressed by the ARM architecture, which is much more energy efficient. And those chips can be paired with CUDA cores, which translates into insane AI performance at the edge. But I see the largest problem in potential customers' mindsets and legacy applications. I think the future of edge applications is containerized or serverless workloads running on energy-efficient hardware architectures (meaning ARM, or maybe soon competitive RISC-V-based products). Customers will have to rearchitect, or build edge-native solutions from scratch. That is the largest challenge I see today for wider adoption.
Certainly, improving the communicability of concepts between those with IT backgrounds and those from telco is going to be important.