Interesting Developments In Edge Hypervisors
After building Edge Computing ecosystems at Intel and Arm, I have recently made the switch to working on Edge Computing at NVIDIA. Several people have asked me to share my perspective and learnings, so I am starting this informal, personal series on the topic. All opinions shared here are my own personal opinions and not those of my employer.
Objective
In this article, I will share two reasons why some experts in the industry are investing in hypervisor technology, as well as two interesting open source edge hypervisor solutions to be aware of. For edge definition nitpickers (you know who you are), I will be referring to the "Device Edge" here. There are many other definitions of "Edge"; if you are curious, read this white paper.
The Hovercraft Analogy
For those of you who are unfamiliar, a hypervisor is kind of like a hovercraft that your programs can sit inside. Like a hovercraft, a hypervisor provides a protective cushion that allows your applications to transition smoothly from one device to another, shielding the occupants from the rugged nature of the terrain below. With hypervisors, the bumps in the terrain (differences between hardware devices) are minimized, and the mobility of the application is increased.
Benefits of Hypervisors
Benefits of hypervisors include security, portability, and a reduced need for cumbersome customization to run on specific hardware. Hypervisors allow a device to concurrently run multiple, completely different operating systems, and they can partition applications from one another for security and reliability purposes. You can read more about hypervisors here. They are frequently compared to, used together with, or even compete against containers for similar use cases, though hypervisors historically require more processing overhead.
Two Reasons Why Some (Very Smart) Folks Are Choosing Hypervisors For The Edge
A core challenge in Edge Computing is the extreme diversity in hardware that applications are expected to run on. This, in turn, creates challenges in producing secure, maintainable, scalable applications capable of running across all possible targets.
Unlike their heavier datacenter-based predecessors, light-weight hypervisors offer the benefits of traditional hypervisors while also respecting the limited resources found on the device edge. Here are two reasons why some in the industry are taking a careful look at edge hypervisors.
Reason 1: Avoiding The Complexity And Overhead of Kubernetes
One potential reason for taking a hypervisor-based approach at the edge is that Kubernetes can be a poor fit for smaller clusters. Its overhead and complexity make it difficult to build and manage a team that can properly set up and scale a real-world Kubernetes application. In some cases, such as running a cluster of 4-5 nodes, it may be preferable to use a more streamlined approach involving a controller and light-weight hypervisors. This is the approach taken by EVE, described in more detail below.
Reason 2: Ease Of Modernizing Brown-Field Industrial IT
Another pressing reason for choosing edge hypervisors is that "brown-field" installations of existing edge hardware are extremely expensive to upgrade to modern IT "best practices." Hypervisors provide a path forward that does not involve rewriting old systems from scratch: the code running on older machines can frequently be shifted into a hypervisor and neatly managed and secured from there (a process referred to as "Workload Consolidation").
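To make the consolidation idea a little more concrete, here is a minimal sketch of the planning step: given the resource footprints of several legacy boxes, check whether they fit as virtual machines on a single modern edge host. All of the workload names, figures, and the headroom policy below are hypothetical illustrations, not taken from any real deployment or tool.

```python
# Illustrative sketch of workload-consolidation planning: deciding whether
# several legacy machines fit as VMs on a single modern edge host.
# All names and figures here are hypothetical, not from real systems.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    vcpus: int
    mem_gb: float

@dataclass
class Host:
    cpus: int
    mem_gb: float

def fits(host: Host, workloads: list[Workload], headroom: float = 0.2) -> bool:
    """True if all workloads fit on the host while keeping some headroom
    for the hypervisor itself and for burst load."""
    usable_cpus = host.cpus * (1 - headroom)
    usable_mem = host.mem_gb * (1 - headroom)
    return (sum(w.vcpus for w in workloads) <= usable_cpus
            and sum(w.mem_gb for w in workloads) <= usable_mem)

legacy = [
    Workload("plc-gateway", vcpus=1, mem_gb=0.5),  # aging industrial PC
    Workload("hmi-panel", vcpus=2, mem_gb=2.0),    # operator display
    Workload("data-logger", vcpus=1, mem_gb=1.0),
]
edge_box = Host(cpus=8, mem_gb=16.0)
print(fits(edge_box, legacy))  # True: 4 vCPUs / 3.5 GB fit within 80% of 8 / 16
```

In practice a consolidation project also has to account for I/O devices, real-time constraints, and licensing, but the same fit-checking shape applies.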
Let's take a look at two interesting examples of edge hypervisors to understand further.
Hypervisor #1: Project ACRN
The first edge hypervisor we will look at is called ACRN, which is a project hosted by the Linux Foundation. ACRN has a well documented architecture and offers a wide range of capabilities and configurations depending on the situation or desired outcome.
ACRN seeks to support industrial use cases by offering a strict partitioning between high-reliability processes and those which do not need preferential security and processing priority. ACRN accomplishes this separation by specifying a regime for sandboxing the different guest instances running on the device. I recommend keeping an eye on ACRN, as it seems to have significant industry support. Supported platforms currently tend to be strongly x86-based.
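The partitioning idea can be illustrated with a toy model: statically dedicate some physical CPUs to a high-reliability (real-time) VM, and pool the rest for best-effort guests. This is my own simplification of the scenario ACRN targets, not ACRN's actual mechanism; ACRN's real partitioning is defined through its scenario configuration, not Python.

```python
# Toy model of static CPU partitioning between a high-reliability (real-time)
# VM and best-effort VMs. Purely illustrative of the concept; it does not
# reflect ACRN's actual configuration format or internals.

def partition_cpus(total_cpus: int, rt_cpus: int) -> dict:
    """Pin the first rt_cpus cores exclusively to the real-time VM;
    the remaining cores form a shared pool for best-effort VMs."""
    if rt_cpus < 1 or rt_cpus >= total_cpus:
        raise ValueError("need at least one core on each side of the partition")
    return {
        "rt_vm": list(range(rt_cpus)),                    # exclusive cores
        "best_effort": list(range(rt_cpus, total_cpus)),  # shared pool
    }

plan = partition_cpus(total_cpus=4, rt_cpus=2)
print(plan)  # {'rt_vm': [0, 1], 'best_effort': [2, 3]}
```

The point of the static split is predictability: the real-time guest never competes for its cores, so worst-case latency stays bounded even when best-effort guests are busy.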
Hypervisor #2: EVE (part of LFEdge)
Also a project hosted at the Linux Foundation, EVE differs from ACRN in that it belongs to the LFEdge project cluster. Unlike ACRN, EVE also tends to be more agnostic about supported devices and architectures. Following the instructions hosted on the EVE GitHub page, I was able to build and run it on a Raspberry Pi 4 within the space of ten minutes, for example.
In terms of design philosophy, EVE is positioning itself as the "Android of edge devices." You can learn more about EVE by watching this recent webinar featuring the Co-Founder of Zededa, Roman Shaposhnik. EVE makes use of a "Controller" structure which provisions and manages edge nodes to simplify the overhead of operating an edge cluster.
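The controller pattern can be sketched abstractly as a reconciliation loop: each edge node reports the workloads it is running, and the controller compares that against the desired configuration and issues start/stop instructions until the node converges. The sketch below is a generic illustration of that pattern, with made-up workload names; it is not EVE's actual controller API or protocol.

```python
# Generic sketch of the controller/edge-node reconciliation pattern: the
# controller computes what a node should start or stop so that it converges
# on the desired state. Illustrative only; EVE's real protocol differs.

def reconcile(desired: set[str], reported: set[str]) -> dict:
    """Return the actions a node should take to match the desired state."""
    return {
        "start": sorted(desired - reported),  # wanted but not running
        "stop": sorted(reported - desired),   # running but no longer wanted
    }

# Controller's desired state for one edge node vs. what the node reports.
desired = {"vision-inference", "telemetry-agent", "ota-updater"}
reported = {"telemetry-agent", "legacy-logger"}

actions = reconcile(desired, reported)
print(actions)
# {'start': ['ota-updater', 'vision-inference'], 'stop': ['legacy-logger']}
```

Because the loop works from declared state rather than imperative commands, a node that was offline can simply reconnect, report what it is running, and be brought back into line, which is what makes this pattern attractive for fleets of intermittently connected edge devices.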
Wrapping Up
Expect to see more happening in the hypervisor space as the technology continues to evolve. Follow me to stay up to date with the latest developments in Edge Computing.