Hardware virtualization has moved to hardware

One of my takeaways from AWS’ bare metal announcements at re:Invent this week is that the compute, storage, and network aspects of hardware virtualization are now optimized and accelerated in hardware. AWS has moved beyond the limitations that constrained VM performance, and the work they’ve done applies both to their bare metal hardware and to their latest VM instance types.

Intel long ago implemented VT extensions in their CPUs to better support hardware virtualization of compute. Some of the biggest innovations in network hardware have been to support VXLAN in hardware (both NICs and switches), offloading encapsulation and decapsulation work from the CPU when the hypervisor or other host software supports it (Intel 2015, Emulex testing 2015, VXLAN performance considerations 2014 with 2016 update). AWS started taking advantage of hardware to offload their VXLAN-like I/O (they don’t actually use VXLAN, since VPC was introduced before VXLAN became a thing) with enhanced networking in 2013 and the introduction of their Elastic Network Adapter in 2016.
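From inside a Linux guest, you can see which of these network offload generations an instance is using. A quick check, assuming the primary interface is named eth0 and the AWS CLI is configured (the instance ID below is a placeholder):

```shell
# Show the NIC driver in use: "ena" means the Elastic Network Adapter,
# "ixgbevf" the earlier (2013-era) enhanced-networking SR-IOV path.
ethtool -i eth0 | head -n 2

# From outside the instance, the AWS CLI reports ENA support directly
# (replace i-0123456789abcdef0 with a real instance ID):
aws ec2 describe-instances \
    --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[].Instances[].EnaSupport'
```

Both commands are read-only, so they’re safe to run on a production instance.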

With compute and network broadly virtualized in hardware, that leaves storage I/O as the only aspect of virtualization that hasn’t been fully offloaded to hardware. One reason for this is that block storage had long ago been virtualized with EBS and EBS-like hardware/services. I consider that the first generation of block storage virtualization. AWS has effectively introduced a new generation of storage virtualization with more hardware optimizations in their announcement of c5.* and i3.metal instances. Just as AWS moved quietly with network offloading starting with c3.* instances in 2013 before introducing more significant network offload with their Elastic Network Adapter supporting 25Gb/s in 2016, AWS’ first steps to further offload storage virtualization were rather quiet at first.

AWS claims they were using custom Nitro hardware as the NIC (though it’s not clear if this is Ethernet or some other physical media) to connect compute nodes to EBS starting with the c4 generation and all those that are EBS-optimized by default. They modified the Xen hypervisor to interact with the Nitro hardware and present the devices as attached storage. Starting with c5.* instances, they’re presenting the EBS storage as an NVMe device, thanks to both newer Nitro hardware and support they implemented in KVM that effectively passes the block I/O requests straight through.
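This change is directly visible from the guest on a c5 instance: EBS volumes enumerate as NVMe devices rather than Xen paravirtual disks. A quick way to confirm it, assuming a Linux guest with the nvme-cli package installed and at least one attached EBS volume:

```shell
# On c5 instances, EBS volumes appear as /dev/nvme0n1, /dev/nvme1n1, ...
# instead of the /dev/xvda-style names used by Xen paravirtual storage.
lsblk

# The NVMe controller identifies itself as Amazon's EBS controller;
# "mn" is the model number field (requires nvme-cli, assumes /dev/nvme0):
sudo nvme id-ctrl /dev/nvme0 | grep '^mn'
```

The model-number field reported for an EBS-backed device names Amazon Elastic Block Store, which is about as clear a sign as you’ll get that the volume is being passed through as a native NVMe device.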

These three features effectively eliminate any performance advantage that OS virtualization might have once offered, but AWS took it a step further in i3.metal instances that allow the customer to run their own OS on the bare metal in AWS’ cloud. To support untrusted customers on bare metal, AWS must prevent those customers from modifying the firmware (especially modifications that inject malware that might affect the next customers on that hardware). The network and EBS optimizations discussed above provide AWS an opportunity to protect a customer’s interactions with those resources while still providing bare metal performance to them. Local storage, however, would not be protected. For that, AWS claims they use mass market NVMe flash devices plugged into a custom Nitro-powered controller that sanitizes requests to prevent the OS from modifying the firmware in the NVMe devices.

Even with custom hardware to protect network, remote block storage, and local block storage against untrusted users running on bare metal, there are other devices with firmware that need to be protected in such an environment. For that, AWS has developed a custom chipset that holds the CPUs in reset state at boot time while it validates the firmware in every device.

Distilled:

  • AWS is now the platform to beat for bare metal cloud offerings. Expect them to expand their bare metal offerings from i3 to other instance types (especially p3, but I would expect it for most instance families as they introduce new generations).
  • AWS has eliminated most, if not all, of the performance reasons why a person would want to use bare metal. Instead, customers will pick bare metal offerings specifically so they can run their own hypervisors.
  • AWS has no reason to develop tiny bare metal instances such as those provided by Packet.net, because customers can get bare metal performance from smaller instances virtualized on AWS’ latest hardware and software (see the point above).
  • Though AWS has developed a number of unique hardware solutions to support their bare metal offering and improve EBS performance, the hardware optimizations for compute and network are already broadly available in the marketplace for competitors to use.