Hardware virtualization has moved to hardware

One of my takeaways from AWS’ bare metal announcements at re:Invent this week is that the compute, storage, and network aspects of hardware virtualization are now optimized and accelerated in hardware. AWS has moved beyond the limitations that constrained VM performance, and the work they’ve done applies both to their bare metal hardware and their latest VM instance types.

CPU and network

Intel long ago implemented VT extensions in their CPUs to better support hardware virtualization of compute. Some of the biggest innovations in network hardware have been support for VXLAN in both NICs and switches, which offloads encapsulation and decapsulation from the CPU when the hypervisor or other host software supports it (see Intel 2015, Emulex testing 2015, VXLAN performance considerations 2014 with 2016 update). AWS started using hardware to offload their VXLAN-like I/O (VPC was introduced before VXLAN became a standard, and there may be some extant differences between them) with enhanced networking in 2013, and went further with the introduction of their Elastic Network Adapter in 2016.
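To make the offloaded work concrete, here is a minimal Python sketch of the per-packet VXLAN header handling described in RFC 7348. It covers only the 8-byte VXLAN header (a real encapsulation also adds outer Ethernet, IP, and UDP headers), and it says nothing about AWS' own VPC encapsulation, which may differ. The function names are my own, chosen for illustration.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Layout: flags (8 bits), reserved (24), VNI (24), reserved (8).
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Split a VXLAN payload into (vni, inner_frame)."""
    if len(packet) < 8:
        raise ValueError("packet is shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8, packet[8:]
```

Without offload, the host CPU does this wrapping and unwrapping (plus the outer headers and checksums) for every packet, which is the cost that VXLAN-aware NICs remove.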

Storage

With compute and network broadly virtualized in hardware, that leaves storage I/O as the only aspect of virtualization that hasn’t been fully offloaded to hardware. One reason for this is that block storage had long ago been virtualized with EBS and EBS-like hardware/services. Those should be considered the first generation of block storage virtualization. AWS has effectively introduced a new generation of storage virtualization with more hardware optimizations in their announcement of c5.* and i3.metal instances. Just as AWS moved quietly with network offloading starting with c3.* instances in 2013 before introducing more significant network offload with their Elastic Network Adapter supporting 25Gb/s in 2016, AWS’ first steps to further offload storage virtualization were rather quiet.

AWS claims they were using custom Nitro hardware as the NIC (though it’s not clear if this is Ethernet or some other medium) to connect compute nodes to EBS starting with the c4 generation and all instance types that are EBS-optimized by default. They modified the Xen hypervisor to interact with the Nitro hardware and present the devices as attached storage. Starting with c5.* instances, they’re presenting the EBS storage as an NVMe device, thanks to both newer Nitro hardware and support they implemented in KVM that effectively passes the block I/O requests straight through.
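From inside a Linux guest, the visible side of this change is the device name: Xen paravirtual disks typically show up as xvd* while NVMe namespaces show up as nvme*n*. The sketch below classifies a device name on that basis. It assumes the standard Linux naming conventions and nothing more; the function name and the category labels are hypothetical, and a name alone does not prove what hardware sits behind the device.

```python
import re

# Linux NVMe namespaces: nvme<controller>n<namespace>, optional p<partition>.
_NVME_NAME = re.compile(r"^nvme\d+n\d+(p\d+)?$")
# Xen paravirtual block devices: xvd<letters>, optional partition number.
_XEN_NAME = re.compile(r"^xvd[a-z]+\d*$")


def classify_block_device(name: str) -> str:
    """Guess how a block device is presented to the guest from its name."""
    name = name.rsplit("/", 1)[-1]  # accept "/dev/nvme0n1" as well as "nvme0n1"
    if _NVME_NAME.match(name):
        return "nvme"
    if _XEN_NAME.match(name):
        return "xen-paravirtual"
    return "other"
```

On a running instance, the names could be taken from the entries under /sys/block and passed through this function one at a time.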

Security

These three features effectively eliminate any performance advantage that OS virtualization might have once offered, but AWS took it a step further in i3.metal instances that allow the customer to run their OS on the bare metal in AWS’ cloud. To support untrusted customers on bare metal, AWS must prevent those customers from modifying the firmware (especially modifications that inject malware that might affect the next customers on that hardware). The network and EBS optimizations discussed above provide AWS an opportunity to protect a customer’s interactions with those resources while still providing bare-metal performance to them. Local storage, however, would not be protected. For that, AWS claims they use mass-market NVMe flash devices plugged into a custom Nitro-powered controller that sanitizes requests to prevent the OS from modifying the firmware in the NVMe devices.

Even with custom hardware to protect the network, remote block storage, and local block storage firmware against untrusted users running on bare metal, there are other devices with firmware that need to be protected in such an environment. For that, AWS has developed a custom chipset that holds the CPUs in a reset state at boot time while it validates the firmware in every device.
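AWS has not published how this validation works, so the following is only a conceptual sketch of the idea: compare a digest of each device's firmware against a known-good value, and release the CPUs only if every device passes. The device names, the use of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict, List


def firmware_digest(image: bytes) -> str:
    """Hex SHA-256 digest of a firmware image (hash choice is an assumption)."""
    return hashlib.sha256(image).hexdigest()


def find_tampered(images: Dict[str, bytes], known_good: Dict[str, str]) -> List[str]:
    """Return the devices whose firmware has no matching known-good digest."""
    failures = []
    for device, image in images.items():
        expected = known_good.get(device)
        # A device with no recorded digest is treated as a failure.
        if expected is None or not hmac.compare_digest(firmware_digest(image), expected):
            failures.append(device)
    return failures


def boot_decision(images: Dict[str, bytes], known_good: Dict[str, str]) -> str:
    """Model the gate: CPUs stay in reset unless every device validates."""
    return "hold-in-reset" if find_tampered(images, known_good) else "release-cpus"
```

The point of gating at reset is ordering: nothing a previous customer left behind gets a chance to run before the check completes.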

Oversubscription

As an aside from the technical details, it should be noted that bare metal instances give some insight into AWS’ oversubscription policies: notably, it doesn’t appear they are doing any oversubscription of CPU, memory, or local storage, at least for the i3.metal compared to the other i3.* instances.
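The comparison behind that observation is simple arithmetic: divide what the largest virtualized instance is given by what the bare metal instance exposes. A ratio at or below 1.0 for every resource means the VM is not promised more than the hardware has. The figures below are assumptions for illustration, roughly matching AWS' published i3 specifications, and should be checked against the current instance tables; the function name is hypothetical.

```python
from typing import Dict

# Assumed figures for illustration; verify against AWS' instance tables.
I3_METAL = {"vcpus": 72, "memory_gib": 512, "nvme_drives": 8}
I3_16XLARGE = {"vcpus": 64, "memory_gib": 488, "nvme_drives": 8}


def allocation_ratios(largest_vm: Dict[str, float],
                      bare_metal: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each physical resource handed to the largest VM size."""
    return {name: largest_vm[name] / bare_metal[name] for name in bare_metal}


ratios = allocation_ratios(I3_16XLARGE, I3_METAL)
oversubscribed = [name for name, ratio in ratios.items() if ratio > 1.0]
```

With these inputs no resource exceeds 1.0, and the gap in vCPUs and memory is plausibly what the hypervisor keeps for itself on the virtualized instances.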

Distilled

  • AWS is now the platform to beat for bare metal cloud offerings. Expect them to expand their bare metal offerings from i3 to other instance types (especially p3, but I would expect it for most instance families as they introduce new generations).
  • AWS has eliminated most, if not all, of the performance reasons why a person would want to use bare metal. Instead, customers will pick bare metal offerings specifically so they can run their own hypervisors.
  • AWS has few or no reasons to develop tiny bare metal instances such as those provided by Packet.net, because customers can get bare metal performance from smaller instances virtualized with AWS’ latest hardware and software (see the point above).
  • Though AWS has developed many unique hardware solutions to support their bare metal offering and improve EBS performance, the hardware optimizations for compute and network are already broadly available in the marketplace for competitors to use.
  • The bare metal instances give new insight into AWS’ server BOMs and show that AWS does not appear to be oversubscribing resources on the BOM.