Inter-AZ cloud network performance

Archana Kesavan of ThousandEyes, speaking at NANOG 75, reports that network traffic between AZs within a single region is generally “reliable and consistent,” and that the tested cloud providers offer a “robust regional backbone [suitable] for redundant, multi-AZ architectures.”

ThousandEyes ran tests at ten-minute intervals over 30 days, measuring bidirectional loss, latency, and jitter. Kesavan reported the average inter-AZ latency for each tested cloud:

0.82ms, 1.05ms, and 0.79ms

Within the four tested regions in AWS, they found:

Region Latency
us-east-1 0.92ms
ap-south-1 0.72ms
eu-west-2 0.61ms
sa-east-1 1.13ms
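As a quick sanity check on those figures, the mean of the four regional numbers in the table above works out to about 0.85ms (a sketch; the region names and values are just those from the table):

```python
# Average the reported inter-AZ latencies for the four tested AWS regions.
latencies_ms = {
    "us-east-1": 0.92,
    "ap-south-1": 0.72,
    "eu-west-2": 0.61,
    "sa-east-1": 1.13,
}

mean_ms = sum(latencies_ms.values()) / len(latencies_ms)
print(f"mean inter-AZ latency: {mean_ms:.3f}ms")  # 0.845ms
```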

Kesavan’s slides and video are online.

Bare metal clouds are hard

The problem, explains Eclypsium, is that a miscreant could rent a bare-metal server instance from a provider, exploit a firmware-level vulnerability, such as one in UEFI or BMC code, to gain persistence on the machine, and then covertly monitor every subsequent use of that server. In other words, the attacker injects spyware into the server’s motherboard firmware, which runs below and out of sight of the host operating system and antivirus software, so that future renters of the box are secretly snooped on.

Indeed, the researchers found they could acquire a bare-metal server in the SoftLayer cloud, modify the underlying BMC firmware, release the box for someone else to use, and then, by tracking the hardware serial number, wait to re-provision the server and see whether their firmware change was still intact. It was. The BMC is the Baseboard Management Controller, the remote-controllable janitor of a server, with full access to the system.
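The obvious provider-side defense is to verify the BMC firmware against a known-good image hash before re-provisioning a box for the next customer. The sketch below is purely illustrative, not Eclypsium’s or IBM’s actual process; the `verify_firmware` helper and the hash-check workflow are my own assumptions:

```python
import hashlib

def verify_firmware(image: bytes, expected_sha256: str) -> bool:
    """Return True if the firmware image matches the known-good hash.

    Hypothetical provider-side check: hash the BMC firmware read back
    from a returned server and compare it against the vendor's
    known-good image before handing the box to the next customer.
    """
    return hashlib.sha256(image).hexdigest() == expected_sha256

# Example: a tampered image (one flipped byte) fails the check.
known_good = b"\x00" * 1024            # stand-in for a real firmware dump
expected = hashlib.sha256(known_good).hexdigest()

tampered = b"\x01" + known_good[1:]    # attacker-modified firmware
print(verify_firmware(known_good, expected))  # True
print(verify_firmware(tampered, expected))    # False
```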

» about 500 words


This leads to the emerging pattern of “many clusters” rather than “one big shared” cluster. It’s not uncommon to see customers of Google’s GKE service have dozens of Kubernetes clusters deployed for multiple teams. Often each developer gets their own cluster. This kind of behavior leads to a shocking amount of Kubesprawl.

From Paul Czarkowski, discussing the reasons for, and potential solutions to, the growing number of Kubernetes clusters.

Claim chowder: cloud storage

Ten years ago Apple was still doing Macworld Expo keynotes, and that year they introduced Time Capsule.

My response was this: forget Time Capsule, I want a space ship:

So here’s my real question: Why hasn’t Apple figured out how to offer me a storage solution that puts frequently used items on local disk, and less-frequently used items on a network disk? Seamlessly.
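The tiering asked for there is, at heart, a hot/cold placement policy: keep recently used items on the local disk, and demote the least recently used to the network disk when local space runs out. A minimal LRU-based sketch of that policy (`TieredStore` and its `local_capacity` parameter are hypothetical illustrations, not an Apple or Dropbox API):

```python
from collections import OrderedDict

class TieredStore:
    """Toy hot/cold placement: an LRU set of 'local' items; everything
    else lives on the 'network' tier. Illustrative only."""

    def __init__(self, local_capacity: int):
        self.local_capacity = local_capacity
        self.local = OrderedDict()   # item -> None, most recently used last
        self.network = set()

    def access(self, item: str) -> str:
        """Record an access; return which tier served it."""
        if item in self.local:
            self.local.move_to_end(item)
            return "local"
        # Fetch from the network tier (or create) and promote to local.
        self.network.discard(item)
        self.local[item] = None
        if len(self.local) > self.local_capacity:
            evicted, _ = self.local.popitem(last=False)  # least recently used
            self.network.add(evicted)                    # demote to network
        return "network"

store = TieredStore(local_capacity=2)
store.access("photos")
store.access("taxes-2007")
store.access("photos")        # touch photos again
store.access("movie.mkv")     # evicts taxes-2007 to the network tier
print(sorted(store.local))    # ['movie.mkv', 'photos']
print(store.network)          # {'taxes-2007'}
```

A real implementation would of course move file contents, not just names, and would fetch asynchronously, but the placement decision is the same LRU question.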

Ten years later: cloud storage is definitely the norm. Dropbox is about to IPO. And iCloud is the glue that unifies the Apple experience across all its devices (and which you’re perpetually out of space on, unless you pay).

AWS regions, AZs, VPCs, NICs, IPs, and performance

Jump to section: Availability zones and regions · VPCs · Elastic IPs and Elastic Network Interfaces · Network performance · Resources by scope · Connectivity by scope

AWS’ primary cloud is available in 15 regions, each with two to six availability zones, not including the separately operated regions (with independent identity) for GovCloud and China. » about 4600 words

Drivers and “standards”

A contact at Intel said rather openly that AWS was consuming about 50% of all Intel CPUs. Ignoring what this means for Intel’s business prospects, consider that it makes AWS effectively the dominant server ~~manufacturer~~ designer. And, now that they’re building their own components, they’re the biggest developer of drivers for server hardware. » about 400 words

Hardware virtualization has moved to hardware

One of my takeaways from AWS’ bare metal announcements at re:Invent this week is that the compute, storage, and network aspects of hardware virtualization are now optimized and accelerated in hardware. AWS has moved beyond the limitations that constrained VM performance, and the work they’ve done applies both to their bare metal hardware and their latest VM instance types.

» about 900 words