MaisonBisson

a bunch of stuff I would have emailed you about

Rollback buttons and time machines

Adding a rollback button is not a neutral design choice. It affects the code that gets pushed. If developers incorrectly believe that their mistakes can be quickly reversed, they will tend to take more foolish risks. […]

Mounting a rollback button within easy reach […] means that it’s more likely to be pressed carelessly in an emergency. Panic buttons are for when you’re panicking.

From Dan McKinley, speaking about the complications and near impossibility of rolling back a deployment.

Shooting down Star Wars as a vehicle for exploring human relationships with future technologies

Into the ongoing fight between those who dismiss Star Wars as a shallow space opera vs. those who would elevate the movies to a position of broader significance (those who prefer hard science fiction) strolls Jeremy Hsu, who points out:

Regardless of writer-director Rian Johnson’s intentions for “The Last Jedi,” his story transformed the adorable robotic sidekick into a murder droid with a will of its own. That would normally have huge implications in a science fiction story that wants to seriously explore a coherent and logical futuristic world setting. But like most Star Wars filmmakers, Johnson generally seems satisfied with merely creating an illusion of familiar technology that delivers cool visual storytelling, even if that leaves some of the bigger questions on the table.

Insert mic drop emoji here, I guess.

Common root causes of intra data center network incidents at Facebook from 2011 to 2018

From A Large Scale Study of Data Center Network Reliability by Justin Meza, Tianyin Xu, Kaushik Veeraraghavan, and Onur Mutlu, the categorized root causes of intra data center incidents at Facebook from 2011 to 2018:

| Category | Fraction | Description |
|---|---|---|
| Maintenance | 17% | Routine maintenance (for example, upgrading the software and firmware of network devices). |
| Hardware | 13% | Failing devices (for example, faulty memory modules, processors, and ports). |
| Misconfiguration | 13% | Incorrect or unintended configurations (for example, routing rules blocking production traffic). |
| Bug | 12% | Logical errors in network device software or firmware. |
| Accidents | 11% | Unintended actions (for example, disconnecting or power cycling the wrong network device). |
| Capacity planning | 5% | High load due to insufficient capacity planning. |
| Undetermined | 29% | Inconclusive root cause. |

Two notes worth considering:

We use “failures” to refer to any network device misbehavior. The root cause of a failure includes not only hardware faults, but also misconfigurations, maintenance mistakes, firmware bugs, and other issues.

And:

We use Govindan et al.’s definition of root cause: “A failure event’s root-cause is one that, if it had not occurred, the failure event would not have manifested.”