Adding a rollback button is not a neutral design choice. It affects the code that gets pushed. If developers incorrectly believe that their mistakes can be quickly reversed, they will tend to take more foolish risks. […]
Mounting a rollback button within easy reach […] means that it’s more likely to be pressed carelessly in an emergency. Panic buttons are for when you’re panicking.
In practice, we have fixed whole classes of reliability problems by forcing engineers to define deadlines in their service definitions.
In a polarized climate, opponents would jeer even eloquence from an unwelcome source; partisans would chant lovingly for public incontinence if delivered on behalf of the home team.
From Politico editor-in-chief John F. Harris, talking about Trump, but the point seems to apply far more broadly.
Into the ongoing fight between those who dismiss Star Wars as a shallow space opera vs. those who who would elevate the movies to a position of broader significance (those who prefer hard science fiction) strolls Jeremy Hsu, who points out:
Regardless of writer-director Rian Johnson’s intentions for “The Last Jedi,” his story transformed the adorable robotic sidekick into a murder droid with a will of its own. That would normally have huge implications in a science fiction story that wants to seriously explore a coherent and logical futuristic world setting. But like most Star Wars filmmakers, Johnson generally seems satisfied with merely creating an illusion of familiar technology that delivers cool visual storytelling, even if that leaves some of the bigger questions on the table.
Insert mic drop emoji here, I guess.
How do we learn from incidents, and how do we rebuild customer trust after an incident? Customer-facing postmortems are critical to this, but they have to answer the right questions. » about 700 words
PID controllers are all around us. Their straightforward, closed loop cycle of set-point, observation, and response are the basis for almost every control system in our everyday world, but the the Wikipedia article obscures their simple beauty and ubiquity. » about 700 words
[F]uzzy logic uses degrees of truth as a mathematical model of vagueness, while probability is a mathematical model of ignorance.
Those interested in how our expectations and relationship with "business tools" changes over time should consider this computer review from 1985. » about 200 words
From A Large Scale Study of Data Center Network Reliability by Justin Meza, Tianyin Xu, Kaushik Veeraraghavan, and Onur Mutlu, the categorized root causes of intra data center incidents at Fabook from 2011 to 2018:
|Maintenance||17%||Routine maintenance (for example, upgrading the software and firmware of network devices).|
|Hardware||13%||Failing devices (for example, faulty memory modules, processors, and ports).|
|Misconfiguration||13%||Incorrect or unintended configurations (for example, routing rules blocking production traffic).|
|Bug||12%||Logical errors in network device software or firmware.|
|Accidents||11%||Unintended actions (for example, disconnecting or power cycling the wrong network device).|
|Capacity planning||5%||High load due to insufficient capacity planning.|
|Undetermined||29%||Inconclusive root cause.|
Two notes worth considering:
We use “failures” to refer to any network device misbehavior. The root cause of a failure includes not only hardware faults, but also misconfigurations, maintenance mistakes, firmware bugs, and other issues.
We use Govindan et al.’s definition of root cause: “A failure event’s root-cause is one that, if it had not occurred, the failure event would not have manifested.”
WWV's 20 MHz signal was used for a unique purpose in 1958: to track the disintegration of Russian satellite Sputnik I after the craft's onboard electronics failed. » about 400 words