Software Engineering

Common root causes of intra data center network incidents at Facebook from 2011 to 2018

From “A Large Scale Study of Data Center Network Reliability” by Justin Meza, Tianyin Xu, Kaushik Veeraraghavan, and Onur Mutlu, the categorized root causes of intra data center incidents at Facebook from 2011 to 2018:

Category           Share  Description
Maintenance        17%    Routine maintenance (for example, upgrading the software and firmware of network devices).
Hardware           13%    Failing devices (for example, faulty memory modules, processors, and ports).
Misconfiguration   13%    Incorrect or unintended configurations (for example, routing rules blocking production traffic).
Bug                12%    Logical errors in network device software or firmware.
Accidents          11%    Unintended actions (for example, disconnecting or power cycling the wrong network device).
Capacity planning  5%     High load due to insufficient capacity planning.
Undetermined       29%    Inconclusive root cause.

Two notes worth considering:

We use “failures” to refer to any network device misbehavior. The root cause of a failure includes not only hardware faults, but also misconfigurations, maintenance mistakes, firmware bugs, and other issues.


We use Govindan et al.’s definition of root cause: “A failure event’s root-cause is one that, if it had not occurred, the failure event would not have manifested.”

Time synchronization is rough

Cloudflare on the frustrations of clock skew:

It may surprise you to learn that, in practice, clients’ clocks are heavily skewed. A recent study of Chrome users showed that a significant fraction of reported TLS-certificate errors are caused by client-clock skew. During the period in which error reports were collected, 6.7% of client-reported times were behind by more than 24 hours. (0.05% were ahead by more than 24 hours.) This skew was a causal factor for at least 33.5% of the sampled reports from Windows users, 8.71% from Mac OS, 8.46% from Android, and 1.72% from Chrome OS.

They’re proposing Roughtime as a solution.
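
The link between clock skew and certificate errors is that a TLS client compares its own clock against the certificate's validity window (the notBefore and notAfter fields). A minimal sketch of that comparison in POSIX shell follows; the function name cert_status and every timestamp are made up for illustration and are not taken from the Chrome study or from Roughtime:

```shell
set -eu

# cert_status NOW NOT_BEFORE NOT_AFTER  (all Unix epoch seconds)
# Prints how a certificate's validity window looks to a clock reading NOW.
cert_status() {
    now=$1 not_before=$2 not_after=$3
    if [ "$now" -lt "$not_before" ]; then
        echo "not yet valid"
    elif [ "$now" -gt "$not_after" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

# Hypothetical certificate: issued one hour ago, valid for 90 days.
true_now=1700000000
not_before=$(( true_now - 3600 ))
not_after=$(( not_before + 90 * 86400 ))

# Hypothetical client whose clock runs two days behind the true time.
skewed_now=$(( true_now - 2 * 86400 ))

accurate=$(cert_status "$true_now" "$not_before" "$not_after")
skewed=$(cert_status "$skewed_now" "$not_before" "$not_after")
echo "accurate clock: $accurate"
echo "skewed clock:   $skewed"
```

With these assumed values, the same certificate passes on the accurate clock and is rejected as not yet valid on the one that is two days behind, even though nothing is wrong with the certificate itself.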

git foo

A few git commands I find myself having to look up:

Resolve Git merge conflicts in favor of their changes, either for the whole merge during a pull or for a single file once a conflict has been reported:

git pull -Xtheirs
git checkout --theirs the/conflicted.file
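
To see what -Xtheirs does without risking a real repository, here is a sketch that builds a throwaway sandbox and triggers a deliberate conflict. The repository names (upstream, mine), the file notes.txt, and the identity settings are all invented for the demo; --no-rebase and --no-edit are added only so that the pull merges without prompting:

```shell
set -eu

# Throwaway sandbox; every path and name below is made up for the demo.
sandbox=$(mktemp -d)
cd "$sandbox"

# "upstream" plays the role of the remote repository.
git init --quiet upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "original" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# "mine" is the local clone that will pull later.
cd "$sandbox"
git clone --quiet upstream mine
cd mine
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "local edit" > notes.txt
git commit --quiet -am "Local change"

# Make a conflicting change to the same line on the remote side.
cd "$sandbox/upstream"
echo "upstream edit" > notes.txt
git commit --quiet -am "Upstream change"

# Pull as a merge, resolving conflicting hunks in favor of the remote side.
cd "$sandbox/mine"
git pull --quiet --no-rebase --no-edit -Xtheirs
result=$(cat notes.txt)
echo "notes.txt now contains: $result"
```

One caveat: when the pull rebases instead of merging, the meaning of "ours" and "theirs" is swapped, because your own commits are the ones being replayed on top of the upstream branch.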


View unpushed Git commits:

git log origin/master..HEAD

You can also view the diff using the same syntax:

git diff origin/master..HEAD

Or, “for a little extra awesomeness”:

git log --stat origin/master..HEAD

Updated since it was first posted:

Starting with Git 2.5+ (Q2 2015), the actual answer would be git log @{push}… See that new shortcut @{push}


Outgoing changes: git log @{u}..
Incoming changes: git log ..@{u}

@{u} or @{upstream} means the upstream branch of the current branch (see git rev-parse --help or git help revisions for details).
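
The same two ranges also work with git rev-list --count when only the number of outgoing or incoming commits matters. The sketch below sets up a scratch clone to show this; the repository names and commit messages are invented, and the empty commits exist only to give the ranges something to count:

```shell
set -eu

# Throwaway sandbox; all names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"

git init --quiet upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "Initial commit"

# The clone's main branch tracks origin/main, so @{u} resolves to it.
cd "$sandbox"
git clone --quiet upstream mine
cd mine
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "Local one"
git commit --quiet --allow-empty -m "Local two"

# One new commit appears on the remote side.
cd "$sandbox/upstream"
git commit --quiet --allow-empty -m "Upstream one"

# Fetch so the remote-tracking branch knows about it, then count each range.
cd "$sandbox/mine"
git fetch --quiet
outgoing=$(git rev-list --count '@{u}..')
incoming=$(git rev-list --count '..@{u}')
echo "outgoing=$outgoing incoming=$incoming"
```

Note that the incoming count reflects only what the last fetch brought in; without the git fetch, ..@{u} would still be empty.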