On Call

There are no architects at Facebook

We get there through iteration. We don’t try to build an architecture that is failproof. Building an architecture and worrying about it for months and months at a time before you actually go deploy it tends to not get us the result we want because by the time we’ve actually deployed something the problem has moved or there are more technologies available to solve different problems.

We take it seriously enough to say “there are no architects on the team.”

We do a very “you build it you own it” process, where any team or any individual or any engineer that builds or designs something, they own it, and they do the on-call for it.

On call is where we learn, and that’s how we improve over time.

You build a system…you don’t have to be perfect. Deploy it, and as long as you have enough detection and mitigation capabilities, you will do OK. And you will learn, and you will iterate over it, and you will get better over time.

From the NANOG73 keynote: “Operations first, feature second” by Facebook VP of Network Engineering Najam Ahmad. It’s at about the 10:20 mark in the video: