From Incident Response to Modern Incident Management: A DevOps Workshop with J. Paul Reed



Develop. Deploy. Code catches fire. FIGHT THE FIRE! Move on. Repeat. Does this cycle feel familiar? If so, this workshop is for you. We’ll take a deep dive into modern operational incident patterns and concepts, and look at what the largest, most successful software development and operations organizations do before, during, and after an incident.

We’ll discuss the history of operational incidents and delve into ways to transform how you and your team respond to them. We’ll expand how we think about incidents to include what happens long after the fire is put out, and even before that first monitoring alarm goes off. We’ll figure out how to start changing your company culture to treat failures as opportunities to learn and improve… and then cover methods to actually implement that learning and improvement in your systems.

When the day is over, practitioners will have a better understanding of how to model operational incidents, conduct actionable retrospectives with their teams, discern more valuable remediation items, and truly put incidents to rest.

Participants will learn:

  • What naval aircraft carriers, nuclear power plants, and air traffic control have to do with IT and large-scale Internet operations
  • Methods and strategies to change how your organization thinks about failure (and learning from it!)
  • How to model the full life cycle of incidents, beyond just responding to alerts and putting fires out
  • How Google, Amazon, and Netflix treat incidents, and why they have so few of them!

What should I bring with me to the workshop?

Please just bring yourselves.

Are there any prerequisites?

Feel free to bring incident war stories to share!