Practical incident management guides

An opinionated survival guide written by engineers for engineers.

Introduction

What is Interrupit?

We started writing Interrupit to help teams get started with incident management. The gold standard, for the most part, is Google's SRE handbook, but that sets a high bar. More often, teams need something faster or slightly less heavyweight. Google has hundreds of SREs within its organisation; most of us lack even one full-time SRE, let alone teams of them.

Interrupit focuses on an opinionated, practical set of solutions. Some of them might not work in your current team, so feel free to tailor aspects of this guide to enable and help your teams. Like any good exercise program, we try to give examples across three levels where practical: beginner, intermediate and advanced. The idea behind these levels is to allow your team to step into certain behaviours and processes without a sense of failure or being overwhelmed.

If this is your team's first foray into incident management, the absolute best place to start is with a postmortem. Failure will always be a place for learning; it is also a good place to start building trust.

Getting Started

Why do companies struggle with incident management?

Advanced Guides

Understanding Human Behaviour

Core Concepts

Understanding how incidents occur

How to Contribute

How you can contribute to furthering this guide.

You should know!

This guide isn't perfect or complete. If you see a mistake, raise an issue or create a PR for someone in the community to review.


Getting help

Reach out to support@interrupit.com or search the Interrupit GitHub repo.

Submit an issue

Either reach out to issues@interrupit.com or submit an issue via Interrupit Issues.

Join the community

Subscribe to the Interrupit GitHub repo.