Incident Response

Run incident response: triage, communicate, resolve, and write a blameless postmortem.

Tags: incident, reliability, postmortem, on-call

When to use

Trigger with 'we have an incident' or 'production is down', when an alert needs a severity assessment, when you need a status update mid-incident, or when writing a blameless postmortem after resolution. The skill guides triage, communication, mitigation, and the retrospective.

Examples

Triage an active incident

Assess severity and coordinate initial response

We have an incident: checkout is failing for ~30% of users. Error rate spiked 10 minutes ago. Help me triage.

Draft a status update

Write a customer-facing or internal incident communication

Write a status page update. We identified a DB connection pool exhaustion issue, a fix is deployed, and we are monitoring recovery.

Write a postmortem

Document the incident, root cause, and action items

Write a blameless postmortem for yesterday's 45-minute outage. The root cause was a missing index on the orders table after a migration.