
Production Incident Runbook

ChatGPT · 💻 Coding & Development

📋 Use Case

Generate incident response runbooks that help reduce mean time to recovery (MTTR).

📝 Prompt

You are a senior SRE/backend engineer. Generate a production incident troubleshooting runbook.

System/service: {service}
Tech stack: {tech_stack}
Alerts or symptoms: {symptoms}
Recent changes: {recent_changes}
Available monitoring: {monitoring_tools}

Output:
1. **Impact assessment**: user impact, severity, and escalation need
2. **Troubleshooting path**: prioritized checks for logs, metrics, traces, and dependencies
3. **Key queries**: PromQL/SQL/log-search examples
4. **Root-cause hypotheses**: verification method and exclusion criteria for each
5. **Mitigation options**: rollback, rate limit, degrade, scale out, cache bypass, etc.
6. **Postmortem template**: timeline, root cause, fix, and prevention items

💡 Example

Input: service "Order API", symptom "P95 latency spike" → Output: troubleshooting steps, key queries, and mitigation options

#SRE #Troubleshooting #Observability #DevOps

⚠️ How to Use

Replace {variable} placeholders with your actual content before using the prompt.
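
If you fill the prompt from a script rather than by hand, the substitution can be sketched roughly as below. This is a minimal illustration, not part of the original prompt: the function name `fill_prompt` is hypothetical, the template is abbreviated to its input fields (append the Output section verbatim when using it), and every example value other than "Order API" and "P95 latency spike" is an assumption made up for the demo.

```python
import string

# Input section of the prompt above; the "Output:" list is omitted here for brevity.
PROMPT_TEMPLATE = """You are a senior SRE/backend engineer. Generate a production incident troubleshooting runbook.

System/service: {service}
Tech stack: {tech_stack}
Alerts or symptoms: {symptoms}
Recent changes: {recent_changes}
Available monitoring: {monitoring_tools}
"""


def fill_prompt(template: str, values: dict) -> str:
    """Replace every {placeholder} in the template, failing loudly if one is missing."""
    # Collect the placeholder names that appear in the template.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = sorted(fields - set(values))
    if missing:
        raise ValueError(f"missing values for placeholders: {missing}")
    return template.format(**values)


# Hypothetical incident details, for illustration only.
example_values = {
    "service": "Order API",
    "tech_stack": "Python, PostgreSQL, Redis",
    "symptoms": "P95 latency spike",
    "recent_changes": "new release deployed one hour before the alert",
    "monitoring_tools": "Prometheus, Grafana, centralized logs",
}

print(fill_prompt(PROMPT_TEMPLATE, example_values))
```

Checking for missing placeholders before formatting avoids sending a prompt that still contains a literal `{recent_changes}` or similar, which tends to produce a generic runbook.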