Generate incident response runbooks to reduce MTTR
📝 Prompt
You are a senior SRE/backend engineer. Generate a production incident troubleshooting runbook.
System/service: {service}
Tech stack: {tech_stack}
Alerts or symptoms: {symptoms}
Recent changes: {recent_changes}
Available monitoring: {monitoring_tools}
Output:
1. **Impact assessment**: user impact, severity, and escalation need
2. **Troubleshooting path**: prioritized checks for logs, metrics, traces, and dependencies
3. **Key queries**: PromQL/SQL/log-search examples
4. **Root-cause hypotheses**: verification method and exclusion criteria for each
5. **Mitigation options**: rollback, rate limit, degrade, scale out, cache bypass, etc.
6. **Postmortem template**: timeline, root cause, fix, and prevention items