This is Part 1 of a 3-part series on MIM-Runbook, an open-source Claude Cowork plugin that generates complete Major Incident Management runbooks from simple YAML files. Part 2 covers installation and setup. Part 3 walks through a real example end-to-end.
The Problem Every On-Call Engineer Knows Too Well
It’s 3 AM. PagerDuty fires a P1 alert: your core database cluster is down. Revenue is bleeding at six figures per minute. Customers are tweeting. Your VP is calling.
And the junior engineer who just picked up the page is staring at a blank screen, wondering: What do I do first?
This is the reality of major incident management at most organizations. The knowledge of how to respond lives in the heads of two or three senior engineers. When they’re unavailable (on vacation, asleep, or in a different timezone), the response degrades. Communication is ad hoc. Escalation timing is inconsistent. Post-incident reviews surface the same process failures again and again: nobody knew who to call, the status email went out 45 minutes late, escalation happened too slowly, and the timeline couldn’t be reconstructed because nobody documented anything during the chaos.
MIM-Runbook was built to solve exactly this problem.
What Is MIM-Runbook?
MIM-Runbook is an open-source plugin for Claude Cowork (and Claude Code) that transforms structured incident data into a complete, prescriptive Major Incident Management runbook. You provide two YAML files — one describing the incident, one listing your stakeholders — and the plugin generates three production-ready output files in seconds:
- A Markdown file (.md) — Git-diffable, version-controllable, perfect for internal wikis and knowledge bases
- A Word document (.docx) — Formatted with a title page, severity badge, headers and footers — ready to attach to your ServiceNow ticket or share in email
- An Excel workbook (.xlsx) — A live action tracker with four sheets: Action Items, Escalation Log, Incident Timeline, and Summary Dashboard — designed to be used live on the bridge call during the incident
The runbook isn’t a generic template with “[INSERT YOUR DATA HERE]” placeholders. It’s a context-aware, prescriptive playbook generated from your incident details and your team’s contact information. Every command references the actual hostname of your affected configuration item (CI). Every email template is pre-filled with your stakeholders’ real email addresses. Every escalation threshold includes the right phone number for the right person at the right time.
The flow is simple:
You provide two YAML files:
  incident.yaml      ← incident details (number, service, CI, impact)
  stakeholders.yaml  ← contacts with roles and escalation levels
        ↓
  /generate-runbook
        ↓
Three output files:
  RB-INC0078342-2026-02-26.md    ← Markdown (Git-diffable)
  RB-INC0078342-2026-02-26.docx  ← Word doc (share with team)
  RB-INC0078342-2026-02-26.xlsx  ← Excel tracker (use live during incident)
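To make the input/output contract concrete, here is a minimal Python sketch, assuming incident.yaml has already been parsed into a dictionary. The key names (`number`, `service`, `ci`, `impact`), the helper functions, and the use of the run date as the date stamp are illustrative assumptions inferred from the example filenames, not the plugin’s actual code.

```python
from datetime import date

# Hypothetical field names for a parsed incident.yaml; the real schema may differ.
REQUIRED_INCIDENT_FIELDS = ("number", "service", "ci", "impact")
OUTPUT_EXTENSIONS = ("md", "docx", "xlsx")


def missing_fields(incident: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_INCIDENT_FIELDS if not incident.get(field)]


def runbook_filenames(incident: dict, run_date: date) -> list:
    """Build RB-<number>-<YYYY-MM-DD>.<ext> names for the three outputs."""
    missing = missing_fields(incident)
    if missing:
        # Refuse to name outputs for an incomplete incident record.
        raise ValueError(f"incident is missing fields: {', '.join(missing)}")
    stem = f"RB-{incident['number']}-{run_date.isoformat()}"
    return [f"{stem}.{ext}" for ext in OUTPUT_EXTENSIONS]


if __name__ == "__main__":
    incident = {
        "number": "INC0078342",
        "service": "Example service",  # placeholder value
        "ci": "example-ci-01",         # hypothetical CI hostname
        "impact": "Example impact",    # placeholder value
    }
    for name in runbook_filenames(incident, date(2026, 2, 26)):
        print(name)
```

Validating required fields before naming anything means a half-filled incident file fails fast rather than producing a runbook with blank fields.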
The 8 Sections Every Runbook Contains
Each generated runbook follows a battle-tested structure of 8 mandatory sections. This isn’t arbitrary — it mirrors the lifecycle of a major incident from detection to post-incident review.
Section 1: Incident Summary Banner — A quick-reference table with every key field from the incident: number, severity, priority, affected service, CI, environment, region, business impact, and the Incident Commander’s name. This is what you pin to the top of the Slack channel so everyone joining the response has instant context.
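Because the Markdown output is plain text, a banner like this is simply a two-column table. A minimal Python sketch of rendering one from a dictionary of incident fields; the function name and the pipe-escaping rule are assumptions for illustration, not taken from the plugin.

```python
def summary_banner(fields: dict) -> str:
    """Render key incident fields as a two-column Markdown table."""
    rows = ["| Field | Value |", "| --- | --- |"]
    for name, value in fields.items():
        # Escape pipe characters so a value cannot split a table cell.
        cell = str(value).replace("|", "\\|")
        rows.append(f"| {name} | {cell} |")
    return "\n".join(rows)
```

Calling it with `{"Number": "INC0078342", "Severity": "Sev1"}` yields a header row, a separator row, and one row per field, in insertion order.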
Section 2: Immediate Triage Checklist (T+0 to T+5) — Six numbered steps that walk the responder through the first five critical minutes: confirm the alert is genuine (with specific commands to run), check the monitoring dashboard, assess blast radius (with three specific questions to answer), open the Slack channel (with an exact message to copy-paste), join the Zoom bridge (with exact words to say when you arrive), and page the immediate-notify stakeholders. A junior engineer can follow these steps without any prior incident experience.
Section 3: Communication Plan — This is where most organizations fall apart during an incident. MIM-Runbook generates a stakeholder role map showing who to notify, when, and how — plus six pre-filled email templates covering the full incident lifecycle. Every template has the correct To/CC addresses, subject line, and body pre-populated with real data from your YAML files.
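Pre-filling a template comes down to string substitution plus recipient selection. The sketch below uses Python’s standard `string.Template`; the template wording, the field names, and the rule that stakeholders flagged `notify: immediate` go in To while everyone else is copied are hypothetical, chosen only to illustrate the idea.

```python
from string import Template

# Hypothetical "initial notification" template; wording and fields are illustrative.
SUBJECT = Template("[$severity] $number: $service incident, initial notification")
BODY = Template(
    "We are investigating an incident affecting $service (CI: $ci).\n"
    "Business impact: $impact\n"
    "Incident Commander: $commander\n"
)


def build_email(incident: dict, stakeholders: list) -> dict:
    """Fill the templates and split recipients into To and CC lists."""
    to = [s["email"] for s in stakeholders if s.get("notify") == "immediate"]
    cc = [s["email"] for s in stakeholders if s.get("notify") != "immediate"]
    # substitute() raises KeyError for any missing placeholder, so an incomplete
    # incident record fails loudly instead of sending a literal "$impact".
    return {
        "to": to,
        "cc": cc,
        "subject": SUBJECT.substitute(incident),
        "body": BODY.substitute(incident),
    }
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: during an incident, a loud failure is better than an email that goes out with unfilled placeholders.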
Section 4: Diagnosis & Investigation Steps — Category-specific investigation procedures. The plugin routes to the right diagnostic playbook based on the incident category: Database, Network, Application, Cloud/Infra, Security, or a generic fallback. Each step includes exact commands, expected vs. bad results, and decision trees for what to do next.
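The routing described above is essentially a lookup table with a default. A minimal Python sketch, assuming the incident category arrives as a free-text string; the category names follow the article, but the playbook identifiers and the normalization rule are hypothetical.

```python
from typing import Optional

# Category names follow the article; the playbook identifiers are hypothetical.
PLAYBOOKS = {
    "database": "database-diagnostics",
    "network": "network-diagnostics",
    "application": "application-diagnostics",
    "cloud/infra": "cloud-infra-diagnostics",
    "security": "security-diagnostics",
}
GENERIC_PLAYBOOK = "generic-diagnostics"


def select_playbook(category: Optional[str]) -> str:
    """Map an incident category to a diagnostic playbook, with a generic fallback."""
    # Normalize case and whitespace so "Database " and "database" match.
    key = (category or "").strip().lower()
    return PLAYBOOKS.get(key, GENERIC_PLAYBOOK)
```

For example, `select_playbook("Network")` returns the network playbook, while an unrecognized category such as `"Hardware"` or a missing one falls through to the generic playbook instead of raising an error.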
Section 5: Containment & Mitigation Actions — Actionable containment procedures with full rollback instructions. High-risk or irreversible actions are flagged with a CAB (Change Advisory Board) emergency approval requirement, including who to call and what to say.
Section 6: Escalation Matrix — Two tables: a time-based escalation ladder showing who to call at T+15, T+30, T+60, and T+120 if the incident isn’t resolved, plus a full individual contact table with every stakeholder’s phone, email, and Slack handle.
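The time-based ladder can be modeled as a sorted list of thresholds checked against elapsed time. A minimal Python sketch: the T+15/30/60/120 thresholds come from the article, while the one-level-per-threshold mapping and the contact labels are placeholder assumptions standing in for entries from stakeholders.yaml.

```python
from typing import List, NamedTuple, Optional


class EscalationStep(NamedTuple):
    minutes: int  # threshold measured from incident start (T+minutes)
    contact: str  # placeholder label; real contacts come from stakeholders.yaml


# Thresholds from the article; contact labels are hypothetical placeholders.
# The list must stay sorted by ascending minutes for next_escalation() to work.
LADDER = [
    EscalationStep(15, "Escalation level 1"),
    EscalationStep(30, "Escalation level 2"),
    EscalationStep(60, "Escalation level 3"),
    EscalationStep(120, "Escalation level 4"),
]


def due_escalations(
    elapsed_minutes: int, ladder: List[EscalationStep] = LADDER
) -> List[EscalationStep]:
    """Return every step whose threshold has been reached while unresolved."""
    return [step for step in ladder if elapsed_minutes >= step.minutes]


def next_escalation(
    elapsed_minutes: int, ladder: List[EscalationStep] = LADDER
) -> Optional[EscalationStep]:
    """Return the next step still ahead, or None once the ladder is exhausted."""
    for step in ladder:
        if elapsed_minutes < step.minutes:
            return step
    return None
```

At T+45, for instance, the first two levels should already have been called and the next call is due at T+60.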
Section 7: Resolution & Validation — A checklist of criteria that must all pass before the incident can be declared resolved. Includes a severity downgrade decision table, a bridge close procedure with exact words to say, and instructions to send the “Service Restored” email template.
Section 8: Post-Incident Handoff — Everything needed to close the loop properly: a documentation checklist, a ServiceNow resolution notes template, a PIR ticket template, a closure checklist, and a reminder to schedule the post-incident review within 5 business days.
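The five-business-day window is easy to miscount when an incident closes late in the week. A minimal Python sketch of the date arithmetic, assuming a Monday-to-Friday working week and ignoring public holidays (a real scheduler would need a holiday calendar); the function name is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`, skipping Sat/Sun.

    Public holidays are ignored in this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current
```

For example, an incident closed on Thursday 26 February 2026 would need its review held by Thursday 5 March 2026.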
The Business Value
For IT leadership, MIM-Runbook delivers measurable improvements across the metrics that matter most during incident response.
Reduced Mean Time to Resolution (MTTR)
When responders have a prescriptive playbook from minute zero, they don’t waste the first 15 minutes figuring out who to call or what to check. Teams using structured runbooks report 20-40% MTTR reductions on Sev1 incidents because the critical first 30 minutes are no longer spent on process — they’re spent on the problem.
Consistent Response Quality
The runbook ensures that a 3 AM response from a junior engineer follows the same process as a weekday response from your most senior SRE. Every step is documented. Every escalation threshold is defined. Every communication goes to the right people at the right time. No tribal knowledge required.
Audit-Ready Communication Trails
Once sent, the six email templates leave a time-stamped communication trail that supports compliance and audit requirements. For organizations subject to SOX, PCI-DSS, HIPAA, or other regulatory frameworks, that trail is a requirement, not a nice-to-have, and MIM-Runbook builds it into the process by default.
Faster Onboarding
New team members can participate in incident response from day one. The runbook tells them exactly what to do, what to say, and who to call — down to the exact words to speak when joining the bridge call. This is particularly valuable for organizations scaling their SRE teams or operating follow-the-sun on-call rotations across time zones.
Reduced Burnout
When incident response is systematized, the emotional and cognitive load on the on-call engineer drops significantly. Instead of deciding under pressure who to call, what to check, and when to escalate, they follow decisions the runbook has already made. This is one of the underappreciated benefits of structured incident management: it makes being on-call less stressful, which reduces burnout and improves retention.
How It Fits into Your Existing Workflow
MIM-Runbook doesn’t replace your incident management tools — it integrates with them. Today (Phase 1), it works with YAML files you provide. This means you can start using it immediately without any infrastructure changes or API integrations.
The planned Phase 2 adds direct ServiceNow integration: fetch a live incident by number, generate the runbook, and post it back to the ticket as work notes — all with a single /snow-runbook INC0078342 command.
The plugin architecture also supports optional connectors for PagerDuty, Slack, Datadog, AWS, Confluence, and more. Each integration is additive — you start with YAML and layer on live integrations as your maturity grows.
What’s Next
In Part 2 of this series, we’ll walk through the complete developer setup: cloning the repo, installing dependencies, configuring the MCP server, and generating your first runbook.
In Part 3, we’ll take a real example — a multi-region BGP network outage — and trace it from input YAML through to the generated output, examining every section of the runbook.
The repository is open source under the MIT license at github.com/agentbee0/MIM-Runbook.