An open-source Claude Cowork plugin that generates prescriptive, 8-section Major Incident Management runbooks from two simple YAML files — complete with email templates, escalation matrices, and a live Excel action tracker.
Part 1: Why MIM-Runbook Exists — and What It Means for Your Business
The Problem Every On-Call Engineer Knows Too Well
It’s 3 AM. PagerDuty fires a P1 alert: your core database cluster is down. Revenue is bleeding at six figures per minute. Customers are tweeting. Your VP is calling.
And the junior engineer who just picked up the page is staring at a blank screen, wondering: What do I do first?
This is the reality of major incident management at most organizations. The knowledge of how to respond lives in the heads of two or three senior engineers. When they’re unavailable, the response degrades. Communication is ad-hoc. Escalation timing is inconsistent. Post-incident reviews reveal the same process failures over and over again.
MIM-Runbook was built to solve exactly this problem.
What Is MIM-Runbook?
MIM-Runbook is an open-source plugin for Claude Cowork (and Claude Code) that transforms structured incident data into a complete, prescriptive Major Incident Management runbook. You provide two YAML files — one describing the incident, one listing your stakeholders — and the plugin generates three production-ready output files:
- A Markdown file (.md) — Git-diffable, version-controllable, perfect for internal wikis
- A Word document (.docx) — Formatted with a title page, severity badge, headers and footers — ready to attach to your ServiceNow ticket or share in email
- An Excel workbook (.xlsx) — A live action tracker with four sheets: Action Items, Escalation Log, Incident Timeline, and Summary Dashboard
The runbook isn’t a generic template. It’s a context-aware, prescriptive playbook generated from your incident details and your team’s contact information. Every command references your actual CI name. Every email template is pre-filled with your stakeholders’ real email addresses. Every escalation threshold includes the right phone number for the right person at the right time.
The 8 Sections Every Runbook Contains
Each generated runbook follows a battle-tested structure of 8 mandatory sections:
Section 1: Incident Summary Banner — A quick-reference table with every key field from the incident: number, severity, priority, affected service, CI, environment, region, business impact, and the Incident Commander’s name. This is what you pin to the top of the Slack channel.
Section 2: Immediate Triage Checklist (T+0 to T+5) — Six numbered steps that walk the responder through the first five minutes: confirm the alert is genuine, check the monitoring dashboard, assess blast radius, open the Slack channel (with an exact message to paste), join the Zoom bridge (with exact words to say), and page the immediate-notify stakeholders.
Section 3: Communication Plan — A stakeholder role map showing who to notify, when, and how. This section also includes six pre-filled email templates covering the full incident lifecycle from initial notification to PIR invitation. Every template has the correct To/CC addresses, subject line, and body pre-populated.
Section 4: Diagnosis & Investigation Steps — Category-specific investigation procedures. The plugin routes to the right diagnostic playbook based on the incident category: Database, Network, Application, Cloud/Infra, Security, or a generic fallback. Each step includes exact commands, expected vs. bad results, and decision trees for what to do next.
Section 5: Containment & Mitigation Actions — Actionable containment procedures with rollback instructions. High-risk actions are flagged with a CAB (Change Advisory Board) approval requirement. Every action includes the expected impact, blast radius, and what to do if it fails.
Section 6: Escalation Matrix — Two tables: a time-based escalation ladder (T+15, T+30, T+60, T+120) and an individual contact table with every stakeholder’s phone, email, and Slack handle. If you’ve defined vendor escalations, those appear here too with account numbers and support URLs.
Section 7: Resolution & Validation — A checklist of criteria that must all pass before the incident can be declared resolved. Includes a severity downgrade decision table, a bridge close procedure with exact words to say, and instructions to send the “Service Restored” email.
Section 8: Post-Incident Handoff — Everything needed to close the loop: a documentation checklist, a ServiceNow resolution notes template, a PIR ticket template, and a reminder to schedule the post-incident review.
The Business Value
For IT leadership, MIM-Runbook delivers measurable improvements across the metrics that matter most:
Reduced Mean Time to Resolution (MTTR): When responders have a prescriptive playbook from minute zero, they don’t waste the first 15 minutes figuring out who to call or what to check. Early data from teams using structured runbooks shows 20-40% MTTR reductions on Sev1 incidents.
Consistent response quality: The runbook ensures that a 3 AM response from a junior engineer follows the same process as a weekday response from your most senior SRE. Every step is documented. Every escalation threshold is defined. No tribal knowledge required.
Audit-ready communication trails: The six email templates create a time-stamped communication trail that satisfies compliance requirements. Every email includes the incident number, severity, business impact, and resolution status — exactly what auditors and regulators want to see.
Faster onboarding: New team members can participate in incident response from day one. The runbook tells them exactly what to do, what to say, and who to call. Instead of shadowing for months, they have a playbook they can follow independently.
Reduced burnout: When incident response is systematized, the emotional and cognitive load drops. Engineers aren’t making high-stakes decisions under pressure with incomplete information. The runbook has already made those decisions for them.
How It Fits into Your Existing Workflow
MIM-Runbook doesn’t replace your incident management tools — it integrates with them. Today (Phase 1), it works with YAML files you provide. The planned Phase 2 adds direct ServiceNow integration: fetch a live incident by number, generate the runbook, and post it back to the ticket as work notes — all with a single command.
The plugin also supports optional connectors for PagerDuty, Slack, Datadog, AWS, and more. The architecture is designed so that each integration is additive — you start with YAML and layer on live integrations as your maturity grows.
Part 2: Step-by-Step Developer Guide — Install, Configure, and Run
This section walks you through everything you need to get MIM-Runbook running on your machine, from cloning the repo to generating your first runbook.
Prerequisites
Before you begin, make sure you have:
- Node.js 18+ (LTS recommended)
- npm (comes with Node.js)
- Claude Cowork (desktop app) or Claude Code (CLI)
- Git (to clone the repository)
Step 1: Clone the Repository
git clone https://github.com/agentbee0/MIM-Runbook.git
cd MIM-Runbook/src
The src/ directory contains the full plugin source — commands, skills, and the MCP server.
Step 2: Install MCP Server Dependencies
The plugin’s MCP server handles YAML parsing, runbook generation, and file output. It’s a TypeScript project that uses the Model Context Protocol SDK.
cd servers/runbook-generator
npm install
This installs the following key dependencies:
- @modelcontextprotocol/sdk: the MCP protocol layer
- docx: generates the Word document output
- exceljs: generates the Excel workbook output
- js-yaml: parses your YAML input files
- zod: validates YAML against the schema
Step 3: Build the MCP Server (Optional)
If you want to run the compiled JavaScript instead of using tsx:
npm run build
This compiles TypeScript to the dist/ directory. You can then start the server with npm start instead of npm run dev.
Step 4: Install the Plugin
For Claude Cowork (Desktop App):
- Open Claude Cowork
- Go to Customize > Browse plugins > Upload
- Select the src/ folder (the one containing .mcp.json, commands/, skills/, and servers/)
- The plugin will appear in your plugin list
For Claude Code (CLI):
claude plugin install --path /path/to/MIM-Runbook/src
Step 5: Prepare Your Input YAML Files
You need two YAML files: one for the incident and one for stakeholders. The repository includes example files in the example/input/ directory that you can use as templates.
Create your incident YAML:
cp example/input/incident-network-outage.yaml input/my-incident.yaml
Edit input/my-incident.yaml with your actual incident data. Every field marked “Required” in the schema must be present.
Create your stakeholders YAML:
cp example/input/stakeholders-example.yaml input/my-stakeholders.yaml
Edit input/my-stakeholders.yaml with your team’s real contact information.
Step 6: Validate Your YAML (Optional but Recommended)
Before generating, validate your YAML files to catch schema errors:
/validate-yaml input/my-incident.yaml
The validator checks every required field and returns specific, field-level error messages if anything is missing or malformed.
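As a rough illustration of what field-level validation means here, the following sketch checks a handful of fields and reports one message per problem. The list of required fields is a hypothetical subset taken from the example incident later in this article; the real schema lives in the repository and is enforced with zod, which this sketch does not reproduce.

```typescript
// Hypothetical subset of required incident fields; the plugin's actual
// schema in the repository is the source of truth.
const REQUIRED_INCIDENT_FIELDS = ["number", "severity", "category", "affected_ci"] as const;

interface FieldError {
  field: string;
  message: string;
}

// Returns one error per missing or empty required field, in field order.
function validateIncident(incident: { [key: string]: unknown }): FieldError[] {
  const errors: FieldError[] = [];
  for (const field of REQUIRED_INCIDENT_FIELDS) {
    const value = incident[field];
    if (value === undefined || value === null) {
      errors.push({ field, message: `incident.${field} is required` });
    } else if (typeof value !== "string" || value.trim() === "") {
      errors.push({ field, message: `incident.${field} must be a non-empty string` });
    }
  }
  return errors;
}
```

An empty result means the incident passed these checks; each entry otherwise names the exact field to fix.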
Step 7: Generate Your Runbook
Run the generation command:
/generate-runbook input/my-incident.yaml input/my-stakeholders.yaml
Or, if your files are in the default input/ directory and named with the standard pattern, simply:
/generate-runbook
The plugin will generate three files in the output/ directory:
output/
├── RB-INC0091245-2026-02-26T13-01-55.md
├── RB-INC0091245-2026-02-26T13-01-55.docx
└── RB-INC0091245-2026-02-26T13-01-55.xlsx
The filename format is RB-{incident_number}-{timestamp} so that multiple generations for the same incident are preserved.
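The naming scheme can be sketched in a few lines. This is an illustration rather than the plugin's own code: the function names are hypothetical, and it assumes the timestamp is UTC in ISO 8601 form with the colons replaced by hyphens, which matches the example filenames above.

```typescript
// Builds the shared basename for the three output files.
// Assumption: UTC timestamp, no milliseconds, ":" replaced by "-".
function buildRunbookBasename(incidentNumber: string, generatedAt: Date): string {
  const timestamp = generatedAt
    .toISOString() // e.g. 2026-02-26T13:01:55.000Z
    .slice(0, 19) // drop the milliseconds and the trailing "Z"
    .replace(/:/g, "-"); // colons are not allowed in Windows filenames
  return `RB-${incidentNumber}-${timestamp}`;
}

const OUTPUT_EXTENSIONS = ["md", "docx", "xlsx"] as const;

// Returns the three filenames that one generation run would write.
function buildOutputFilenames(incidentNumber: string, generatedAt: Date): string[] {
  const base = buildRunbookBasename(incidentNumber, generatedAt);
  return OUTPUT_EXTENSIONS.map((ext) => `${base}.${ext}`);
}
```

Because the timestamp has one-second resolution, two generations for the same incident only collide if they start within the same second.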
Understanding the MCP Server Architecture
Under the hood, the plugin exposes four MCP tools (Phase 1):
| Tool | Purpose |
|---|---|
| generate_runbook | Core generation pipeline: parses YAML, builds markdown, creates .docx and .xlsx |
| load_yaml_file | Reads a YAML file from disk |
| list_input_files | Lists YAML files in the input directory |
| validate_incident_yaml | Validates YAML against the incident/stakeholder schema |
The MCP server configuration lives in .mcp.json at the plugin root:
{
  "mcpServers": {
    "runbook-generator": {
      "command": "npx",
      "args": ["tsx", "servers/runbook-generator/src/index.ts"]
    }
  }
}
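To make the configuration concrete, the sketch below parses the same JSON and resolves the command line that a host application would spawn for the server. The helper name and the interfaces are hypothetical and cover only the two fields used in this file; how Claude Cowork or Claude Code actually launches the process is not shown in this article.

```typescript
// Shape of the .mcp.json file above (only the fields that appear there).
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: { [serverName: string]: McpServerEntry };
}

const mcpConfigText = `{
  "mcpServers": {
    "runbook-generator": {
      "command": "npx",
      "args": ["tsx", "servers/runbook-generator/src/index.ts"]
    }
  }
}`;

// Hypothetical helper: joins command and args into the line a host would run.
function launchCommandFor(config: McpConfig, serverName: string): string {
  const entry = config.mcpServers[serverName];
  if (entry === undefined) {
    throw new Error(`No MCP server named "${serverName}" in config`);
  }
  return [entry.command, ...entry.args].join(" ");
}

const parsedConfig = JSON.parse(mcpConfigText) as McpConfig;
const runbookServerCommand = launchCommandFor(parsedConfig, "runbook-generator");
```

In other words, the entry is equivalent to running npx tsx servers/runbook-generator/src/index.ts from the plugin root, which is why the build step is optional.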
Phase 2: ServiceNow Integration (Planned)
Phase 2 adds live ServiceNow integration with four additional tools:
- fetch_snow_incident: pulls a live incident from ServiceNow by INC number
- fetch_snow_stakeholders: fetches assignment group members automatically
- update_snow_incident: posts the generated runbook back to the ticket as work notes
- create_snow_pir_ticket: creates a Post-Incident Review Problem record
To configure, create a .env file in servers/runbook-generator/:
SNOW_INSTANCE=your-company.service-now.com
SNOW_AUTH_TYPE=basic
SNOW_USERNAME=your-service-account
SNOW_PASSWORD=your-password
Then use the single command:
/snow-runbook INC0078342
This fetches the incident, fetches stakeholders, generates all three output files, and optionally posts back to the ticket.
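Since Phase 2 is planned rather than shipped, the following is only a sketch of how the four .env variables might be read and checked before any ServiceNow call is made. The function and interface names are hypothetical; the environment is passed in as a plain object (in Node.js this would typically be process.env once the .env file has been loaded).

```typescript
interface SnowConfig {
  instance: string;
  authType: string;
  username: string;
  password: string;
}

// Reads the four ServiceNow settings and fails fast, listing every
// variable that is missing or empty, instead of failing on the first one.
function readSnowConfig(env: { [key: string]: string | undefined }): SnowConfig {
  const names = ["SNOW_INSTANCE", "SNOW_AUTH_TYPE", "SNOW_USERNAME", "SNOW_PASSWORD"];
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing ServiceNow settings: ${missing.join(", ")}`);
  }
  return {
    instance: env["SNOW_INSTANCE"] as string,
    authType: env["SNOW_AUTH_TYPE"] as string,
    username: env["SNOW_USERNAME"] as string,
    password: env["SNOW_PASSWORD"] as string,
  };
}
```

Reporting all missing variables at once saves a round trip when the .env file is only partly filled in.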
Optional Connectors
The plugin architecture supports additional integrations via MCP. You can extend .mcp.json to add connectors for PagerDuty, Slack, Datadog, AWS, and more. See CONNECTORS.md in the repository for the full list and configuration instructions.
Part 3: Walkthrough — A Real Example from Input to Output
Let’s walk through a concrete example: a multi-region network outage caused by a BGP route withdrawal. We’ll examine both input YAML files and then look at what the plugin produces.
The Incident Input: incident-network-outage.yaml
This file describes a critical network outage. Here are the key fields and why they matter:
Identity and severity fields:
incident:
  number: "INC0091245"
  title: "Multi-Region Network Outage — BGP Route Withdrawal..."
  severity: "1 - Critical"
  priority: "1 - Critical"
  state: "In Progress"
The number field becomes the primary identifier across all three output files — in filenames, email subject lines, Slack channel names, and the runbook banner. The severity and priority drive the urgency tone throughout the document.
Categorization fields:
category: "Network"
subcategory: "BGP / Routing"
affected_service: "Global Application Network — All Customer-Facing Traffic"
affected_ci: "core-router-edge-01"
environment: "Production"
region: "global (us-east-1, eu-west-1, ap-southeast-1)"
The category field is critical — it determines which diagnostic playbook the plugin generates in Section 4. A “Network” category triggers BGP checks, traceroute commands, firewall rule verification, and CDN health checks. A “Database” category would instead generate Oracle RAC health checks, connection pool queries, and replication lag analysis.
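The routing itself amounts to a lookup with a fallback. The sketch below uses the six category names listed in this article; the function name and the case-insensitive matching are assumptions, not the plugin's documented behavior.

```typescript
// The six playbooks named in this article; "Generic" is the fallback.
type Playbook = "Database" | "Network" | "Application" | "Cloud/Infra" | "Security" | "Generic";

const KNOWN_PLAYBOOKS: Playbook[] = ["Database", "Network", "Application", "Cloud/Infra", "Security"];

// Assumption: matching ignores case and surrounding whitespace.
// Anything unrecognized falls through to the generic playbook.
function selectPlaybook(category: string): Playbook {
  const wanted = category.trim().toLowerCase();
  const match = KNOWN_PLAYBOOKS.find((p) => p.toLowerCase() === wanted);
  return match ?? "Generic";
}
```

With this shape, a typo in the category does not stop generation; it produces the generic playbook, which is one reason to run the validator first.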
The affected_ci value (core-router-edge-01) is injected into every command throughout the runbook, so engineers can copy-paste commands directly without editing hostnames.
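One simple way to picture the injection is placeholder substitution over command templates. The templates and the {ci} placeholder syntax below are hypothetical, chosen only to illustrate the idea; the plugin's actual templating mechanism may differ.

```typescript
// Hypothetical command templates; "{ci}" marks where the CI name goes.
const NETWORK_COMMAND_TEMPLATES = ["ping {ci}", "traceroute {ci}"];

// Replaces every "{ci}" placeholder with the affected CI from the incident YAML.
function injectCi(templates: string[], affectedCi: string): string[] {
  return templates.map((template) => template.split("{ci}").join(affectedCi));
}

const networkCommands = injectCi(NETWORK_COMMAND_TEMPLATES, "core-router-edge-01");
```

Here networkCommands holds ready-to-paste commands that already contain core-router-edge-01.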
Business impact:
business_impact: >
  All inbound customer traffic to production services is failing.
  100% packet loss from external clients to all three production
  regions. Estimated 45,000 active sessions dropped. Revenue
  impact ~$120K/minute. B2B SLA breach imminent for Tier-1
  enterprise customers.
This text appears verbatim in the runbook banner, in every email template, and in the Slack channel opening message. Write it once, and it propagates everywhere. The business context ensures that anyone reading any communication — from the on-call engineer to the CTO — immediately understands the stakes.
Change correlation:
Setting change_related: true adds a prominent “YES — investigate recent changes” note to the runbook banner, alerting responders that a recent change may be the root cause. In this example, a BGP community configuration change was pushed 7 minutes before the outage.
The Stakeholders Input: stakeholders-example.yaml
This file defines the people involved in the incident response and their escalation levels. Here’s how the structure works:
Level 1 — Immediate response (T+0):
stakeholders:
  - name: "Sarah Mitchell"
    role: "Incident Commander"
    title: "Director of Site Reliability Engineering"
    email: "s.mitchell@company.com"
    phone: "+1-555-0201"
    slack: "@sarah.mitchell"
    escalation_level: 1
    notify_immediately: true
    bridge_url: "https://company.zoom.us/j/98765432101"
    bridge_phone: "+1-929-205-6099 PIN: 123456#"
The Incident Commander’s bridge_url and bridge_phone are injected into every email template and the triage checklist. The notify_immediately: true flag means this person appears in the “To:” line of the first four email templates.
Level 2 — Escalation at T+30: Level 2 stakeholders (like the Customer Impact Lead and Product Lead) are CC’d on early templates and move to “To:” on templates 4-6. They’re also the escalation target at the T+30 threshold in the escalation matrix.
Level 3 — Executive escalation at T+60+: Executives (CTO, CRO) are CC-only on templates 3-6 and are the escalation target at T+60 and T+120. They’re never in the “To:” line of the initial alert — the plugin respects the escalation hierarchy by design.
Vendor escalations: Vendor contacts (Oracle Support, AWS Support) appear in Section 6 with account numbers and support URLs pre-filled. When the responder needs to open a case with a vendor at 3 AM, they don’t need to hunt for account credentials.
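The To/CC rules for the three escalation levels can be condensed into a small decision function. This is a sketch of the rules as described above, not the plugin's code. Two details are assumptions because the article does not state them: Level 1 is kept on CC for templates 5 and 6, and Level 3 is left off templates 1 and 2 entirely.

```typescript
type RecipientLine = "to" | "cc" | "none";

// templateNumber runs from 1 to 6, escalationLevel from 1 to 3.
function recipientLine(escalationLevel: number, templateNumber: number): RecipientLine {
  if (escalationLevel === 1) {
    // Article: "To:" on the first four templates. CC afterwards is an assumption.
    return templateNumber <= 4 ? "to" : "cc";
  }
  if (escalationLevel === 2) {
    // Article: CC on early templates, "To:" on templates 4-6.
    return templateNumber >= 4 ? "to" : "cc";
  }
  if (escalationLevel === 3) {
    // Article: CC-only on templates 3-6. Absence from 1-2 is an assumption.
    return templateNumber >= 3 ? "cc" : "none";
  }
  return "none";
}
```

Vendor contacts are deliberately left out of this function, since they appear in the escalation matrix rather than on the email templates.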
What the Plugin Produces
When you run /generate-runbook with these two YAML files, you get three output files. Let’s look at what’s inside each one.
The Markdown Runbook (.md)
The Markdown output is the canonical source. It opens with the Incident Summary Banner — a table with every field from the YAML — followed by the business impact statement in bold and a critical severity warning.
Section 2 (Triage Checklist) begins with a specific command: ping core-router-edge-01 — not a generic placeholder, but the actual CI from your YAML. The Slack channel is named #incinc0091245 (derived from the incident number). The bridge join script includes the Zoom URL and dial-in from the stakeholder YAML.
Section 3 (Communication Plan) contains six email templates. Template 1 (“Initial Incident Notification”) has a subject line with the severity, incident number, and affected service pre-filled. The “To:” field lists every Level 1 stakeholder’s email address. The body includes the full business impact statement, bridge URL, and IC contact information.
Section 4 (Diagnosis) routes to the Network investigation playbook because the incident category is “Network.” This means the steps include traceroute from multiple vantage points, BGP peer status checks, firewall rule verification, and CDN/load balancer health checks — each with expected good vs. bad results and decision trees pointing to the appropriate containment action in Section 5.
Section 6 (Escalation Matrix) contains a time-based table showing exactly who to call at T+15, T+30, T+60, and T+120, plus a full contact table with every stakeholder’s phone, email, and Slack handle. The vendor table includes account numbers and severity mappings pre-filled.
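The time-based ladder is easy to reason about as data. The sketch below encodes the four thresholds and answers two questions a responder keeps asking: which escalations are already due, and which one comes next. Level 2 at T+30 and Level 3 at T+60 and T+120 follow the stakeholder description earlier in this walkthrough; mapping T+15 to Level 1 is an assumption, and the names are hypothetical.

```typescript
interface EscalationStep {
  minutes: number; // minutes since the incident started (T+N)
  level: number; // stakeholder escalation_level to contact
}

// T+15 -> Level 1 is an assumption; the other rows follow the article.
const ESCALATION_LADDER: EscalationStep[] = [
  { minutes: 15, level: 1 },
  { minutes: 30, level: 2 },
  { minutes: 60, level: 3 },
  { minutes: 120, level: 3 },
];

// Every step whose threshold has already been reached.
function escalationsDue(elapsedMinutes: number): EscalationStep[] {
  return ESCALATION_LADDER.filter((step) => elapsedMinutes >= step.minutes);
}

// The next step still ahead, or undefined once the ladder is exhausted.
function nextEscalation(elapsedMinutes: number): EscalationStep | undefined {
  return ESCALATION_LADDER.find((step) => step.minutes > elapsedMinutes);
}
```

At 45 minutes into an incident, for example, the T+15 and T+30 steps are due and the T+60 executive escalation is next.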
The Word Document (.docx)
The .docx file contains the same content as the Markdown, formatted as a professional Word document with a title page showing the incident number and severity badge, headers and footers with page numbers, and proper heading hierarchy. This is the file you attach to the ServiceNow ticket, share in email, or print for the war room.
The Excel Action Tracker (.xlsx)
The .xlsx workbook contains four sheets designed to be used live during the incident:
Sheet 1 — Action Items: Pre-populated with action items derived from the runbook steps. Each row has columns for Status (with a dropdown: Open / In Progress / Done), Owner, Due Time, and Notes. Conditional formatting colors rows: red for Open, amber for In Progress, green for Done.
Sheet 2 — Escalation Log: Pre-seeded with stakeholder contacts from the YAML. As you make escalation calls during the incident, you log them here with timestamp, method, and outcome.
Sheet 3 — Incident Timeline: Pre-seeded with known events from the incident description. During the incident, you add rows as events occur — creating the timeline you’ll need for the post-incident review.
Sheet 4 — Summary Dashboard: Live formula counters showing how many actions are open vs. done, a quick-reference table of stakeholder contacts, and vendor contact information. This is the sheet you share on the Zoom screen during the bridge call.
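The logic behind the status colors and the dashboard counters can be sketched without any spreadsheet library. The workbook itself does this with conditional formatting and formulas; the types and function below are hypothetical and only mirror the behavior described for Sheets 1 and 4.

```typescript
type ActionStatus = "Open" | "In Progress" | "Done";

interface ActionItem {
  description: string;
  status: ActionStatus;
}

// Row colors as described for the Action Items sheet.
const STATUS_COLOR: Record<ActionStatus, string> = {
  Open: "red",
  "In Progress": "amber",
  Done: "green",
};

// Mirrors the dashboard counters: how many actions sit in each status.
function countByStatus(items: ActionItem[]): Record<ActionStatus, number> {
  const counts: Record<ActionStatus, number> = { Open: 0, "In Progress": 0, Done: 0 };
  for (const item of items) {
    counts[item.status] += 1;
  }
  return counts;
}
```

Restricting the status to three values is what makes both the dropdown and the counters reliable; a free-text status column would break the counts.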
Customizing for Your Organization
The example YAML files are starting templates. To adapt them for your organization:
- Create a stakeholders-production.yaml with your real team’s contact information. Keep it updated as your on-call rotation changes.
- When a Sev1 hits, copy incident-example.yaml, fill in the fields from your ServiceNow ticket, and run /generate-runbook.
- As you build confidence, consider the Phase 2 ServiceNow integration to eliminate the manual YAML step entirely.
The plugin supports six incident categories out of the box — Database, Network, Application, Cloud/Infra, Security, and a generic fallback — each with tailored diagnostic and containment procedures. If your organization has additional categories, the CONTRIBUTING.md file in the repository explains how to add them.
Getting Started Today
MIM-Runbook is open source under the MIT license. The repository is at github.com/agentbee0/MIM-Runbook.
To get started: clone the repo, install dependencies, copy the example YAML files, customize them with your incident data and team contacts, and run /generate-runbook. In under a minute, you’ll have a complete, prescriptive runbook that would have taken a senior engineer hours to write manually.
The next time a Sev1 fires at 3 AM, your team won’t be staring at a blank screen. They’ll be following a playbook.