humanrail-escalation

Name: humanrail-escalation
Author: automateyournetwork/netclaw

$npx mdskill add automateyournetwork/netclaw/humanrail-escalation

Routes ambiguous or high-risk agent decisions to human engineers for verification

Solves problems requiring human judgment in uncertain or destructive scenarios
Uses HumanRail API and Lightning Network for task routing and worker payments
Triggers escalation based on low confidence, pre-destructive actions, or ambiguous tickets
Returns verified human responses as structured output to the agent

SKILL.md

.github/skills/humanrail-escalationView on GitHub ↗

---
name: humanrail-escalation
description: "Human-in-the-loop escalation via HumanRail — route low-confidence agent decisions, pre-destructive operation approvals, and ambiguous incident tickets to real human engineers. Human answers are verified and returned as structured output. Workers are paid via Lightning Network. Use when the agent is uncertain, when a destructive change needs explicit human sign-off beyond a ServiceNow CR, or when an ambiguous ticket requires human triage before automated handling."
license: Apache-2.0
user-invokable: true
metadata:
  { "openclaw": { "requires": { "env": ["HUMANRAIL_API_KEY", "HUMANRAIL_MCP_URL"] } } }
---

# HumanRail Escalation

## Safety — Non-Negotiable Rules

**NEVER route tasks that expose internal tooling or infrastructure details to workers.** HumanRail workers are external humans. Task descriptions must be neutral: describe the decision needed without referencing agent names, internal hostnames, file paths, or credentials. Frame the task as a domain question a knowledgeable engineer would answer — not as "an AI agent needs help."

**NEVER use HumanRail as a substitute for ServiceNow Change Management.** A HumanRail approval is an engineering judgment call, not a formal ITSM change record. Pre-destructive approvals via HumanRail are a second safety layer — the ServiceNow CR must still exist and be in `Implement` state before any config push.

**NEVER expose PII, device credentials, or configuration secrets in task payloads.** Sanitize payloads: use device roles ("core router") not hostnames, use interface descriptions not IP addresses where possible.

**ALWAYS wait for `verified` status before acting.** A task with status `submitted` has been answered by a worker but not yet verified. Only act on the output when `status == "verified"`.

## When to Use

| Situation | Use HumanRail | Use ServiceNow CR | Use Slack Escalation |
|-----------|:-------------:|:-----------------:|:--------------------:|
| Agent confidence low — multiple valid interpretations | ✓ | — | — |
| Pre-destructive op beyond normal ITSM gating | ✓ | Required (both) | — |
| Ambiguous incident ticket needs triage | ✓ | — | Notify after |
| P1/P2 in-flight, need immediate human judgment | ✓ | — | ✓ (both) |
| Standard config change with approved CR | — | ✓ | — |
| Routine maintenance window | — | ✓ | — |
| Auto-quarantine endpoint (ISE) | — | — | ✓ (ise-incident-response) |

## MCP Server

| Property | Value |
|----------|-------|
| **Source** | [prime001/humanrail-mcp-server](https://github.com/prime001/humanrail-mcp-server) |
| **Transport** | Streamable HTTP (FastMCP, default port 8100) |
| **Language** | Python 3.10+ |
| **Tools** | 7 (create_task, get_task, wait_for_task, cancel_task, list_tasks, get_usage, health_check) |
| **Auth** | `HUMANRAIL_API_KEY` (Bearer token — `ek_live_...` or `ek_test_...`) |
| **Public endpoint** | `https://humanrail.dev/mcp` (no local install required) |

## How to Run (Local)

```bash
# Install dependencies
pip3 install "mcp[cli]>=1.0.0" httpx

# Start the MCP server (runs on http://127.0.0.1:8100/mcp)
HUMANRAIL_API_KEY=ek_live_your_key python3 $HUMANRAIL_MCP_SCRIPT

# Or use the public endpoint directly (no local install needed):
# Set HUMANRAIL_MCP_URL=https://humanrail.dev/mcp in ~/.openclaw/.env
```

Get your free API key at [humanrail.dev](https://humanrail.dev).

## Available Tools

| Tool | Parameters | What It Does |
|------|-----------|-------------|
| `create_task` | `task_type`, `payload`, `output_schema`, `risk_tier?`, `sla_seconds?`, `payout_currency?`, `payout_max_amount?`, `callback_url?`, `metadata?` | Post a task to the human worker pool and return a task ID |
| `get_task` | `task_id` | Get current status and output of a task |
| `wait_for_task` | `task_id`, `poll_interval_seconds?`, `timeout_seconds?` | Block until task reaches terminal state (`verified`, `failed`, `cancelled`, `expired`) |
| `cancel_task` | `task_id` | Cancel a non-terminal task |
| `list_tasks` | `status?`, `task_type?`, `limit?`, `created_after?`, `created_before?` | List tasks with optional filters |
| `get_usage` | — | Org-level usage stats and billing summary |
| `health_check` | — | Verify the HumanRail API is reachable |

---

## Workflow 1: Low-Confidence Decision Escalation

When the agent reaches a decision point where two or more interpretations are equally valid and the wrong choice has significant consequences:

### Phase 1: Assess Uncertainty

Identify what the agent cannot determine autonomously:
- Does the context admit multiple valid network design choices?
- Would an experienced engineer need site-specific knowledge the agent doesn't have?
- Is the downstream impact large enough that getting it wrong matters?

If confidence is high enough to proceed, do so — don't over-escalate. HumanRail has an SLA cost.

### Phase 2: Frame the Decision Neutrally

Write the task payload without referencing agent internals. The worker is an engineer; give them the facts they need.

```
task_type: "technical_decision"
payload:
  title:       "BGP policy clarification needed"
  description: |
    A network change is in progress on a dual-homed edge router.
    The router has two upstream transit providers. Policy options:
      A) Prefer Provider-A for all prefixes (lower latency, higher cost)
      B) Prefer Provider-B for all prefixes (higher latency, lower cost)
      C) Split: prefer Provider-A for /24s and shorter, Provider-B for longer prefixes
    Current traffic: ~8 Gbps mixed. No documented preference in the change ticket.
    What is the recommended policy and why?
  context:     "Standard dual-homed BGP edge, no MPLS, IXP peering not in scope."
output_schema:
  type: object
  required: [recommendation, rationale]
  properties:
    recommendation: {type: string, enum: [A, B, C]}
    rationale:      {type: string, minLength: 20}
risk_tier:     "medium"
sla_seconds:   900
payout_currency: "USD"
payout_max_amount: 1.00
```

### Phase 3: Create and Wait

```
task = create_task(...)
result = wait_for_task(task_id=task["id"], timeout_seconds=900)
```

### Phase 4: Act on Verified Output

Only proceed when `result["status"] == "verified"`. The `result["output"]` field contains the worker's structured response matching your `output_schema`.

---

### *** HUMAN DECISION IN FLIGHT ***

While `wait_for_task` polls, **do not proceed with the ambiguous action**. Post a Slack status update:

> "⏳ Waiting for human engineer input on [topic]. Task ID: [id]. ETA: [sla_seconds]s."

If the task expires or fails, fall back to the conservative option and create a ServiceNow incident for manual follow-up.

---

## Workflow 2: Pre-Destructive Operation Approval

For operations the agent is authorized to perform but that are irreversible or high-blast-radius — beyond the standard ServiceNow CR gate:

**Examples:** device reload, `write erase`, BGP peer teardown affecting production traffic, ISE policy purge.

### Phase 1: Baseline and Summarize

Before asking for approval, gather the current state so the human reviewer has complete context.

```bash
# Capture pre-change state first
python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" get_interface_brief '{"device":"core-rtr-01"}'
python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" get_bgp_summary '{"device":"core-rtr-01"}'
```

### Phase 2: Build Approval Request

```
task_type: "change_approval"
payload:
  title:       "Production reload approval — core edge router"
  description: |
    A device reload is requested to apply a software upgrade from IOS-XE 17.9 to 17.12.
    Pre-change state:
      - BGP peers: 4 active (2 transit, 2 iBGP) — all ESTABLISHED
      - Active sessions: 847 flows
      - Estimated downtime: 3-5 minutes
      - Rollback: IOS-XE 17.9 image retained in flash
    ServiceNow CR: [CR number] — in Implement state.
    Should this reload proceed NOW or be deferred to the next maintenance window?
  urgency:     "high"
output_schema:
  type: object
  required: [decision, notes]
  properties:
    decision: {type: string, enum: [proceed, defer]}
    notes:    {type: string}
risk_tier:     "high"
sla_seconds:   300
payout_currency: "USD"
payout_max_amount: 2.00
```

### Phase 3: Gate on Response

```
result = wait_for_task(task_id=task["id"], timeout_seconds=300)
if result["output"]["decision"] == "proceed":
    # Execute the destructive operation
else:
    # Defer — log in GAIT, update ServiceNow CR
```

---

### *** HUMAN APPROVAL GATE ***

**STOP.** The destructive operation MUST NOT execute until `result["status"] == "verified"` AND `result["output"]["decision"] == "proceed"`.

If `wait_for_task` times out: default to DEFER. Update the ServiceNow CR with status "awaiting human authorization" and notify the Slack incident channel.

---

## Workflow 3: Ambiguous Incident Ticket Triage

When an inbound ServiceNow incident has insufficient information to determine severity, affected CIs, or the correct response team:

### Phase 1: Extract What's Known

Pull the raw incident from ServiceNow and identify the gaps:
- Missing: affected device? Affected service? Symptom details?
- Conflicting: multiple possible root causes with different response paths?
- Stale: ticket opened hours ago with no updates — still valid?

### Phase 2: Route to Human Triage

```
task_type: "incident_triage"
payload:
  title:       "Triage: ambiguous network incident ticket"
  description: |
    An incident ticket has insufficient detail for automated handling.
    Ticket summary: [short_description from ServiceNow]
    Known facts:
      - Reported by: [caller_id role/team, no PII]
      - Opened: [timestamp]
      - Impact field: [value]
      - Description: [full description, sanitized]
    Missing or conflicting:
      - [list gaps]
    Please determine: severity (P1/P2/P3/P4), affected CI(s) if identifiable,
    and recommended response team (NOC/NetEng/Security/Application).
output_schema:
  type: object
  required: [severity, team, notes]
  properties:
    severity: {type: string, enum: [P1, P2, P3, P4]}
    team:     {type: string, enum: [NOC, NetEng, Security, Application, Unknown]}
    notes:    {type: string, minLength: 10}
risk_tier:     "medium"
sla_seconds:   600
payout_currency: "USD"
payout_max_amount: 0.50
```

### Phase 3: Apply Triage Output

Use the verified output to:
1. Update the ServiceNow incident severity and assignment group
2. Post a Slack alert at the appropriate level (`#netclaw-alerts` for P1/P2)
3. Trigger the relevant response workflow

---

## Escalation Summary Report Template

Use this when closing a HumanRail-escalated session:

```
HUMANRAIL ESCALATION SUMMARY
==============================
Task ID:       [task_id]
Task Type:     [task_type]
Opened:        [ISO 8601 timestamp]
Resolved:      [ISO 8601 timestamp]
Worker SLA:    [sla_seconds]s / actual: [elapsed]s

Question Summary:
  [One-paragraph neutral description of what was asked]

Human Decision:
  [Paste result["output"] here]

Action Taken:
  [What the agent did based on the decision]

Outcome:
  [ ] Operation completed successfully
  [ ] Operation deferred — scheduled for [date/window]
  [ ] Escalated to [team] — ServiceNow [INC/CHG number]
```

---

## Checklist

Use before closing any HumanRail-escalated action:

- [ ] Task created with neutral, non-internal-leaking payload
- [ ] `status == "verified"` confirmed before acting (not just "submitted")
- [ ] Human decision applied exactly — no agent interpretation of ambiguous output
- [ ] Slack status posted during wait period
- [ ] ServiceNow CR updated with escalation record (if change-related)
- [ ] GAIT audit trail records the task ID, human decision, and action taken
- [ ] If task expired/failed: conservative fallback applied and INC created

---

## Integration with Other Skills

| Skill | Integration |
|-------|-------------|
| **servicenow-change-workflow** | HumanRail approval is a second-layer gate — ServiceNow CR must still be in `Implement` state; HumanRail output is added as a work note |
| **ise-incident-response** | Use HumanRail for low-confidence risk assessments before the `*** HUMAN DECISION POINT ***` gate when the on-call engineer is not immediately available |
| **slack-incident-workflow** | Post a Slack update when a HumanRail task is created; post result when resolved |
| **gait-session-tracking** | Always record HumanRail task IDs, decisions, and resulting actions in the GAIT audit trail |
| **pyats-config-mgmt** | Use pre-destructive approval workflow before executing high-risk config pushes |

---

## Environment Variables

| Variable | Required | Example | Description |
|----------|----------|---------|-------------|
| `HUMANRAIL_API_KEY` | Yes | `ek_live_abc123...` | API key from [humanrail.dev](https://humanrail.dev) |
| `HUMANRAIL_MCP_URL` | Yes | `http://127.0.0.1:8100/mcp` | MCP endpoint (local server or `https://humanrail.dev/mcp`) |
| `HUMANRAIL_MCP_SCRIPT` | No | `/path/to/server.py` | Path to local server.py (set by install.sh; not needed if using public endpoint) |
| `HUMANRAIL_BASE_URL` | No | `https://api.humanrail.dev/v1` | REST API base URL (default: `https://api.humanrail.dev/v1`) |

More from automateyournetwork/netclaw

Skill	Description
aap-automation	Red Hat Ansible Automation Platform — inventory management, job template execution, project SCM sync, ad-hoc commands, host management, Galaxy content discovery. Use when automating infrastructure with Ansible, running playbooks, managing inventories, or searching for Ansible collections and roles.
aap-eda	Event-Driven Ansible (EDA) — activation lifecycle, rulebook management, decision environments, event stream monitoring. Use when managing event-driven automation triggers, enabling/disabling activations, or reviewing EDA rulebooks.
aap-lint	ansible-lint playbook and role validation — syntax checking, best practice enforcement, project-wide analysis, rule filtering. Use when validating Ansible playbooks, checking code quality, or enforcing automation best practices before deployment.
aci-change-deploy	Safe ACI policy change deployment - ServiceNow CR lifecycle, pre/post-change fault baselines, APIC policy application, automatic rollback on fault delta, and GAIT audit trail. Use when deploying ACI policy changes, creating tenants or EPGs, pushing config to APIC, or running a change window with rollback protection.
aci-fabric-audit	Comprehensive Cisco ACI fabric health audit - node status, tenant/VRF/BD/EPG policy review, contract analysis, fault triage, and endpoint learning verification. Use when auditing ACI fabric health, checking for faults, reviewing tenant policies, or running pre/post-change baselines on APIC.
arista-cvp	Arista CloudVision Portal (CVP) automation via REST API — device inventory, events, connectivity monitoring, tag management (4 tools). Use when managing Arista devices, checking CloudVision events, monitoring network connectivity probes, or tagging devices in CVP.
aruba-cx-config	View and manage Aruba CX switch configurations, perform ISSU upgrades, and firmware operations
aruba-cx-interfaces	Monitor Aruba CX switch interface status, LLDP neighbors, and optical transceiver health
aruba-cx-switching	View and manage Aruba CX switch VLANs and MAC address tables for Layer 2 operations
aruba-cx-system	Discover Aruba CX switch system information, firmware versions, and VSF topology