enterprise-agent-ops
$ npx mdskill add affaan-m/ECC/enterprise-agent-ops
Operate long-lived agent systems with lifecycle, security, and observability controls
- Manages runtime lifecycle and safety for continuously running agent workloads
- Integrates with PM2, systemd, container orchestrators, and CI/CD systems
- Enforces least-privilege access and tracks metrics for failure analysis
- Provides audit logs, rollback capabilities, and gradual recovery from incidents
SKILL.md
.github/skills/enterprise-agent-ops
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
origin: ECC
---

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution

## Incident Pattern

When failure spikes:

1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually

## Deployment Integrations

This skill pairs with:

- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates
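
## Example: Retry Budget and Metrics (sketch)

The skill names "hard timeout and retry budgets" as a baseline control and lists several metrics to track, but does not prescribe an implementation. The following is a minimal Python sketch of how the two could fit together. Every name in it (`TaskMetrics`, `run_with_budget`, `max_retries`, `deadline_s`, `cost_per_attempt`) is hypothetical and not part of the skill. Two simplifications are assumed: the deadline is only checked between attempts, so a truly hard timeout would still need a cancellable task or a supervised subprocess, and time to recovery is left out because it is measured per incident rather than per task.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaskMetrics:
    """Running totals for four of the metrics listed above (illustrative names)."""
    tasks: int = 0
    successes: int = 0
    retries: int = 0
    cost: float = 0.0
    failure_classes: Counter = field(default_factory=Counter)

    @property
    def success_rate(self) -> float:
        return self.successes / self.tasks if self.tasks else 0.0

    @property
    def mean_retries_per_task(self) -> float:
        return self.retries / self.tasks if self.tasks else 0.0

    @property
    def cost_per_successful_task(self) -> float:
        # Undefined until at least one task has succeeded.
        return self.cost / self.successes if self.successes else float("inf")


def run_with_budget(task, metrics, *, max_retries=2, deadline_s=30.0,
                    cost_per_attempt=0.0, clock=time.monotonic):
    """Call task() until it succeeds or a budget is exhausted.

    Assumption: the deadline is checked only after a failed attempt, so it
    bounds how long we keep retrying, not how long one attempt may run.
    """
    start = clock()
    metrics.tasks += 1
    attempt = 0
    while True:
        metrics.cost += cost_per_attempt
        try:
            result = task()
        except Exception as exc:
            # Bucket failures by exception type for the class distribution.
            metrics.failure_classes[type(exc).__name__] += 1
            out_of_retries = attempt >= max_retries
            out_of_time = clock() - start >= deadline_s
            if out_of_retries or out_of_time:
                raise  # budget spent: surface the last failure to the caller
            attempt += 1
            metrics.retries += 1
        else:
            metrics.successes += 1
            return result
```

A caller would share one `TaskMetrics` instance across tasks and export its properties to whatever metrics backend is in use. Re-raising the last exception once a budget is spent keeps the failure visible to the supervisor (PM2, systemd, or an orchestrator) instead of hiding it inside the retry loop.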
More from affaan-m/ECC
- accessibility: Design, implement, and audit inclusive digital products using WCAG 2.2 Level AA
- agent-architecture-audit: Full-stack diagnostic for agent and LLM applications. Audits the 12-layer agent stack for wrapper regression, memory pollution, tool discipline failures, hidden repair loops, and rendering corruption. Produces severity-ranked findings with code-first fixes. Essential for developers building agent applications, autonomous loops, or any LLM-powered feature.
- agent-eval: Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
- agent-harness-construction: Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.
- agent-introspection-debugging: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
- agent-payment-x402: Add x402 payment execution to AI agents with per-task budgets, spending controls, and non-custodial wallets. Supports Base through agentwallet-sdk and X Layer through OKX Payments / OKX Agent Payments Protocol.
- agent-sort: Build an evidence-backed ECC install plan for a specific repo by sorting skills, commands, rules, hooks, and extras into DAILY vs LIBRARY buckets using parallel repo-aware review passes. Use when ECC should be trimmed to what a project actually needs instead of loading the full bundle.
- agentic-engineering: Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
- agentic-os: Build persistent multi-agent operating systems on Claude Code. Covers kernel architecture, specialist agents, slash commands, file-based memory, scheduled automation, and state management without external databases.
- ai-first-engineering: Engineering operating model for teams where AI agents generate a large share of implementation output.