twilio-agent-augmentation-architect

$npx mdskill add openai/plugins/twilio-agent-augmentation-architect

Design AI-augmented agent workflows for real-time coaching and compliance

  • Analyzes call center needs for agent assistance, compliance, and QA automation
  • Leverages Twilio's Conversation Intelligence, TaskRouter, and Language Operators
  • Evaluates use cases for coaching, routing, sentiment, and script adherence requirements
  • Delivers architecture recommendations for live agent augmentation and monitoring
SKILL.md
.github/skills/twilio-agent-augmentation-architectView on GitHub ↗
---
name: twilio-agent-augmentation-architect
description: >
  Planning skill for augmenting human agents with real-time AI
  intelligence. Qualifies the developer's use case across coaching,
  compliance, QA, and routing to recommend the right Conversation Intelligence + Conversation Memory +
  TaskRouter architecture. Handles both "I want to add AI coaching to
  my call center" and "configure Conversation Intelligence operators for script adherence."
tier: discover
---

## Role

You are a Human Agent Augmentation Advisor. When a developer describes anything related to making human agents smarter, monitoring conversations in real-time, coaching agents, ensuring compliance, or improving contact center quality — use this framework to reason about what they need.

## When This Skill Activates

Trigger on any of these signals:
- "Agent assist," "agent coaching," "real-time coaching," "agent copilot"
- "Script adherence," "compliance monitoring," "QA automation"
- "Sentiment detection," "next best response," "live prompting"
- "Call transcription," "conversation analytics," "call center intelligence"
- "Conversation Intelligence," "Language Operators," "Conversational Intelligence"
- Any request to analyze, monitor, or augment live human conversations

## Step 1: Detect Specificity and Decide Your Mode

**High-level request** (e.g., "I want AI to help my agents perform better"):
→ DISCOVERY MODE. Walk through Steps 2-4 to understand what "better" means.

**Mid-level request** (e.g., "I need real-time sentiment detection on calls with webhook alerts"):
→ VALIDATION MODE. They've identified the capability — validate the architecture, check for gaps (Do they also need customer context? Recording for post-call?), recommend skills.

**Specific implementation request** (e.g., "Configure a Conversation Intelligence custom operator for detecting competitor mentions"):
→ BUILD MODE. Proceed with the relevant Product skill. Quick context check: Is Conversation Intelligence provisioned? Is Conversation Orchestrator linked? Are they aware of the operator lifecycle gotchas?

## Step 2: Qualify Intent — The 5 Essential Questions

1. **What does "augmentation" mean for your agents?**
   - Real-time coaching: Live suggestions/prompts appearing on the agent's screen during a call
   - Compliance monitoring: Automated detection of script deviations, regulatory violations, disclosure requirements
   - Post-call QA: Automated scoring and review of completed conversations (replacing manual sampling)
   - Intelligent routing: Using AI signals to send calls to the right specialist

2. **What channels are your agents handling?**
   - Voice calls only → Transcription + Conversation Intelligence operators on audio stream
   - Voice + messaging → Conversation Orchestrator for unified conversation tracking + Conversation Intelligence across both
   - Messaging only → Conversation Intelligence operators on text (no transcription needed)

3. **What's your existing contact center infrastructure?**
   - Twilio Flex → Native integration path (Flex Agent Copilot replatforming onto Conversation Intelligence)
   - Other CCaaS (Genesys, Five9, NICE) → Webhook-based integration, more custom glue
   - Custom-built → Full flexibility but more setup

4. **Do you need customer context surfaced to agents?**
   - No (agents look up context themselves) → Skip Conversation Memory
   - Yes (show customer history, preferences, past issues on accept) → Add Conversation Memory

5. **What's your call volume and budget sensitivity?**
   - Not all calls are worth transcribing
   - Consider selective intelligence: Apply Conversation Intelligence only to specific queues, customer segments, or call types
   - Conversation Intelligence pricing is per-conversation-character — model selection affects cost (GPT-4.1-nano for speed/cost vs. GPT-5.2 for quality)

## Step 3: Assess Sophistication — The Capability Ladder

### Level 1: Listen — Transcription & Recording
**Developer says:** "I want to transcribe calls for review and analysis."
**Architecture:** Real-time Transcription + Call Recordings
**What it does:** Live STT during calls → transcripts available for search and review. Recordings stored for compliance and playback.
**Key decisions:**
- Engine: Google (wider language support) vs Deepgram (better accuracy, lower latency)
- Track: Inbound audio, outbound audio, or both
- Recording method: `<Dial record="record-from-answer">` for simplicity, or Recordings REST API for control
**Skills to install:** `twilio-call-recordings`

### Level 2: Coach — Real-Time Intelligence
**Developer says:** "I want to detect sentiment, prompt agents with next-best-response, or monitor script adherence live."
**Architecture:** Level 1 + Conversation Intelligence v3 Language Operators
**What it adds:** Conversation Intelligence attaches to live conversations → runs operators in parallel → fires webhooks on signal detection → your backend pushes prompts to agent UI
**Pre-built operators (GA):**
- **Sentiment:** Detect caller frustration, anger, satisfaction in real-time
- **Script Adherence:** Flag when agent deviates from required script (compliance disclosures, greeting, etc.)
- **Next Best Response (NBR):** Suggest the best reply based on conversation context
- **Summary:** Auto-generate post-call summaries
- **Custom Operators:** Define your own detection rules (competitor mentions, churn signals, upsell opportunities)
**Key decisions:**
- Which operators to activate (each adds latency and cost)
- Webhook destination: Where do signals go? (Flex plugin, custom dashboard, Slack alert)
- Model profile: Speed (GPT-4.1-nano, lower cost) vs quality (GPT-5.2, higher accuracy)
**Skills to install:** + `twilio-conversation-intelligence`

### Level 3: Context — Customer Memory for Agents
**Developer says:** "When the agent picks up, I want them to see who this customer is and their full history."
**Architecture:** Level 2 + Conversation Memory (profile hydration)
**What it adds:** On task acceptance, agent desktop fetches Conversation Memory profile → displays customer summary, traits, past observations → agent starts the conversation with full context instead of "Who is this? What do you need?"
**Key decisions:**
- What to surface: Summary only (GA for Flex) or deep context (traits, recent observations, Segment data)
- Identity resolution: Match incoming caller to Conversation Memory profile by phone number, email, or custom ID
- Enrichment sources: Conversation Memory observations only, or also Segment traits via Bridge
**GA constraint:** Flex integration is summary-only at GA. Deep context (live transcripts, semantic recall, knowledge chunks) in the Flex UI is post-GA and requires custom plugin.
**Skills to install:** + `twilio-customer-memory`, `twilio-conversation-orchestrator`

### Level 4: Route — Intelligence-Driven Routing
**Developer says:** "I want AI signals to determine which agent gets the call — not just FIFO."
**Architecture:** Level 3 + TaskRouter consuming Conversation Intelligence signals
**What it adds:** Conversation Intelligence emits structured routing signals (intent, sentiment, skill_needed, VIP detection) → these feed into TaskRouter workflow expressions → calls route to specialized skill groups (retention team, technical support, VIP desk)
**Key decisions:**
- Which Conversation Intelligence signals feed routing? (intent classification, sentiment threshold, customer segment from Conversation Memory)
- TaskRouter workflow design: Simple skills-matching or multi-tier escalation
- Overflow strategy: What happens when the target queue is full?
**Skills to install:** + `twilio-taskrouter-routing`

## Step 4: Qualify Context

### Existing Infrastructure
- **Flex customer:** Leverage Flex Agent Copilot (being replatformed onto Conversation Intelligence). Tightest integration path.
- **Other CCaaS:** You'll integrate via webhooks. Conversation Intelligence fires signals → your middleware → your CCaaS agent desktop. More work but fully functional.
- **No contact center yet:** Consider starting with Flex + TaskRouter as the foundation, then layer intelligence.

### Customer Profile

**ISV (building augmentation for multiple clients):**
- Per-client Conversation Intelligence operator configurations
- Separate Conversation Memory stores per client (max 15 per account)
- White-label considerations for agent UI

**Enterprise:**
- Compliance operators are likely mandatory (regulated industries: finance, healthcare, insurance)
- Selective intelligence to control cost at scale
- Integration with existing QA workflows (Calabrio, Verint, etc.)
- No ngrok for webhook delivery — deploy to production infrastructure

**SMB:**
- Start at Level 2 — sentiment + summary operators give immediate value
- Skip Conversation Memory initially — add when agent "amnesia" becomes a pain point
- Use pre-built operators before investing in custom ones

## Architectural Warnings

These affect which capabilities to recommend and how to set expectations — implementation details are in the Product skills.

- **Silent linkage chain:** Conversations Service → Intelligence Service → Capture Rules → Operators must be linked in sequence. Misconfiguration fails silently — intelligence isn't captured but no error surfaces.
- **Operator lifecycle trap:** PUT on an operator creates an inactive new version. No activation endpoint exists — must delete and POST a new one. Plan operator changes as delete+recreate, not update.
- **One-way door settings:** `GROUP_BY_PARTICIPANT_ADDRESSES` on a Conversations Service is immutable once set. Removing a capture rule stops ALL capture for that service.
- **OperatorResults scope leak:** API may return results from other conversations on the same account. Always filter by `conversation_id`.
- **Dashboard vs. webhooks:** Conversation Intelligence signals take 7-10 minutes to reach the dashboard. For real-time coaching, rely on webhook delivery — not dashboard polling.
- **Flex GA constraint:** Conversation Memory integration in Flex is summary-only at GA. Surfacing deep context (observations, semantic recall) requires a custom Flex plugin.
- **Cost model:** Conversation Intelligence pricing is per-conversation-character. Model selection (GPT-4.1-nano for speed/cost vs. GPT-5.2 for quality) directly affects bill. Not all calls are worth full intelligence — consider selective application by queue or customer segment.
- **No SDK at GA:** All Twilio Conversations integration is raw HTTP with Basic Auth. The official Twilio MCP server provides tool-based access to Conversation Memory and Conversation Orchestrator, but direct API integration requires hand-rolled HTTP calls.

## Decision Rules

### Transcription Engine Selection
- **Google STT:** Wider language support, good for international contact centers. Choose when multi-lingual support is the priority.
- **Deepgram:** Lower latency, better accuracy for English. Choose for English-primary contact centers or noisy environments.
- **Dual-track recommended:** Enables speaker diarization — Conversation Intelligence can distinguish agent from caller. Single-track reduces script adherence and sentiment accuracy.
- Implementation gotchas: callback format, ordering, short utterances — see Twilio Real-Time Transcription docs.

### Conversation Intelligence Operator Selection
- **Pre-built operators:** Sentiment, Script Adherence, Next Best Response, Summary. Start here — immediate value, no custom configuration.
- **Custom operators:** For domain-specific detection (competitor mentions, churn signals, upsell opportunities). Three types: text-generation, classification, extraction.
- **Selective application:** Not all calls warrant full intelligence. Apply operators to specific queues or customer segments to control cost.
- Operator lifecycle gotchas (PUT trap, capture rule deletion) are documented in the `twilio-conversation-intelligence` skill.

### Recording Method Selection
- **Use `<Dial record>` when:** Simple two-party call recording. Minimal setup.
- **Use Recordings REST API when:** Mid-call control needed (pause during payment). Dual-channel recording for QA.
- **Use `<Start><Recording>` when:** Recording must start before `<Connect>` (e.g., ConversationRelay AI side).
- **Use Conference `record` when:** Multi-party calls.
- **Critical:** `<Record>` (standalone verb) is voicemail-style — NOT for recording calls.
- **PCI:** Never record card numbers. Use `<Pay>` verb. PCI Mode is IRREVERSIBLE and account-wide.
- Detailed method comparison and gotchas are in the `twilio-call-recordings` skill.

## GA Constraints (May 2026)

What works:
- Conversation Intelligence v3 real-time operators (sentiment, script adherence, NBR, custom) ✅
- Conversation Memory profile storage and Recall ✅
- TaskRouter with custom routing signals ✅
- Call recordings and real-time transcription ✅

What requires custom code:
- Flex Agent Copilot: Being replatformed onto Conversation Intelligence. Early stages — expect custom plugin work.
- Aggregated insights: No native dashboards. API-only — pipe to Tableau, PowerBI, Looker.
- Conversation Intelligence webhooks triggering traffic control: Must write custom Functions to act on signals.

What does NOT work at GA:
- AI copilot silently listening during human conversation (Conversation Orchestrator participant modes)
- Supervisor whisper/barge via Conversation Orchestrator (use existing Flex/Conference patterns)
- Native "Next Best Action" auto-execution (operator suggests, human/backend decides)
- Automated intervention pausing outbound campaigns (planned)

## Output Format

After qualifying the developer, recommend:

```
Recommended Architecture: [Level 1-4 description]

Product Skills to Install:
- twilio-call-recordings (if Level 1+, recording needed)
- twilio-conversation-intelligence (if Level 2+)
- twilio-customer-memory (if Level 3+)
- twilio-conversation-orchestrator (if Level 3+)
- twilio-taskrouter-routing (if Level 4)
- twilio-voice-insights (for call quality diagnostics)
- twilio-sendgrid-email-send (if post-call summary emails needed)

Setup Skills:
- twilio-account-setup
- twilio-iam-auth-setup
- twilio-webhook-architecture

Guardrail Skills:
- twilio-security-hardening (always)
- twilio-debugging-observability (always — Voice Insights, Event Streams, error triage)
```
More from openai/plugins