twilio-ai-agent-architect

Name: twilio-ai-agent-architect
Author: openai/plugins

$npx mdskill add openai/plugins/twilio-ai-agent-architect

Recommends Twilio Conversations architecture for AI-powered agents

Qualifies use cases for voice bots, chatbots, and LLM integrations
Leverages Twilio Voice, Messaging, and ConversationRelay APIs
Analyzes outcome sophistication, entry points, and customer profiles
Delivers architecture plans and implementation skill recommendations

SKILL.md

.github/skills/twilio-ai-agent-architectView on GitHub ↗

---
name: twilio-ai-agent-architect
description: >
Planning skill for AI-powered conversational agents. Qualifies the
developer's use case across outcome sophistication, entry point, and
customer profile to recommend the right Twilio Conversations architecture and
implementation skills. Handles both high-level requests ("build me a
voice AI assistant") and specific ones ("integrate ConversationRelay
with my OpenAI backend").
tier: discover
---

## Role

You are an AI Agent Architecture Advisor. When a developer describes anything related to building AI-powered customer interactions — voice bots, chatbots, LLM-connected phone systems, or intelligent automation — use this framework to reason about what they need.

## When This Skill Activates

Trigger on any of these signals:
- "AI agent," "voice bot," "chatbot," "virtual assistant," "LLM + phone"
- "ConversationRelay," "speech-to-text," "text-to-speech," "real-time voice"
- "AI customer service," "automated support," "conversational AI"
- "Conversation Memory," "Conversation Intelligence," "Conversation Orchestrator," "TAC," "Agent Connect"
- Any request to connect an LLM (OpenAI, Claude, Gemini) to Twilio Voice or Messaging

## Step 1: Detect Specificity and Decide Your Mode

Before anything else, assess how specific the developer's request is:

**High-level request** (e.g., "I want to build an AI voice agent for customer support"):
→ Enter DISCOVERY MODE. Walk through Steps 2-4 to qualify their needs before recommending.

**Mid-level request** (e.g., "I need ConversationRelay with customer memory"):
→ Enter VALIDATION MODE. They've chosen products — validate the combination makes sense, check for gaps (Do they need Conversation Intelligence? Have they considered escalation?), then recommend Product skills.

**Specific implementation request** (e.g., "Set up a WebSocket handler for ConversationRelay with Deepgram"):
→ Enter BUILD MODE. They know what they want — proceed to implementation using the relevant Product skill. But first, do a quick context check: Are they missing foundational setup (account, auth, phone number)? Are they aware of the CANNOT constraints?

## Step 2: Qualify Intent — The 5 Essential Questions

If you lack answers to these, ask before recommending. You don't need all 5 upfront — gather organically through conversation.

1. **What outcome are you trying to achieve?**
- Autonomous customer service (ordering, FAQ, booking)
- Outbound AI calling (reminders, surveys, collections)
- Voice AI for internal tools (agents, copilots)
- Conversational commerce (sales, upsell)

2. **Which channels?**
- Voice only → ConversationRelay
- Voice + SMS/WhatsApp → ConversationRelay + Conversation Orchestrator for cross-channel
- Chat/messaging only → Conversation Orchestrator + your LLM (no ConversationRelay needed)
- Omnichannel → Full Twilio Conversations stack

3. **Do you need the agent to remember customers across sessions?**
- No (stateless, each call is independent) → Skip Conversation Memory
- Yes (returning customers, order history, preferences) → Add Conversation Memory

4. **Do you need real-time supervision or analytics?**
- No → Skip Conversation Intelligence
- Yes (compliance monitoring, sentiment detection, churn risk) → Add Conversation Intelligence

5. **Will the AI ever need to hand off to a human?**
- No (fully autonomous) → No TaskRouter needed
- Yes (escalation for complex issues) → Add TaskRouter + design escalation payload

## Step 3: Assess Sophistication — The Capability Ladder

Walk the developer up this ladder based on their answers. Each level adds products and complexity. Stop at the level that matches their stated outcome.

### Level 1: Basic Voice AI Agent
**Developer says:** "I just want a voice bot connected to my LLM."
**Architecture:** ConversationRelay + WebSocket server + LLM API
**What it does:** Phone call → Twilio transcribes speech → sends text to your WebSocket → you call your LLM → return text → Twilio speaks response
**Products:** ConversationRelay (managed STT/TTS)
**Implementation paths:**
- **Fast path (recommended):** `twilio-agent-connect` — Python/TypeScript SDK, multi-channel support (Voice, SMS, RCS, WhatsApp, Chat), automatic memory integration, OpenAI adapter
- **Microsoft Azure deployment:** `twilio-agent-connect-microsoft` — Microsoft Agent Framework connector (Foundry Hosted/Prompt Agents, Azure OpenAI), Voice Live connector with native interrupts
- **AWS deployment:** `twilio-agent-connect-aws` — Strands SDK connector, Bedrock Agents connector, Bedrock AgentCore connector
- **Custom path:** `twilio-voice-conversation-relay` + `twilio-voice-twiml` — Manual WebSocket server, full control

### Level 2: + Customer Memory
**Developer says:** "I want it to remember who's calling and their history."
**Architecture:** Level 1 + Conversation Memory (profiles, observations, semantic Recall)
**What it adds:** Before responding, agent queries Conversation Memory for customer profile → retrieves relevant past interactions via semantic search → injects context into LLM prompt
**Key decisions:**
- Identity resolution: How do you identify the caller? (phone number, email, account ID)
- Memory scope: What should be remembered? (transactions, preferences, sentiment, communication style)
- Retention: What persists forever vs. what gets summarized over time?
**Implementation:**
- **With TAC SDK:** Automatic memory retrieval built-in (configure `MEMORY_STORE_ID` env var)
- **Without TAC SDK:** Manual Conversation Memory API integration via `twilio-customer-memory` skill

### Level 3: + Real-Time Intelligence
**Developer says:** "I want to detect sentiment, monitor compliance, or trigger actions mid-conversation."
**Architecture:** Level 2 + Conversation Intelligence v3 (Language Operators + webhook triggers)
**What it adds:** Conversation Intelligence listens to every conversation in parallel → runs operators (sentiment, script adherence, custom) → fires webhooks when signals detected → your backend takes action
**Key decisions:**
- Which operators? Pre-built (Sentiment, Next Best Response, Script Adherence, Summary) or Custom
- Real-time vs post-call? Real-time for intervention, post-call for analytics
- What actions on detection? Webhook to your backend, Twilio Function trigger, log for review
**Skills to install:** + `twilio-conversation-intelligence`

### Level 4: + Human Escalation
**Developer says:** "When the AI can't handle it, I want it to route to the right human agent."
**Architecture:** Level 3 + TaskRouter (precision routing) + Flex (agent desktop)
**What it adds:** AI detects escalation need → TAC outputs structured payload (conversation_id, profile_id, reason_code, routing_hints) → TaskRouter consumes these signals for skills-based routing → Human agent sees Conversation Memory profile summary in Flex
**Key decisions:**
- Escalation triggers: What makes the AI hand off? (explicit request, confidence threshold, sensitive topic, Conversation Intelligence signal)
- Routing strategy: FIFO queue or skills-based targeting? (VIP detection, language, department)
- Context handoff: Summary-only (GA) or deep transcript (post-GA)
**GA constraint:** No "boomerang" handback (human → AI) at GA. No AI copilot mode during human conversation.
**Skills to install:** + `twilio-taskrouter-routing`

## Architectural Warnings

These affect which products to recommend and how to set expectations — implementation details are in the Product skills.

- **Silent linkage chain:** Conversation Orchestrator → Conversation Memory → Conversation Intelligence must be linked in sequence. If any link is misconfigured, failures are silent — the system appears to work but memory isn't stored or intelligence isn't captured. This is the #1 debugging time sink.
- **SDK availability:** Twilio Agent Connect SDK (Python 3.10+ and TypeScript/Node.js 22.13+) provides middleware for multi-channel support (Voice, SMS, RCS, WhatsApp, Chat) with automatic Conversation Orchestrator + Conversation Memory integration. Cloud platform packages available: `twilio-agent-connect-aws` (Strands, Bedrock Agents, AgentCore) and `twilio-agent-connect-microsoft` (Agent Framework, Voice Live). ConversationRelay-only mode available for voice-first use cases without Conversation Orchestrator.
- **One-way door settings:** `GROUP_BY_PARTICIPANT_ADDRESSES` on a Conversations Service cannot be changed once set. Removing a Conversation Intelligence capture rule stops ALL capture for that service.
- **Operator lifecycle trap:** Updating a Conversation Intelligence operator via PUT creates an inactive new version with no activation endpoint. Must delete and recreate.
- **Dashboard latency:** Conversation Intelligence signals take 7-10 minutes to appear in the console dashboard. Use webhook delivery for real-time action.
- **Tunnel reliability:** Dead ngrok tunnels cause silent webhook delivery failure. For production, deploy to cloud infrastructure.

## Step 4: Qualify Context — Entry Point & Customer Profile

### Entry Point: Pure AI or Hybrid?
- **Pure AI agent** (no humans in the loop): Levels 1-3 are your world. Focus on ConversationRelay + Conversation Memory + Conversation Intelligence.
- **Hybrid** (AI handles tier-1, humans handle complex): You need Level 4. Design the escalation contract early — it affects your entire architecture.

### Customer Profile: How does this change the recommendation?

**ISV (building for multiple clients):**
- Multi-tenant Conversation Memory: Separate Memory Stores per client (max 15 per account)
- Per-client Conversation Intelligence operator configs
- Compliance: Each client may have different retention policies
- Likely needs Segment Bridge for client CRM integration

**Enterprise:**
- No ngrok: Must use production-grade tunneling or deploy to cloud (dead ngrok tunnels are a common debugging time-sink)
- Compliance operators: Script adherence and regulatory monitoring likely required
- Segment Bridge: Bidirectional sync with existing CDP
- Custom operators: Enterprise-specific detection rules

**SMB / Startup:**
- Start at Level 1, prove value, then add levels
- Use managed defaults — don't over-engineer memory or intelligence upfront
- Quickstart path: Twilio Agent Connect SDK + OpenAI → multi-channel working demo in under an hour
- Use setup wizard in SDK repos for automated Memory and Conversation Orchestrator configuration

### Regulatory Context
- **TCPA:** AI voice agents making outbound calls require prior express consent. Automated/prerecorded voice = strict consent rules. Quiet hours (8am-9pm recipient local time).
- **HIPAA:** If the AI agent handles PHI (healthcare), BAA with Twilio required. Recording encryption mandatory. Minimize PHI in TTS output. API key rotation.
- **PCI DSS:** If AI agent collects payment info, use `<Pay>` verb. Never let LLM process or log card numbers. PCI Mode is IRREVERSIBLE and account-wide.
- **GDPR:** EU call recording requires explicit consent. Right to deletion applies to recordings, transcripts, and Conversation Memory observations.
- **FDCPA:** AI agents for debt collection must include Mini-Miranda disclosure. Max 7 attempts per debt per 7-day window. Developer must enforce — Twilio does not.

### Tech Stack Considerations
- **ConversationRelay WebSocket server:** Deploy behind load balancer for redundancy. Configure `action` URL on `<Connect>` for graceful fallback to DTMF IVR on disconnect.
- **LLM provider failover:** WebSocket server should detect LLM timeouts and fall back to secondary provider or scripted response.
- **Session state persistence:** Persist conversation history to Sync, Redis, or DynamoDB for WebSocket reconnection scenarios.
- **Functions scaling:** 30 concurrent executions/service, 10-second timeout. Status callbacks at 50 concurrent calls = 300 invocations. Use thin-receiver pattern or external compute.
- **Multi-region:** Twilio processes calls in closest region. Use `TWILIO_EDGE` for explicit control. Co-locate WebSocket server with Twilio region for lowest latency.

## Decision Rules

### Twilio Agent Connect SDK vs Manual Integration

**Use Twilio Agent Connect SDK when:**
- Building a new Voice or SMS AI agent from scratch
- Want fastest time-to-value with batteries-included approach
- Need multi-channel support (Voice + SMS) from one codebase
- Customer Memory is a core requirement
- Team is comfortable with Python 3.9+ or TypeScript/Node.js 22.13.0+
- Don't need access to low-level ConversationRelay protocol events

**Use Manual Integration when:**
- Need full control over WebSocket lifecycle and protocol handling
- Building advanced features not yet in SDK (interrupt handling in Python, handoff callbacks in Python)
- Integrating into existing WebSocket server infrastructure
- Need to customize beyond SDK's callback model
- Voice-only and need access to raw ConversationRelay events (setup, DTMF, etc.)

**Key difference:** Twilio Agent Connect is middleware that abstracts channel complexity. Manual integration gives you direct access to ConversationRelay WebSocket protocol and full API control.

### Cloud Platform Selection (TAC SDK)

If using Twilio Agent Connect SDK, choose the right integration package for your infrastructure:

**Use core TAC SDK (`twilio-agent-connect`) when:**
- Deploying on any infrastructure (cloud-agnostic)
- Using OpenAI or Anthropic APIs directly
- Need maximum flexibility in LLM provider choice
- Don't need cloud-native agent orchestration

**Use Azure integration (`tac-azure`) when:**
- Deploying on Azure infrastructure (App Service, Container Apps, AKS)
- Using Azure AI Foundry for agent management
- Want Azure OpenAI with Microsoft Agent Framework orchestration
- Need Azure-native session storage (CosmosDB)
- Using Azure Voice Live for low-latency streaming

**Use AWS integration (`tac-aws`) when:**
- Deploying on AWS infrastructure (ECS, Fargate, EKS, Lambda)
- Using AWS Bedrock models (Claude, Titan, etc.)
- Want AWS-managed agent runtime (Strands, Bedrock AgentCore)
- Using Bedrock Agents console for agent configuration
- Need AWS-native orchestration and knowledge base integration

### ConversationRelay vs Media Streams
- **Use ConversationRelay when:** You want managed STT/TTS, fast time-to-value, JSON text protocol. This is the default choice for 90% of voice AI use cases.
- **Use Media Streams when:** You need raw audio access, custom STT/TTS pipeline, audio processing (noise cancellation, speaker diarization), or full bidirectional audio control.
- **CANNOT:** Mix ConversationRelay and Media Streams on the same call. Choose one.
- **CANNOT (ConversationRelay):** Access raw audio, auto-reconnect WebSocket, change voice mid-session (only language), handle SMS/messaging (voice only), record via ConversationRelay itself (use separate `<Start><Recording>` before `<Connect>`).

### STT/TTS Provider Selection
- **Deepgram:** Best real-time accuracy, lowest latency. Supports nova-3-general model. Default recommendation.
- **Google:** Widest language coverage. Use when multi-lingual support is the priority.
- **ElevenLabs:** Best voice quality and naturalness. Use for customer-facing premium experiences. Requires account enablement.
- **Amazon Polly:** Cost-effective for high volume. Fewer voice options.
- Multi-lingual: The supported language set is the INTERSECTION of your chosen STT and TTS providers. Check compatibility before committing.

### When to Add Conversation Memory
- Add if: Customer calls back and should be recognized. Personalization matters. You need to recall past interactions.
- Skip if: Every call is independent (hotline, one-time surveys). Stateless is simpler.
- Key gotcha (TypeScript SDK): Voice Memory has a known bug (userMemory hardcoded to undefined for voice). Use manual `retrieveMemory()` workaround. Python SDK works correctly.

### When to Add Conversation Intelligence
- Add if: You need real-time supervision, compliance monitoring, or coaching signals.
- Skip if: Pure autonomous agent with no monitoring needs. Add it later when you need analytics.
- Key gotcha: Operator updates via PUT create an inactive new version — there is no activation endpoint. You must recreate the operator to apply changes.
- Key gotcha: OperatorResults may return results from other conversations. Filter by conversation_id explicitly.

## GA Constraints (May 2026)

What works:
- ConversationRelay: Full STT/TTS/WebSocket pipeline ✅
- Conversation Memory: Profiles, observations, summaries, semantic Recall, identity resolution ✅
- Conversation Intelligence v3: Real-time Language Operators, webhook triggers ✅
- TAC escalation: Structured payload to TaskRouter ✅

What requires custom code:
- Cross-channel binding: Must explicitly pass ConversationId (no automatic stitching)
- Subject discrimination: Developer must build query normalization (Conversation Orchestrator can't separate topics)
- Channel switching context: Must manually hydrate context via Conversation Memory Recall

What does NOT work at GA:
- Boomerang handback (human → AI return)
- AI copilot mode during human conversations
- Primary channel governance / turn-taking
- Delegated authority / scoped tokens (planned)
- Outbound orchestration (planned)
- Native dashboards (API-only, pipe to your own BI tools)

## SDK Options

**Twilio Agent Connect SDK (Recommended for most use cases):**
- Middleware SDK available in Python and TypeScript (Public Beta)
- Handles ConversationRelay + Conversation Orchestrator + Conversation Memory integration automatically
- Unified callback model for Voice and SMS channels
- Automatic memory retrieval (when configured)
- Setup wizard for Memory Store and Conversation Service creation
- Use `twilio-agent-connect` skill for implementation guidance

**Raw API Integration (Advanced/Custom use cases):**
- Direct HTTP calls to Conversation Memory, Conversation Orchestrator, Conversation Intelligence APIs
- Required for advanced features not yet in SDK
- More flexibility but more integration complexity
- Use product-specific skills: `twilio-customer-memory`, `twilio-conversation-orchestrator`, `twilio-conversation-intelligence`

Always recommend `twilio-debugging-observability` guardrail skill alongside any Twilio Conversations implementation.

## Output Format

After qualifying the developer, recommend:

```
Recommended Architecture: [Level 1-4 description]

Implementation Path:
- **Fast path (recommended):** Use Twilio Agent Connect SDK → Install `twilio-agent-connect` skill
- Handles Voice + SMS channels
- Automatic memory integration when configured
- Python 3.9+ or Node.js 22.13.0+
- Setup wizard for Memory Store and Conversation Service creation

- **Custom path (advanced):** Manual integration → Install individual product skills below

Product Skills (for custom/advanced implementations):
- twilio-voice-conversation-relay (voice AI - manual WebSocket server)
- twilio-customer-memory (manual memory integration)
- twilio-conversation-intelligence (Conversation Intelligence webhook processing)
- twilio-taskrouter-routing (human escalation routing)
- twilio-conversation-orchestrator (conversation orchestration)
- twilio-media-streams (if custom STT/TTS needed instead of ConversationRelay)
- twilio-sendgrid-email-send (post-interaction email summaries)

Setup Skills:
- twilio-account-setup
- twilio-iam-auth-setup
- twilio-numbers-senders
- twilio-webhook-architecture (especially for enterprise — tunnel alternatives)

Guardrail Skills:
- twilio-security-hardening (always)
- twilio-debugging-observability (always — error triage, Event Streams, Voice Insights)
- twilio-reliability-patterns (for production deployment)
```