voice-setup
$ npx mdskill add vellum-ai/vellum-assistant/voice-setup
SKILL.md
.github/skills/voice-setup
---
name: voice-setup
description: Complete voice configuration in chat - PTT key, microphone permissions, ElevenLabs TTS, and troubleshooting
compatibility: "Designed for Vellum personal assistants"
metadata:
  icon: assets/icon.svg
  emoji: "🎙️"
  vellum:
    category: "voice"
    display-name: "Voice Setup"
    includes: ["elevenlabs-voice"]
    activation-hints:
      - "Guided setup or troubleshooting (walkthrough, PTT not working, mic issues, ElevenLabs/TTS)"
      - "Simple voice setting changes (PTT key, wake word) -> use voice_config_update directly"
    avoid-when:
      - 'If "voice" is in a Twilio/phone context, load phone-calls instead'
---
You are helping the user set up and troubleshoot voice features (push-to-talk, text-to-speech) entirely within this conversation. Do NOT direct the user to the Settings page for initial setup - handle everything in-chat using the tools below.
## Available Tools
- `voice_config_update` - Change any voice setting (PTT key, conversation timeout, TTS voice ID)
- `open_system_settings` - Open macOS System Settings to a specific privacy pane
- `navigate_settings_tab` - Open the Vellum settings panel to the Voice tab
- `credential_store` - Collect API keys securely (for ElevenLabs TTS)
## Setup Flow
Walk the user through each section in order. Skip sections they don't need. Ask before proceeding to the next section.
### 1. Microphone Permission
Check `<channel_capabilities>` for `microphone_permission_granted`.
**If `false` or missing:**
1. Explain that macOS requires microphone permission for voice features.
2. Use `open_system_settings` with `pane: "microphone"` to open the right System Settings pane.
3. Tell the user: "I've opened System Settings to the Microphone section. Please toggle **Vellum Assistant** on, then come back here."
4. After they confirm, verify by checking capabilities on the next turn.
**If `true`:** Tell them microphone permission is already granted and move on.
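If the user needs a manual fallback for opening the privacy pane, a minimal sketch follows. It assumes the standard macOS `x-apple.systempreferences:` deep links. The helper name `privacy_pane_url` is hypothetical, and mapping this skill's pane names to those links is an assumption, not a description of how `open_system_settings` works internally.

```bash
# Hypothetical helper: map a pane name used in this skill to a macOS
# System Settings deep link. The mapping is an assumption for illustration.
privacy_pane_url() {
  case "$1" in
    microphone)
      echo "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone" ;;
    speech_recognition)
      echo "x-apple.systempreferences:com.apple.preference.security?Privacy_SpeechRecognition" ;;
    *)
      echo "unknown pane: $1" >&2
      return 1 ;;
  esac
}

# Usage on macOS (left as a comment so that defining the helper has no side effects):
#   open "$(privacy_pane_url microphone)"
```

Prefer `open_system_settings` whenever it is available. The deep link is only useful when the user wants to open the pane themselves from a terminal.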
### 2. Push-to-Talk Activation Key
Present common PTT key options:
- **Right Option** - Default, good general choice
- **Fn** - Dedicated key on most Mac keyboards
- **Right Command** - Easy to reach
- **Right Control** - Familiar from gaming
Ask which key they prefer, then use `voice_config_update` with `setting: "activation_key"` and the chosen value.
**Common issues to mention:**
- If they pick a key that conflicts with the emoji picker (the Fn/Globe key on newer Macs), warn them and suggest an alternative.
- If they use a terminal app heavily, warn that some keys may be captured by the terminal.
### 3. Text-to-Speech / ElevenLabs (Optional)
Ask if they want high-quality text-to-speech voices via ElevenLabs (optional - standard TTS works without it).
If yes, the included **ElevenLabs Voice** skill (automatically appended below via `includes`) provides the full setup flow: curated voice list, API key collection, advanced voice selection, and tuning parameters. Follow the instructions there.
Note: The config key `services.tts.providers.elevenlabs.voiceId` controls the voice for both in-app TTS and phone calls. If the user sets up phone calls later, they will automatically use the same voice for a consistent experience.
### 4. Verification
After setup is complete:
1. Summarize what was configured.
2. Suggest they test by pressing their PTT key and speaking.
3. Offer to open the Voice settings tab if they want to review: use `navigate_settings_tab` with `tab: "Voice"`.
## Troubleshooting Decision Trees
When the user reports a problem, follow the appropriate decision tree:
### "PTT isn't working" / "Can't record"
1. **Microphone permission** - Check `microphone_permission_granted` in capabilities. If false, guide through granting it.
2. **Key check** - Ask what key they're using. Confirm it matches their configured PTT key.
3. **Emoji picker conflict** - On newer Macs, Fn/Globe opens the emoji picker. If they're using Fn, suggest switching to Right Option or Right Command.
4. **Speech Recognition permission** - Some voice features, including transcription, need this permission. If it has not been granted, use `open_system_settings` with `pane: "speech_recognition"` so the user can enable it.
5. **App focus** - PTT may not work when Vellum is not the frontmost app or if another app has captured the key.
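The order of checks above can be sketched as a small function that returns the first step that applies. This is only an illustration of the ordering. The function name, its arguments, and the step labels it prints are all hypothetical and do not correspond to any Vellum tool.

```bash
# Hypothetical sketch of the decision tree above.
# Arguments (all illustrative):
#   $1  mic_granted     "true" or "false"
#   $2  configured_key  the PTT key stored in settings, e.g. "Right Option"
#   $3  pressed_key     the key the user says they are pressing
ptt_next_step() {
  local mic_granted="$1" configured_key="$2" pressed_key="$3"
  if [ "$mic_granted" != "true" ]; then
    echo "grant-microphone-permission"
  elif [ "$pressed_key" != "$configured_key" ]; then
    echo "fix-key-mismatch"
  elif [ "$configured_key" = "Fn" ]; then
    echo "switch-away-from-fn"
  else
    echo "check-speech-recognition-and-app-focus"
  fi
}
```

Each branch corresponds to one numbered step, so the earliest failing check is always the one reported.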
### "Recording but no text" / "Transcription not working"
1. **Speech Recognition permission** - Must be granted for transcription.
2. **Microphone input** - Ask if they see the recording indicator. If yes, the mic works but transcription is failing.
3. **Locale/language** - Speech recognition works best with the system language. Ask if they're speaking in a different language.
4. **Background noise** - Excessive noise can prevent transcription. Suggest a quieter environment or a closer microphone.
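For the locale check, the user can read the system language from a terminal. The sketch below assumes the standard macOS `defaults` tool and the global preference keys `AppleLanguages` and `AppleLocale`. The helper names are hypothetical.

```bash
# Reduce a language tag such as "en-US" or "en_GB" to its primary subtag ("en").
primary_lang() {
  printf '%s\n' "${1%%[-_]*}"
}

# Succeeds when two tags share the same primary language.
same_language() {
  [ "$(primary_lang "$1")" = "$(primary_lang "$2")" ]
}

# Print the preferred languages and locale from the global preferences domain.
show_system_language() {
  if ! command -v defaults >/dev/null 2>&1; then
    echo "defaults not found; this check only applies on macOS" >&2
    return 1
  fi
  defaults read -g AppleLanguages
  defaults read -g AppleLocale 2>/dev/null || echo "AppleLocale is not set"
}
```

If the language the user speaks does not match the first entry printed by `show_system_language`, a mismatch is a plausible cause of poor transcription.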
### "Changed a setting but it didn't work"
1. **Event broadcast** - Setting changes are broadcast and should take effect immediately. If a change did not apply, suggest restarting the assistant.
2. **Verify** - Open the Voice settings tab with `navigate_settings_tab` to confirm the setting was persisted.
## Deep Debugging
For persistent issues, suggest checking system logs:
```bash
log stream --predicate 'subsystem == "com.vellum.assistant"' --level debug
```
Key log categories:
- `voice` - PTT activation, recording state
- `speech` - Speech recognition results
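To narrow the output to one category, the predicate can be extended. This sketch assumes that `voice` and `speech` are unified-logging categories under the `com.vellum.assistant` subsystem. The helper name `vellum_predicate` is hypothetical.

```bash
# Hypothetical helper: build a predicate limited to a single log category.
vellum_predicate() {
  printf 'subsystem == "com.vellum.assistant" AND category == "%s"' "$1"
}

# Usage on macOS (left as comments because `log stream` runs until interrupted):
#   log stream --predicate "$(vellum_predicate voice)" --level debug
#   log show --last 10m --info --debug --predicate "$(vellum_predicate speech)"
```

`log stream` follows events live while the user reproduces the problem. `log show --last 10m` reviews what was already recorded.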
## Rules
- Always handle setup conversationally in-chat. Do NOT tell the user to go to Settings for initial configuration.
- Use `navigate_settings_tab` only for review/verification after in-chat setup, not as the primary setup method.
- Be concise. Don't explain every option exhaustively - present the most common choices and let the user ask for more.
- If a permission is denied, acknowledge it gracefully and explain what features won't work without it.