transcribe
$ npx mdskill add vellum-ai/vellum-assistant/transcribe
SKILL.md (`.github/skills/transcribe`)
---
name: transcribe
description: Transcribe audio and video files using the configured speech-to-text provider
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🎙️"
  vellum:
    display-name: "Transcribe"
    activation-hints:
      - "User has an audio or video file on disk they want converted to text"
      - "User wants speech-to-text on a recording, voice memo, podcast, or meeting capture"
      - "User asks for a transcript of a media file (mp3, wav, m4a, mp4, mov, etc.)"
---
Transcribe audio and video files using the configured speech-to-text (STT) provider. Supported providers include OpenAI Whisper, Deepgram, and Google Gemini; the active provider is selected in Settings under Speech-to-Text (`services.stt`).
## Usage Notes
- The tool accepts a `file_path` (absolute path to a local audio or video file) to transcribe.
- Supported formats: any video (mp4, mov, etc.) or audio (mp3, wav, m4a, etc.) file.
- For video files, audio is automatically extracted via ffmpeg before transcription.
- Large files are automatically split into chunks for processing.
- If no STT provider credentials are configured, the tool will return an error with setup instructions.
- The STT provider (`services.stt`) is shared between transcription and telephony call paths.
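
The notes above describe three pre-processing steps: checking the `file_path` argument, extracting audio from video with ffmpeg, and splitting large files into chunks. The following is a minimal Python sketch of those steps, not the skill's actual implementation. The function names, the extension sets, the mono 16 kHz output format, and the 600-second chunk length are all hypothetical choices made for illustration; only the ffmpeg flags themselves (`-i`, `-vn`, `-ac`, `-ar`, `-y`) are standard. Paths are assumed to be POSIX-style.

```python
from pathlib import PurePosixPath

# Hypothetical extension sets; the skill only lists these formats as examples.
VIDEO_EXTS = {".mp4", ".mov"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}


def classify_media(file_path: str) -> str:
    """Return "video" or "audio" for an absolute path, else raise ValueError."""
    path = PurePosixPath(file_path)
    if not path.is_absolute():
        raise ValueError(f"file_path must be absolute: {file_path}")
    ext = path.suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unrecognized media extension: {ext or '(none)'}")


def ffmpeg_extract_audio_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16 kHz audio."""
    # -y: overwrite dst, -vn: no video, -ac 1: one channel, -ar 16000: 16 kHz sample rate
    # (the output format is an assumption, not something the skill documents)
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def plan_chunks(duration_s: float, chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s <= 0 or chunk_s <= 0:
        return []
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

Under these assumptions, `classify_media("/tmp/meeting.mov")` returns `"video"`, and `plan_chunks(1500.0)` returns three windows covering 0 to 600, 600 to 1200, and 1200 to 1500 seconds. The argument list from `ffmpeg_extract_audio_args` could be passed to `subprocess.run` on a machine where ffmpeg is installed.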
## Maintenance
When adding or modifying an STT provider, follow the onboarding checklist at `assistant/docs/stt-provider-onboarding.md`. That document covers the daemon catalog, config schema, adapter wiring, client catalog parity, and required tests.