transcribe

$npx mdskill add vellum-ai/vellum-assistant/transcribe

Transcribe audio and video files using the configured speech-to-text provider. Supports multiple STT providers including OpenAI Whisper, Deepgram, and Google Gemini — the active provider is selected in Settings under Speech-to-Text (`services.stt`).

SKILL.md
.github/skills/transcribeView on GitHub ↗
---
name: transcribe
description: Transcribe audio and video files using the configured speech-to-text provider
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🎙️"
  vellum:
    display-name: "Transcribe"
    activation-hints:
      - "User has an audio or video file on disk they want converted to text"
      - "User wants speech-to-text on a recording, voice memo, podcast, or meeting capture"
      - "User asks for a transcript of a media file (mp3, wav, m4a, mp4, mov, etc.)"
---

Transcribe audio and video files using the configured speech-to-text provider. Supports multiple STT providers including OpenAI Whisper, Deepgram, and Google Gemini — the active provider is selected in Settings under Speech-to-Text (`services.stt`).

## Usage Notes

- The tool accepts a `file_path` (absolute path to a local audio or video file) to transcribe.
- Supported formats: any video (mp4, mov, etc.) or audio (mp3, wav, m4a, etc.) file.
- For video files, audio is automatically extracted via ffmpeg before transcription.
- Large files are automatically split into chunks for processing.
- If no STT provider credentials are configured, the tool will return an error with setup instructions.
- The STT provider (`services.stt`) is shared between transcription and telephony call paths.

## Maintenance

When adding or modifying an STT provider, follow the onboarding checklist at `assistant/docs/stt-provider-onboarding.md`. That document covers the daemon catalog, config schema, adapter wiring, client catalog parity, and required tests.
More from vellum-ai/vellum-assistant