whisper-transcription
$ npx mdskill add guia-matthieu/clawfu-skills/whisper-transcription
Transcribes audio and video files to text for content repurposing and accessibility tasks.
- Converts podcasts, interviews, and videos into text for blogs, subtitles, or searchable archives.
- Integrates with the OpenAI Whisper model and requires ffmpeg for audio processing.
- Uses structured workflows and best practices to suggest technical transcription approaches.
- Delivers results as text transcripts or formatted files like SRT for subtitles.
SKILL.md
.github/skills/whisper-transcription
---
name: whisper-transcription
description: "Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives"
license: MIT
metadata:
  author: ClawFu
  version: 1.0.0
  mcp-server: "@clawfu/mcp-skills"
---

# Whisper Transcription

> Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

## When to Use This Skill

- **Podcast repurposing** - Convert episodes to blog posts, show notes, social snippets
- **Video subtitles** - Generate SRT/VTT files for YouTube, social media
- **Interview extraction** - Pull quotes and insights from recorded calls
- **Content audit** - Make audio/video libraries searchable
- **Translation** - Transcribe and translate foreign language content

## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

## Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click

# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```

## Commands

### Transcribe Single File

```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```

### Batch Transcription

```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```

### Transcribe + Translate

```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```

### Extract Timestamps

```bash
python scripts/main.py timestamps podcast.mp3 --format json
```

## Examples

### Example 1: Podcast to Blog Post

```bash
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
```

### Example 2: YouTube Subtitles

```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```

### Example 3: Batch Process Interview Library

```bash
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```

## Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
|-------|-------|----------|------|----------|
| `tiny` | Fastest | ~70% | 1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | 1GB | Social media clips |
| `small` | Medium | ~85% | 2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | 5GB | Professional transcripts |
| `large` | Slowest | ~95% | 10GB | Critical accuracy needs |

**Recommendation:** Start with `small` for most marketing content. Use `medium` for client deliverables.

## Output Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |

## Performance Tips

1. **GPU acceleration** - 10x faster with CUDA GPU
2. **Audio extraction** - Script auto-extracts audio from video
3. **Chunking** - Long files auto-split for memory efficiency
4. **Language detection** - Automatic, or specify with `--language`

## Skill Boundaries

### What This Skill Does Well

- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches

### What This Skill Cannot Do

- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success

## Related Skills

- [video-processing](../video-processing/) - Extract audio from video
- [youtube-downloader](../youtube-downloader/) - Download videos to transcribe
- [content-repurposer](../content-repurposer/) - Transform transcripts to content
- [podcast-production](../../audio/podcast-production/) - Create podcasts

## Skill Metadata

- **Mode**: cyborg

```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```
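
## Sketch: From Whisper Segments to SRT

The `srt` output format listed above can be illustrated with a minimal sketch. The source of `scripts/main.py` is not shown here, so this is not the skill's actual implementation. It assumes the `openai-whisper` Python package, whose `transcribe()` result contains a `segments` list with `start`, `end`, and `text` fields. The helper names `srt_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def transcribe_to_srt(audio_path: str, model_name: str = "small") -> str:
    """Hypothetical helper: transcribe a file with openai-whisper, return SRT text."""
    # Imported lazily so the formatting helpers work without the package.
    # Assumes `pip install openai-whisper` and a system ffmpeg install.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Keeping the formatting separate from the model call means the subtitle logic can be checked with hand-written segments, without loading a model or needing a GPU.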
More from guia-matthieu/clawfu-skills
- aarrr-metrics: Measure and optimize growth using the AARRR (Pirate Metrics) framework with stage-specific KPIs and funnel analysis
- ab-test-stats: Calculate A/B test statistical significance. Use when: determining if test results are significant; calculating required sample size; estimating test duration; analyzing conversion experiments; making data-driven decisions
- account-health: Assess customer account health using product usage, support sentiment, payment status, and relationship signals
- ad-spend-optimizer: Analyze paid advertising performance across channels and recommend budget reallocation to maximize ROAS and minimize CAC. Use when: planning quarterly ad budget allocation, diagnosing underperforming ad channels, deciding whether to scale spend on a channel, calculating marginal ROI across Google Ads, Meta, LinkedIn, or TikTok, rebalancing media mix after performance shifts, or setting up a test-and-scale framework for new channels.
- ai-bot-log-audit: Use when analyzing server logs to understand how AI crawlers (GPTBot, ClaudeBot, PerplexityBot) interact with your site. Use when optimizing content placement for LLM retrieval, diagnosing why AI search isn't citing your content, or auditing crawl patterns to find optimization gaps.
- ai-storyboard-2x2: Create visually consistent storyboards using PJ Ace's 2x2 Grid Shots technique, ensuring uniform lighting, characters, and sets across shots. Use when: **After finalizing a video script** - Turn the concept into visuals; **Need for visual consistency** - Consistent characters and lighting across shots; **Preparing assets for animation** - Frames ready for Veo, Runway, Kling; **Presenting a storyboard to a client** - Visualization before production;...
- ai-video-concept: Develop a creative idea and structure a video script optimized for AI generation, following PJ Ace's 8-second-scene method. Use when: **Starting an AI video ad** - Turn a raw idea into a structured script; **Creating video content for social media** - TikTok, Reels, YouTube Shorts; **Developing a campaign concept** - Before moving on to the storyboard; **Pitching a video idea** - Present a concept to a client or a team; **Adapting a messag...
- ai-video-prompting: Generate optimized prompts for each AI video generation model (Veo 3, Runway Gen-3, Kling 2.6, Pika), leveraging their specific strengths. Use when: **Animating storyboard frames** - Turn still images into video; **Choosing the right model** - Select Veo, Runway, Kling, or Pika depending on the need; **Optimizing generation quality** - Structured prompts for better results; **Creating smooth transitions** - Scene extension, first/last frame; **Using the mo...
- ai-video-qa: Validate the quality of your AI videos before publishing with a complete checklist covering technical, creative, and brand positioning aspects. Use when: **Before publishing** - Final validation before going live; **Client review** - Prepare anticipated feedback points; **Quality iteration** - Identify the issues to fix; **Go/No-Go decision** - Decide whether the video is ready; **Post-mortem** - Analyze why a video did (or did not) perform
- ai-voice-design: Design and generate AI voices for your videos using ElevenLabs or Qwen3-TTS, with voice cloning, design by description, and lip-sync. Use when: **Creating a brand voice** - Define the vocal tone for a campaign; **Cloning an existing voice** - Reproduce a voice with authorization; **Designing an original voice** - Create a voice from a description; **Multi-character** - Manage several voices in the same video; **AI video lip-sync** - Synchronize voice...