youtube-apify-transcript
$ npx mdskill add gooseworks-ai/goose-skills/youtube-apify-transcript
Retrieve YouTube transcripts from cloud IPs without bot detection.
- Bypasses YouTube restrictions to fetch video text content.
- Depends on APIFY residential proxies for request routing.
- Executes based on provided YouTube video URL input.
- Delivers raw transcript text directly to the agent.
---
name: youtube-apify-transcript
description: Fetch YouTube transcripts via APIFY API. Works from cloud IPs (Hetzner, AWS, etc.) by bypassing YouTube's bot detection. Free tier includes $5/month credits (~714 videos). No credit card required.
tags: [research]
---
# youtube-apify-transcript
Fetch YouTube transcripts via APIFY API (works from cloud IPs, bypasses YouTube bot detection).
## Why APIFY?
YouTube blocks transcript requests from cloud IPs (AWS, GCP, Hetzner, etc.). APIFY routes the request through residential proxies, so it is not flagged by YouTube's bot detection.
## Free Tier
- **$5/month free credits** (~714 videos)
- No credit card required
- Perfect for personal use
## Cost
- **$0.007 per video** (less than 1 cent!)
- Track usage at: https://console.apify.com/billing
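The free-tier estimate follows from the two figures quoted in this document: $5 of monthly credit at $0.007 per video. A quick check of that arithmetic (the function name is illustrative only):

```python
from decimal import Decimal

def videos_per_month(credit_usd: str = "5", price_per_video_usd: str = "0.007") -> int:
    """Whole number of transcripts that the monthly credit covers."""
    # Decimal avoids binary floating-point rounding on 0.007.
    return int(Decimal(credit_usd) // Decimal(price_per_video_usd))

print(videos_per_month())  # 714
```

Cached repeats cost nothing, so the real number of lookups per month can be higher than this.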
## Links
- [APIFY Pricing](https://apify.com/pricing)
- [Get API Key](https://console.apify.com/account/integrations)
- [YouTube Transcript Scraper Actor](https://apify.com/pintostudio/youtube-transcript-scraper)
## Setup
1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:
```bash
# Add to ~/.bashrc or ~/.zshrc
export APIFY_API_TOKEN="apify_api_YOUR_TOKEN_HERE"
# Or use .env file (never commit this!)
echo 'APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE' >> .env
```
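How the script reads the token is not shown here. A minimal sketch, assuming it checks the `APIFY_API_TOKEN` environment variable first and then falls back to a plain `KEY=VALUE` `.env` file; `load_apify_token` is a hypothetical helper, not part of the skill:

```python
import os
from pathlib import Path

def load_apify_token(env_file: str = ".env") -> str:
    """Return the APIFY token from the environment, else from a .env file."""
    token = os.environ.get("APIFY_API_TOKEN", "").strip()
    if token:
        return token
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "APIFY_API_TOKEN":
                return value.strip().strip("'\"")
    raise RuntimeError("APIFY_API_TOKEN is not set; see the Setup steps")
```

Checking the environment first means an exported token takes precedence over whatever is in `.env`.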
## Usage
### Basic Usage
```bash
# Get transcript as text (uses cache by default)
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Short URL also works
python3 scripts/fetch_transcript.py "https://youtu.be/VIDEO_ID"
```
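Both URL forms point at the same video. A sketch of how a video ID might be pulled out of either form using only the standard library; `extract_video_id` is a hypothetical helper, and the real script's parsing may accept more variants:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Return the video ID from a watch URL or a youtu.be short URL."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short form: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Long form: the ID is the `v` query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""
    if not video_id:
        raise ValueError(f"Not a recognized YouTube video URL: {url!r}")
    return video_id
```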
### Options
```bash
# Output to file
python3 scripts/fetch_transcript.py "URL" --output transcript.txt
# JSON format (includes timestamps)
python3 scripts/fetch_transcript.py "URL" --json
# Both: JSON to file
python3 scripts/fetch_transcript.py "URL" --json --output transcript.json
# Specify language preference
python3 scripts/fetch_transcript.py "URL" --lang de
```
### Caching (saves money!)
Transcripts are cached locally by default. Repeat requests for the same video cost $0.
```bash
# First request: fetches from APIFY ($0.007)
python3 scripts/fetch_transcript.py "URL"
# Second request: uses cache (FREE!)
python3 scripts/fetch_transcript.py "URL"
# Output: [cached] Transcript for: VIDEO_ID
# Bypass cache (force fresh fetch)
python3 scripts/fetch_transcript.py "URL" --no-cache
# View cache stats
python3 scripts/fetch_transcript.py --cache-stats
# Clear all cached transcripts
python3 scripts/fetch_transcript.py --clear-cache
```
Cache location: `.cache/` in the skill directory (override with the `YT_TRANSCRIPT_CACHE_DIR` environment variable).
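The on-disk cache layout is not documented here. A minimal sketch of the idea, assuming one JSON file per video ID and honoring the `YT_TRANSCRIPT_CACHE_DIR` override; the file naming and the helper names are assumptions:

```python
import json
import os
from pathlib import Path
from typing import Optional

def cache_dir() -> Path:
    # The skill keeps `.cache/` inside its own directory; this sketch
    # resolves it relative to the current directory for brevity.
    return Path(os.environ.get("YT_TRANSCRIPT_CACHE_DIR", ".cache"))

def read_cache(video_id: str) -> Optional[dict]:
    """Return the cached transcript, or None on a cache miss."""
    path = cache_dir() / f"{video_id}.json"
    if path.is_file():
        return json.loads(path.read_text(encoding="utf-8"))
    return None

def write_cache(video_id: str, transcript: dict) -> None:
    """Store a transcript so repeat requests skip the paid API call."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{video_id}.json").write_text(
        json.dumps(transcript), encoding="utf-8"
    )
```

A fetch would call `read_cache` first and only go to APIFY (then `write_cache`) on a miss, which is why the second request for the same video is free.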
### Batch Mode
Process multiple videos at once:
```bash
# Create a file with URLs (one per line)
cat > urls.txt << EOF
https://youtube.com/watch?v=VIDEO1
https://youtu.be/VIDEO2
https://youtube.com/watch?v=VIDEO3
EOF
# Process all URLs
python3 scripts/fetch_transcript.py --batch urls.txt
# Batch with JSON output to file
python3 scripts/fetch_transcript.py --batch urls.txt --json --output all_transcripts.json
```
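A sketch of how a batch file could be read, assuming blank lines are skipped and a URL listed twice only needs to be fetched once. Whether the script deduplicates is not stated, and `read_url_list` is a hypothetical helper:

```python
from pathlib import Path
from typing import List

def read_url_list(path: str) -> List[str]:
    """Return the URLs in the file, in order, without blanks or repeats."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```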
## APIFY Actor Input
The script sends the following input to `pintostudio/youtube-transcript-scraper`:
```json
{
  "videoUrl": "https://www.youtube.com/watch?v=VIDEO_ID"
}
```
**Output fields:**
Each result contains a `data` array of transcript segments:
| Field | Type | Description |
|---------|--------|------------------------------------|
| `start` | number | Segment start time (seconds) |
| `dur` | number | Segment duration (seconds) |
| `text` | string | Transcript text for this segment |
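The input and output shapes above can be exercised without the script. A sketch that assumes the Actor is called through APIFY's synchronous `run-sync-get-dataset-items` endpoint with a bearer token; whether the script uses this endpoint or a client library is not stated, and the helper names are hypothetical:

```python
import json
import urllib.request

# In APIFY API paths the "/" in an Actor name is written as "~".
ACTOR_ID = "pintostudio~youtube-transcript-scraper"
RUN_SYNC_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"

def build_request(video_url: str, token: str) -> urllib.request.Request:
    """Build the POST request carrying the Actor input shown above."""
    body = json.dumps({"videoUrl": video_url}).encode("utf-8")
    return urllib.request.Request(
        RUN_SYNC_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

def extract_segments(items: list) -> list:
    """Collect the `data` segments from every dataset item."""
    segments = []
    for item in items:
        segments.extend(item.get("data") or [])
    return segments

def fetch_segments(video_url: str, token: str, timeout: float = 120.0) -> list:
    """Run the Actor and return its transcript segments (this call is billed)."""
    request = build_request(video_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_segments(json.loads(response.read().decode("utf-8")))
```

Keeping `build_request` and `extract_segments` separate from the network call makes both testable without spending credits.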
### Output Formats
**Text (default):**
```
Hello and welcome to this video.
Today we're going to talk about...
```
**JSON (--json):**
```json
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "dur": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "dur": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
```
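A sketch of how the `--json` payload above could be assembled from a list of segments, assuming `full_text` is the segment texts joined by single spaces (this matches the example, but the script's exact joining and line-break rules are not specified). The function names are hypothetical:

```python
import json
from typing import Dict, List

def full_text(segments: List[Dict]) -> str:
    """Join the non-empty segment texts into one string."""
    parts = (str(s.get("text", "")).strip() for s in segments)
    return " ".join(p for p in parts if p)

def to_json(video_id: str, title: str, segments: List[Dict]) -> str:
    """Serialize a transcript in the JSON shape shown above."""
    payload = {
        "video_id": video_id,
        "title": title,
        "transcript": [
            # float() tolerates numeric strings as well as numbers.
            {"start": float(s["start"]), "dur": float(s["dur"]), "text": s["text"]}
            for s in segments
        ],
        "full_text": full_text(segments),
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)
```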
## Error Handling
The script handles common errors:
- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors
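The script's actual error handling is not shown. A sketch of how the four cases listed above might be told apart, assuming the quota condition surfaces as HTTP 402 or 429 (an assumption, not confirmed by this document); the exception classes and `describe_error` are hypothetical:

```python
import urllib.error

class InvalidURLError(ValueError):
    """The input is not a recognizable YouTube video URL."""

class NoTranscriptError(Exception):
    """The Actor returned no transcript segments for the video."""

def describe_error(exc: Exception) -> str:
    """Map an exception to one of the user-facing error cases."""
    if isinstance(exc, InvalidURLError):
        return "Invalid YouTube URL"
    if isinstance(exc, NoTranscriptError):
        return "Video has no transcript"
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (402, 429):  # assumed quota / rate-limit statuses
            return "API quota exceeded"
        return f"APIFY request failed with HTTP {exc.code}"
    if isinstance(exc, (urllib.error.URLError, TimeoutError)):
        return "Network error"
    return f"Unexpected error: {exc}"
```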