supadata

$ npx mdskill add vm0-ai/vm0-skills/supadata

Fetch YouTube transcripts and web data via Supadata API.

  • Extract video content and channel statistics from YouTube.
  • Integrates with Supadata API using x-api-key authentication.
  • Accepts video URLs and text preferences to customize output.
  • Returns plain text or timestamped JSON chunks for analysis.
---
name: supadata
description: Supadata API for YouTube/web data. Use when user mentions "Supadata",
  "YouTube data", "channel stats", or web scraping data.
---

## Troubleshooting

If requests fail, run `zero doctor check-connector --env-name SUPADATA_TOKEN` or `zero doctor check-connector --url https://api.supadata.ai/v1/transcript --method POST`.

## How to Use

All examples below assume you have `SUPADATA_TOKEN` set.

The base URL for the API is:

- `https://api.supadata.ai/v1`

Authentication uses the `x-api-key` header.
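
Because every request below repeats the same base URL and header, a small wrapper function can cut the repetition. This is only a sketch: `supadata_get` and `SUPADATA_BASE_URL` are hypothetical names introduced here, not part of Supadata or this skill.

```bash
# Hypothetical helper (not part of Supadata): wraps curl with the base URL
# and the x-api-key header so each call only passes the path and parameters.
SUPADATA_BASE_URL="https://api.supadata.ai/v1"

supadata_get() {
  # $1 is the endpoint path, e.g. /transcript; remaining args go to curl as-is.
  local endpoint="${SUPADATA_BASE_URL}$1"
  shift
  curl -s "$endpoint" -H "x-api-key: ${SUPADATA_TOKEN}" -G "$@"
}
```

With this in place, the first example below could be written as `supadata_get /transcript --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"`.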

### 1. Get YouTube Video Transcript

Extract the transcript from a YouTube video.

Write the video URL to `/tmp/supadata_url.txt`:

```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

Then run:

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```

**Parameters:**

- `url`: Video URL (required)
- `text`: Return plain text (`true`) or timestamped chunks (`false`, default)
- `lang`: Preferred language (ISO 639-1 code, e.g., `en`, `zh`)
- `mode`: `native` (existing only), `generate` (AI), `auto` (default)

### 2. Get Transcript with Timestamps

Get transcript with timing information:

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=false" | jq '.content[:3]'
```

Response format:
```json
{
  "content": [
    {"text": "Hello", "offset": 0, "duration": 1500, "lang": "en"}
  ],
  "lang": "en",
  "availableLangs": ["en", "es", "zh"]
}
```
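
To make the timestamped chunks easier to read, the offsets can be turned into `HH:MM:SS` labels. The sketch below assumes `offset` is in milliseconds, which the sample values suggest but this document does not state. `ms_to_timestamp` and `format_chunks` are hypothetical helper names.

```bash
# Hypothetical helper: format a millisecond offset as HH:MM:SS.
# Assumption: `offset` is in milliseconds.
ms_to_timestamp() {
  local total_seconds=$(( $1 / 1000 ))
  printf '%02d:%02d:%02d\n' \
    $(( total_seconds / 3600 )) \
    $(( (total_seconds % 3600) / 60 )) \
    $(( total_seconds % 60 ))
}

# Reads tab-separated "offset<TAB>text" lines from stdin and prefixes
# each line of text with its formatted timestamp.
format_chunks() {
  local offset text
  while IFS="$(printf '\t')" read -r offset text; do
    printf '[%s] %s\n' "$(ms_to_timestamp "$offset")" "$text"
  done
}
```

To feed it from the API, pipe the `text=false` response through `jq -r '.content[] | "\(.offset)\t\(.text)"' | format_chunks`.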

### 3. Get TikTok/Instagram/X Transcript

Extract transcripts from other platforms. The request is the same as for YouTube; only the URL stored in `/tmp/supadata_url.txt` changes:

```bash
# TikTok: write the TikTok video URL to /tmp/supadata_url.txt first
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"

# Instagram Reel: write the Reel URL to /tmp/supadata_url.txt first
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```

Supported platforms: YouTube, TikTok, Instagram, X (Twitter), Facebook

### 4. Native Transcript Only (Save Credits)

Fetch only existing transcripts without AI generation:

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true" -d "mode=native"
```

Use `mode=native` to avoid AI generation costs (1 credit vs 2 credits/min).

### 5. Get YouTube Channel Metadata

Get channel information:

```bash
curl -s "https://api.supadata.ai/v1/youtube/channel" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "id=@mkbhd" | jq '{name, subscriberCount, videoCount}'
```

Accepts channel URL, channel ID, or handle (e.g., `@mkbhd`).

### 6. Get YouTube Video Metadata

Get video information:

```bash
curl -s "https://api.supadata.ai/v1/youtube/video" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '{title, viewCount, likeCount, duration}'
```

### 7. Get Social Media Metadata

Get metadata from any supported platform:

```bash
curl -s "https://api.supadata.ai/v1/metadata" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```

Works with YouTube, TikTok, Instagram, X, Facebook posts.

### 8. Scrape Web Page to Markdown

Extract web page content:

```bash
curl -s "https://api.supadata.ai/v1/web/scrape" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```

Returns page content in Markdown format, ideal for AI processing.

### 9. Map Website Links

Get all links from a website:

```bash
curl -s "https://api.supadata.ai/v1/web/map" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '.urls[:10]'
```

### 10. Crawl Website (Async)

Start a crawl job for multiple pages.

Write to `/tmp/supadata_request.json`:

```json
{
  "url": "https://example.com",
  "maxPages": 10
}
```

Then run:

```bash
# Start crawl
JOB_ID="$(curl -s "https://api.supadata.ai/v1/web/crawl" -X POST -H "x-api-key: $SUPADATA_TOKEN" -H "Content-Type: application/json" -d @/tmp/supadata_request.json | jq -r '.jobId')"

echo "Job ID: ${JOB_ID}"

# Check status
curl -s "https://api.supadata.ai/v1/web/crawl/${JOB_ID}" -H "x-api-key: $SUPADATA_TOKEN" | jq '{status, pagesCompleted}'
```

Status values: `queued`, `active`, `completed`, `failed`
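
Since a crawl moves through these states over time, the status check usually needs to be repeated. The following is a minimal polling sketch, not an official client: `poll_until_done` and `crawl_status` are hypothetical names, and the attempt count and sleep interval are arbitrary choices.

```bash
# Hypothetical helper: run a status command repeatedly until it prints
# "completed" or "failed", or until the attempt budget runs out.
# Usage: poll_until_done MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 on completed, 1 on failed, 2 on timeout.
poll_until_done() {
  local max_attempts="$1" sleep_seconds="$2" attempt=1 status
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$("$@")"
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    attempt=$(( attempt + 1 ))
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
  done
  echo "timeout"
  return 2
}

# Status command for a crawl job, built from the status check shown above.
crawl_status() {
  curl -s "https://api.supadata.ai/v1/web/crawl/$1" -H "x-api-key: $SUPADATA_TOKEN" | jq -r '.status'
}
```

For example, `poll_until_done 30 5 crawl_status "$JOB_ID"` would check every 5 seconds for up to 30 attempts. Passing the status command as an argument keeps the loop reusable for other job endpoints.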

### 11. Translate Transcript

Translate a YouTube transcript to another language:

```bash
curl -s "https://api.supadata.ai/v1/youtube/transcript/translate" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "lang=zh" -d "text=true"
```

## Response Handling

**Synchronous (HTTP 200):** Direct result returned.

**Asynchronous (HTTP 202):** Returns `jobId` for polling:
```json
{"jobId": "abc123"}
```

Poll the job endpoint until status is `completed` or `failed`.
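
One way to tell the two cases apart in a script is to capture the HTTP status code alongside the body. The sketch below uses curl's `-o` and `-w '%{http_code}'` options. `response_kind`, `fetch_transcript`, and the output path `/tmp/supadata_response.json` are hypothetical names chosen for illustration, and treating every other status code as an error is an assumption.

```bash
# Hypothetical helper: map an HTTP status code to how the response
# should be handled, following the two cases described above.
response_kind() {
  case "$1" in
    200) echo "sync" ;;   # body already holds the result
    202) echo "async" ;;  # body holds a jobId to poll
    *)   echo "error" ;;  # assumption: anything else is a failure
  esac
}

# Fetch a transcript, save the body to a file, and print the response kind.
fetch_transcript() {
  local http_code
  http_code="$(curl -s -o /tmp/supadata_response.json -w '%{http_code}' \
    "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" \
    -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true")"
  response_kind "$http_code"
}
```

When `fetch_transcript` prints `async`, the job ID can be read with `jq -r '.jobId' /tmp/supadata_response.json`.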

## Guidelines

1. **Use `mode=native` to save credits**: Only fetches existing transcripts
2. **URL encode parameters**: Use `--data-urlencode` for URLs
3. **Check available languages**: Response includes `availableLangs` array
4. **Handle async responses**: Some requests return job IDs for polling
5. **Max file size**: 1GB for direct file URLs
6. **Supported formats**: MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV
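
For direct file URLs, a quick pre-flight check against the supported formats can avoid a wasted request. This is a sketch under the assumption that each format corresponds to a file extension of the same name; `is_supported_format` is a hypothetical helper. It does not apply to platform URLs such as YouTube links, which are not file URLs.

```bash
# Hypothetical pre-flight check: does a direct file URL end in one of the
# supported extensions listed above? Query strings and fragments are ignored.
# Assumption: each format maps to a same-named extension (case-insensitive).
is_supported_format() {
  local path ext
  path="${1%%[?#]*}"
  ext="$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp4|webm|mp3|flac|mpeg|m4a|ogg|wav) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_format "https://example.com/talk.mp3" && echo ok` prints `ok`, while a URL ending in `.avi` returns a non-zero status.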