firecrawl

$npx mdskill add vm0-ai/vm0-skills/firecrawl

Scrape websites and extract content via Firecrawl API.

  • Extracts text, HTML, or markdown from any webpage.
  • Depends on Firecrawl API with valid authentication token.
  • Executes scraping based on user intent keywords.
  • Returns structured data directly to the agent context.

SKILL.md

.github/skills/firecrawlView on GitHub ↗
---
name: firecrawl
description: Firecrawl API for web scraping and crawling. Use when user mentions "Firecrawl",
  "crawl website", "scrape site", or web extraction.
---

## Troubleshooting

If requests fail, run `zero doctor check-connector --env-name FIRECRAWL_TOKEN` or `zero doctor check-connector --url https://api.firecrawl.dev/v1/scrape --method POST`

## How to Use

All examples below assume you have `FIRECRAWL_TOKEN` set.

Base URL: `https://api.firecrawl.dev/v1`

## 1. Scrape - Single Page

Extract content from a single webpage.

### Basic Scrape

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"]
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json
```

### Scrape with Options

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.example.com/api",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "timeout": 30000
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data.markdown'
```

### Get HTML Instead

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["html"]
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data.html'
```

### Get Screenshot

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["screenshot"]
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data.screenshot'
```

**Scrape Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | URL to scrape (required) |
| `formats` | array | `markdown`, `html`, `rawHtml`, `screenshot`, `links` |
| `onlyMainContent` | boolean | Skip headers/footers |
| `timeout` | number | Timeout in milliseconds |

## 2. Crawl - Entire Website

Crawl all pages of a website (async operation).

### Start a Crawl

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "limit": 50,
  "maxDepth": 2
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json
```

**Response:**

```json
{
  "success": true,
  "id": "crawl-job-id-here"
}
```

### Check Crawl Status

Replace `<job-id>` with the actual job ID returned from the crawl request:

```bash
curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer $FIRECRAWL_TOKEN" | jq '{status, completed, total}'
```

### Get Crawl Results

Replace `<job-id>` with the actual job ID:

```bash
curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer $FIRECRAWL_TOKEN" | jq '.data[] | {url: .metadata.url, title: .metadata.title}'
```

### Crawl with Path Filters

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "limit": 20,
  "maxDepth": 3,
  "includePaths": ["/posts/*"],
  "excludePaths": ["/admin/*", "/login"]
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json
```

**Crawl Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | Starting URL (required) |
| `limit` | number | Max pages to crawl (default: 100) |
| `maxDepth` | number | Max crawl depth (default: 3) |
| `includePaths` | array | Paths to include (e.g., `/blog/*`) |
| `excludePaths` | array | Paths to exclude |

## 3. Map - URL Discovery

Get all URLs from a website quickly.

### Basic Map

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.links[:10]'
```

### Map with Search Filter

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://shop.example.com",
  "search": "product",
  "limit": 500
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.links'
```

**Map Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | Website URL (required) |
| `search` | string | Filter URLs containing keyword |
| `limit` | number | Max URLs to return (default: 1000) |

## 4. Search - Web Search

Search the web and get full page content.

### Basic Search

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "AI news 2024",
  "limit": 5
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Search with Full Content

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "machine learning tutorials",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data[] | {title: .metadata.title, content: .markdown[:500]}'
```

**Search Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | string | Search query (required) |
| `limit` | number | Number of results (default: 10) |
| `scrapeOptions` | object | Options for scraping results |

## 5. Extract - AI Data Extraction

Extract structured data from pages using AI.

### Basic Extract

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and description"
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data'
```

### Extract with Schema

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract product information",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price": {"type": "number"},
      "currency": {"type": "string"},
      "inStock": {"type": "boolean"}
    }
  }
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data'
```

### Extract from Multiple URLs

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": [
    "https://example.com/product/1",
    "https://example.com/product/2"
  ],
  "prompt": "Extract product name and price"
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data'
```

**Extract Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `urls` | array | URLs to extract from (required) |
| `prompt` | string | Description of data to extract (required) |
| `schema` | object | JSON schema for structured output |

## Practical Examples

### Scrape Documentation

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.python.org/3/tutorial/",
  "formats": ["markdown"],
  "onlyMainContent": true
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq -r '.data.markdown' > python-tutorial.md
```

### Find All Blog Posts

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "search": "post"
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq -r '.links[]'
```

### Research a Topic

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "best practices REST API design 2024",
  "limit": 5,
  "scrapeOptions": {"formats": ["markdown"]}
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Extract Pricing Data

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/pricing"],
  "prompt": "Extract all pricing tiers with name, price, and features"
}
```

Then run:

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer $FIRECRAWL_TOKEN" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json | jq '.data'
```

### Poll Crawl Until Complete

Replace `<job-id>` with the actual job ID:

```bash
while true; do
  STATUS="$(curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer $FIRECRAWL_TOKEN" | jq -r '.status')"
  echo "Status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```

## Response Format

### Scrape Response

```json
{
  "success": true,
  "data": {
  "markdown": "# Page Title\n\nContent...",
  "metadata": {
  "title": "Page Title",
  "description": "...",
  "url": "https://..."
  }
  }
}
```

### Crawl Status Response

```json
{
  "success": true,
  "status": "completed",
  "completed": 50,
  "total": 50,
  "data": [...]
}
```

## Guidelines

1. **Rate limits**: Add delays between requests to avoid 429 errors
2. **Crawl limits**: Set reasonable `limit` values to control API usage
3. **Main content**: Use `onlyMainContent: true` for cleaner output
4. **Async crawls**: Large crawls are async; poll `/crawl/{id}` for status
5. **Extract prompts**: Be specific for better AI extraction results
6. **Check success**: Always check `success` field in responses

More from vm0-ai/vm0-skills

SkillDescription
account-reconciliationPerform account reconciliations comparing general ledger balances against subledgers, bank statements, or external records. Use for bank reconciliation, GL-to-subledger reconciliation, intercompany reconciliation, balance sheet reconciliation, reconciling item analysis, outstanding item aging, or clearing open items.
agentphoneBuild AI phone agents with AgentPhone API. Use when the user wants to make phone calls, send/receive SMS, manage phone numbers, create voice agents, set up webhooks, or check usage — anything related to telephony, phone numbers, or voice AI.
ahrefsAhrefs SEO API for backlink and keyword analysis. Use when user mentions
amplitudeAmplitude product analytics API. Use when user mentions "Amplitude",
analysis-qaQuality-check a data analysis before sharing — verify joins, aggregations, denominators, time ranges, and metric definitions. Detect pitfalls like survivorship bias, average-of-averages, join explosion, timezone mismatches, incomplete periods, and selection bias. Includes documentation templates for reproducible analyses.
anthropic-managed-agentsAnthropic Managed Agents API for programmatically creating, running, and streaming AI agents on Anthropic's cloud infrastructure. Use when the user mentions "Managed Agents", "Anthropic agent sessions", or needs to create/run/stream an Anthropic agent with tool use (bash, git, web), attach GitHub repositories, or inject secrets via Vault. Do NOT use for standard Claude Messages API — use the Claude API skill instead.
apifyApify web scraping platform. Use when user mentions "scrape website",
asanaAsana API for tasks and projects. Use when user mentions "Asana", "asana.com",
atlassianAtlassian API for Confluence and Jira. Use when user mentions "Confluence
attioAttio REST API for AI-native CRM operations — manage companies, people, deals, and custom objects, plus notes, tasks, lists, and comments. Use when the user mentions "Attio", "CRM record", "create company", "add person", "list entry", "CRM note", or "CRM task".