image-scraper

Name: image-scraper
Author: aAAaqwq/AGI-Super-Team

$ npx mdskill add aAAaqwq/AGI-Super-Team/image-scraper

Extract and download all images from any provided URL.

For users who need to collect images from web pages in bulk.
Relies on Python 3, curl, and browser automation.
Parses HTML to identify and filter image sources.
Saves files locally in a dedicated output directory.

SKILL.md

.github/skills/image-scraper

---
name: image-scraper
description: Scrape and download all images from a given URL. Takes a URL, extracts image URLs from the page, and downloads them. Uses python3/curl as primary method, falls back to browser automation if needed. Use when user provides a URL and wants to download images from that page.
---

# Image Scraper

Scrape all images from a given URL and download them locally.

## Method 1: Python3 (Primary - Zero Dependency)

```python
#!/usr/bin/env python3
"""Download all images from a URL."""
import sys
import os
import urllib.request
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

# Sent with the page request and with every image request
HEADERS = {'User-Agent': 'Mozilla/5.0'}

class ImageParser(HTMLParser):
    """Collect the src attribute of every <img> and <source> tag."""
    def __init__(self):
        super().__init__()
        self.images = []
    def handle_starttag(self, tag, attrs):
        if tag in ('img', 'source'):
            for attr, val in attrs:
                if attr == 'src' and val:
                    self.images.append(val)

def scrape_images(url, output_dir="images"):
    os.makedirs(output_dir, exist_ok=True)
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=15) as resp:
        html = resp.read().decode('utf-8', errors='ignore')
    parser = ImageParser()
    parser.feed(html)
    # Resolve relative and protocol-relative URLs, then deduplicate and filter
    seen = set()
    urls = []
    for img in parser.images:
        img = urljoin(url, img.strip())
        # Skips data: URIs and other non-HTTP schemes
        if img.startswith('http') and img not in seen:
            seen.add(img)
            urls.append(img)
    print(f"Found {len(urls)} images")
    saved = 0
    for i, img_url in enumerate(urls):
        try:
            # Take the extension from the URL path only, ignoring the query string
            ext = os.path.splitext(urlparse(img_url).path)[1] or '.jpg'
            fname = f"{output_dir}/img_{i:03d}{ext}"
            # Reuse the User-Agent header; some hosts reject urllib's default one
            img_req = urllib.request.Request(img_url, headers=HEADERS)
            with urllib.request.urlopen(img_req, timeout=15) as resp, open(fname, 'wb') as f:
                f.write(resp.read())
            saved += 1
            print(f"  [{i+1}] {fname}")
        except Exception as e:
            print(f"  [{i+1}] FAILED: {e}")
    print(f"Downloaded {saved} images to {output_dir}/")
    return urls

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else input("URL: ")
    scrape_images(url)
```

**Usage:**
```bash
python3 /path/to/image-scraper.py "https://example.com/article"
```

## Method 2: Curl + Grep (Minimal)

```bash
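# Extract absolute image URLs (first 20 unique matches) and download them
mkdir -p images
curl -sL "URL" | grep -oE 'https?://[^"[:space:]]+\.(jpg|jpeg|png|webp|gif)' | sort -u | head -20 | while read -r url; do
  # Name each file by the MD5 hash of its URL, keeping the original extension
  curl -sL "$url" -o "images/$(echo "$url" | md5sum | cut -d' ' -f1).${url##*.}"
done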
```
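
Where `grep` or `md5sum` is not available, the same extraction and naming steps can be approximated in Python. The following is a minimal sketch rather than part of the skill: `extract_image_urls` and `hashed_name` are hypothetical helper names, and the regex is modelled on the one in the shell command, stopping at quotes and whitespace.

```python
import hashlib
import re

# Absolute URLs ending in a common image extension; stops at quotes and whitespace.
IMAGE_URL_RE = re.compile(r'https?://[^"\s]+\.(?:jpg|jpeg|png|webp|gif)')

def extract_image_urls(html, limit=20):
    """Return up to `limit` unique image URLs found in `html`, sorted."""
    return sorted(set(IMAGE_URL_RE.findall(html)))[:limit]

def hashed_name(url):
    """Build '<md5 of url>.<ext>'; the trailing newline mimics `echo url | md5sum`."""
    digest = hashlib.md5((url + "\n").encode("utf-8")).hexdigest()
    return f"{digest}.{url.rsplit('.', 1)[-1]}"

if __name__ == "__main__":
    sample = '<img src="https://example.com/a.png"><img src="https://example.com/a.png">'
    for found in extract_image_urls(sample):
        print(found, "->", hashed_name(found))
```

Like the shell version, this only finds absolute URLs that appear literally in the HTML, so relative paths are missed.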

## Method 3: Browser Automation (Fallback)

Use OpenClaw's browser tool when the page is JavaScript-rendered or Method 1 fails.

```bash
# 1. Open page in browser
browser(action=open, url="URL")

# 2. Get page content and extract images via JavaScript
browser(action=act, targetId="TAB_ID", request={
  "kind": "evaluate",
  "fn": "() => Array.from(document.querySelectorAll('img')).map(img => img.src)"
})

# 3. Download each image with curl
```
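
Step 3 is only described in a comment. One way to carry it out is sketched below in Python rather than curl, so that file naming matches Method 1. It assumes `image_urls` is the list of strings returned by the `evaluate` call; `download_images` and its `fetch` hook are hypothetical names introduced for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse

def download_images(image_urls, output_dir="images", fetch=None):
    """Save each HTTP(S) URL under output_dir and return the saved paths.

    fetch is an optional hook (url -> bytes) so the network call can be replaced.
    """
    if fetch is None:
        def fetch(url):
            req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.read()
    os.makedirs(output_dir, exist_ok=True)
    # dict.fromkeys removes duplicates while keeping page order;
    # data: URIs and empty src values are skipped.
    unique = [u for u in dict.fromkeys(image_urls) if u.startswith("http")]
    saved = []
    for i, url in enumerate(unique):
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        path = os.path.join(output_dir, f"img_{i:03d}{ext}")
        try:
            data = fetch(url)
        except Exception as exc:
            print(f"  [{i+1}] FAILED: {exc}")
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        print(f"  [{i+1}] {path}")
        saved.append(path)
    return saved
```

Because `img.src` in the browser is already an absolute URL, no relative-path resolution is needed at this stage.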

## Decision Flow

1. **Try Method 1** (python3) first — handles most static pages
2. **If 403/blocked**: Try adding headers (`Referer`, `Accept`)
3. **If JS-rendered or paywalled**: Use Method 3 (browser)
4. **Always** print the downloaded file paths
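
For step 2, a request with `Referer` and `Accept` headers might be built as in the sketch below. The helper names `build_request` and `fetch_with_headers` are hypothetical, and the header values are illustrative assumptions; whether they get past a given site's blocking depends on that site.

```python
import urllib.request
from urllib.parse import urlsplit

def build_request(url, referer=None):
    """Return a Request carrying browser-like headers.

    If no referer is given, the site root of `url` is used (an assumption
    that suits many hotlink checks, not a guarantee).
    """
    parts = urlsplit(url)
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html,application/xhtml+xml,image/*,*/*;q=0.8",
        "Referer": referer or f"{parts.scheme}://{parts.netloc}/",
    }
    return urllib.request.Request(url, headers=headers)

def fetch_with_headers(url, referer=None, timeout=15):
    """Fetch `url` with the headers above and return the response body as bytes."""
    with urllib.request.urlopen(build_request(url, referer), timeout=timeout) as resp:
        return resp.read()
```

When downloading an image, passing the page URL as `referer` is usually the closest match to what a browser would send.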

## Output

- Images saved to `./images/` by default
- Named `img_000.jpg`, `img_001.png`, etc.
- Report: "Downloaded N images to images/"

## Notes

- Only downloads images from the given URL, not the full site
- Tracking pixels and tiny icons (width/height < 50px) can optionally be filtered out; the Method 1 script does not filter by size
- Does not check or enforce robots.txt; respecting it is left to the caller
- For Twitter/X: browser method may be needed due to JS rendering
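
The optional size filter mentioned in the notes could look like the sketch below. It relies only on the `width` and `height` attributes declared in the HTML, which is an assumption: images without declared dimensions are kept, and the real pixel size is never inspected. `SizedImageParser` and `filter_images` are hypothetical names.

```python
from html.parser import HTMLParser

MIN_SIZE = 50  # pixels; threshold taken from the note on tiny icons

def _declared_px(value):
    """Return an int for values like '40' or '40px', else None."""
    if not value:
        return None
    text = value.strip().lower()
    if text.endswith("px"):
        text = text[:-2]
    return int(text) if text.isdigit() else None

class SizedImageParser(HTMLParser):
    """Collect <img> src values, skipping images declared smaller than min_size."""
    def __init__(self, min_size=MIN_SIZE):
        super().__init__()
        self.min_size = min_size
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        dims = (_declared_px(attr.get("width")), _declared_px(attr.get("height")))
        # Drop the image if either declared dimension is below the threshold
        if any(d is not None and d < self.min_size for d in dims):
            return
        self.images.append(src)

def filter_images(html, min_size=MIN_SIZE):
    """Return the src values of all images that pass the size filter."""
    parser = SizedImageParser(min_size)
    parser.feed(html)
    return parser.images
```

A stricter filter would need to download each file and read its actual dimensions, which the standard library cannot do for most image formats.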
