fetching-blocked-urls

`npx mdskill add oaustegard/claude-skills/fetching-blocked-urls`

Bypass blocked content and paywalls using Jina AI's reader.

  • Extract readable text from JavaScript-heavy or paywalled pages.
  • Integrates with Jina AI reader service for content conversion.
  • Activates automatically after web_fetch fails with specific errors.
  • Delivers clean markdown output with preserved links and titles.

SKILL.md

.github/skills/fetching-blocked-urls
---
name: fetching-blocked-urls
description: Retrieve clean markdown from URLs when web_fetch fails. Converts pages via the Jina AI reader service with automatic retry. Use when web_fetch or curl fails with a 403, a block or paywall, a timeout, a JavaScript-rendering error, or empty content, or when the user explicitly suggests using Jina.
metadata:
  version: 0.1.1
---

# Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

## Activation Triggers

Invoke this skill immediately when web_fetch returns:
- 403 Forbidden or access denied
- Paywall or login wall indicators
- Empty, garbled, or truncated content
- JavaScript-heavy SPA failures
- Timeout errors
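Some of these triggers can be detected mechanically when the first attempt is a plain `curl`. The sketch below is a minimal illustration under stated assumptions: `should_use_jina` is a hypothetical helper name, it only covers the timeout, 403, and empty-body cases plus an assumed "enable JavaScript" heuristic for SPA shells, and paywalls or garbled/truncated text still need judgment.

```bash
# Hypothetical helper: decide whether a plain curl attempt hit one of the triggers above.
# Args: curl exit status, HTTP status code, response body.
should_use_jina() {
  local curl_status=$1 http_code=$2 body=$3
  # curl exit code 28 means the operation timed out
  if [ "$curl_status" -eq 28 ]; then return 0; fi
  # 403 Forbidden / access denied
  if [ "$http_code" = "403" ]; then return 0; fi
  # Empty or whitespace-only body
  if [ -z "${body//[[:space:]]/}" ]; then return 0; fi
  # Assumed heuristic for JavaScript-only pages: a "please enable JavaScript" notice
  if grep -qi "enable javascript" <<<"$body"; then return 0; fi
  return 1
}

# Example usage (TARGET_URL is a placeholder):
#   resp=$(curl -s --max-time 30 -w '\n%{http_code}' "TARGET_URL"); rc=$?
#   code=${resp##*$'\n'}; body=${resp%$'\n'*}
#   should_use_jina "$rc" "$code" "$body" && echo "fall back to r.jina.ai"
```

The helper returns a shell status rather than printing, so it drops straight into `if` or `&&` chains.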

## Core Command

```bash
curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
```

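Replace `TARGET_URL` with the full address of the page, for example `https://r.jina.ai/https://example.com/article`. The service returns markdown with the page title, body text, and preserved links.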
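For repeated use, the core command can be wrapped in a small function. This is a minimal sketch: `jina_url` and `jina_fetch` are hypothetical names, and rejecting targets without an `http://` or `https://` scheme is a choice made here for clarity, not a documented Jina requirement.

```bash
# Hypothetical helper: build the reader URL by prefixing the full target URL.
jina_url() {
  local target=$1
  case "$target" in
    http://*|https://*) printf 'https://r.jina.ai/%s\n' "$target" ;;
    # This sketch insists on an explicit scheme so the reader gets a complete URL
    *) echo "jina_url: expected an http(s) URL, got '$target'" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around the core command: one attempt, 30-second cap.
jina_fetch() {
  local reader
  reader=$(jina_url "$1") || return 1
  curl -s --max-time 30 "$reader"
}
```

Usage: `jina_fetch "https://example.com/article"` prints the converted markdown, or nothing if the request fails.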

## Retry Pattern

Jina's backend fails intermittently on roughly 10% of requests. Retrying up to three times raises the success rate above 99% (assuming independent failures, 1 − 0.1³ = 99.9%):

```bash
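for attempt in 1 2 3; do
  # -s hides the progress meter; --max-time caps each attempt at 30 s
  result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL")
  status=$?
  # Accept a clean exit with non-empty output that is not Jina's transient backend error
  if [ "$status" -eq 0 ] && [ -n "$result" ] && ! grep -q "upstream connect error" <<<"$result"; then
    echo "$result"; break
  fi
  [ "$attempt" -lt 3 ] && sleep 1
done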
```

## Workflow Integration

1. **Primary**: Use web_fetch (native tool).
2. **Fallback**: Use this skill, with retries, when web_fetch fails.
3. **Escalate**: Ask the user for help only after all retries fail.

Attempt this fallback before asking users to copy-paste content manually.
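To make the escalation step explicit, the retry loop can be packaged as a function that returns a non-zero status once every attempt fails, so the caller knows to ask the user. A minimal sketch: `retry_fetch` is a hypothetical name, and its failure checks (non-zero exit, empty output, the `upstream connect error` string) mirror the retry pattern rather than any documented Jina error contract.

```bash
# Hypothetical helper: run a fetch command up to N times.
# Usage: retry_fetch TRIES COMMAND [ARGS...]
retry_fetch() {
  local tries=$1
  shift
  local attempt out
  for ((attempt = 1; attempt <= tries; attempt++)); do
    # Treat a non-zero exit, empty output, or Jina's transient backend error as a failure
    if out=$("$@") && [ -n "$out" ] && ! grep -q "upstream connect error" <<<"$out"; then
      printf '%s\n' "$out"
      return 0
    fi
    # Brief pause before the next attempt, skipped after the last one
    if [ "$attempt" -lt "$tries" ]; then sleep 1; fi
  done
  # Retries exhausted: signal the caller to escalate to the user
  echo "retry_fetch: all $tries attempts failed; ask the user to paste the content" >&2
  return 1
}
```

Usage: `retry_fetch 3 curl -s --max-time 30 "https://r.jina.ai/TARGET_URL" || echo "escalate to the user"`. Taking the command as arguments keeps the retry policy separate from the fetch itself.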

## Output Format

Jina returns structured markdown:
- `Title:` page title
- `URL Source:` original URL
- `Markdown Content:` extracted body text, links preserved
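Because the response is plain text with these labeled fields, standard text tools can split it. A minimal sketch: `jina_title`, `jina_source`, and `jina_body` are hypothetical helper names, and they assume each label starts its own line as listed above. Any extra header lines Jina may emit before `Markdown Content:` are simply skipped by `jina_body`.

```bash
# Hypothetical helpers that read a Jina response on stdin.

# Print the value of the first "Title:" line
jina_title() {
  sed -n 's/^Title: //p' | head -n 1
}

# Print the value of the first "URL Source:" line
jina_source() {
  sed -n 's/^URL Source: //p' | head -n 1
}

# Print every line after the "Markdown Content:" marker (marker line excluded)
jina_body() {
  awk 'found { print } /^Markdown Content:/ { found = 1 }'
}
```

Usage: `result=$(jina_fetch_output)`, then `jina_title <<<"$result"` for the heading and `jina_body <<<"$result" > page.md` to keep only the body.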

## Limitations

- Long pages may be truncated
- Sites that block all scrapers remain inaccessible
- Login-required pages yield only their public portions
- Real-time dynamic content may not render

## Domain Access

`r.jina.ai` is allowlisted in the Claude container's network configuration.
