arxiv-automation

Name: arxiv-automation
Author: aAAaqwq/AGI-Super-Team

$npx mdskill add aAAaqwq/AGI-Super-Team/arxiv-automation

Automate arXiv research by searching, downloading, and summarizing papers.

Enables agents to retrieve and analyze academic papers instantly.
Depends on the arXiv API for data retrieval and PDF access.
Prioritizes results by submission date and relevance to queries.
Delivers structured summaries and direct links to full PDFs.

SKILL.md

.github/skills/arxiv-automationView on GitHub ↗

---
name: arxiv-automation
description: "Search and monitor arXiv papers. Query by topic, author, or category. Track new papers, download PDFs, and summarize abstracts for research workflows."
---

# arXiv Automation

Search, monitor, and analyze academic papers from arXiv.

## Capabilities

- Search papers by keyword, author, category
- Monitor new submissions in specific categories
- Download PDFs for analysis
- Extract and summarize abstracts
- Track citation-worthy papers

## Usage

### Search Papers (arXiv API)

```python
import urllib.request, urllib.parse, xml.etree.ElementTree as ET

def search_arxiv(query, max_results=10):
    base_url = "http://export.arxiv.org/api/query?"
    params = urllib.parse.urlencode({
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending"
    })
    url = base_url + params
    response = urllib.request.urlopen(url).read()
    root = ET.fromstring(response)
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    papers = []
    for entry in root.findall("atom:entry", ns):
        papers.append({
            "title": entry.find("atom:title", ns).text.strip(),
            "summary": entry.find("atom:summary", ns).text.strip()[:200],
            "link": entry.find("atom:id", ns).text,
            "published": entry.find("atom:published", ns).text,
            "authors": [a.find("atom:name", ns).text for a in entry.findall("atom:author", ns)]
        })
    return papers

# Example: search for LLM agent papers
papers = search_arxiv("all:LLM AND all:agent", max_results=5)
for p in papers:
    print(f"{p['title']}\n  {p['link']}\n  {', '.join(p['authors'][:3])}\n")
```

### Monitor Categories

Common CS categories:
| Category | Description |
|----------|-------------|
| cs.AI | Artificial Intelligence |
| cs.CL | Computation and Language (NLP) |
| cs.LG | Machine Learning |
| cs.CV | Computer Vision |
| cs.SE | Software Engineering |

RSS feeds: `http://arxiv.org/rss/{category}` (e.g., `http://arxiv.org/rss/cs.AI`)

### Download PDF

```python
# arXiv ID format: 2401.12345
arxiv_id = "2401.12345"
pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
```

## Rate Limits

- arXiv API: max 1 request per 3 seconds
- Be respectful of arXiv's resources
- Use RSS feeds for monitoring (less load than API queries)

## Integration

Combine with `pdf` skill for PDF text extraction and analysis.
Combine with `rss-automation` for periodic monitoring of new papers.