gpu-document-processing

Name: gpu-document-processing
Author: langchain-ai/deepagents

$ npx mdskill add langchain-ai/deepagents/gpu-document-processing

Process large document sets and collections using dedicated GPU-accelerated backend tools.

- Analyzes multi-page PDFs or groups of files that require intensive text extraction.
- Runs computationally heavy parsing and embedding tasks in a sandboxed environment.
- Triggers automatically when the input volume suggests bulk data handling or deep analysis.
- Returns structured, processed data directly to the agent for subsequent reasoning and output.

SKILL.md

.github/skills/gpu-document-processing

---
name: gpu-document-processing
description: Use when processing large PDFs, document collections, or bulk text extraction tasks that benefit from GPU-accelerated processing. Triggers when the user provides large documents or needs bulk document analysis.
---

# GPU Document Processing Skill

Process large documents and document collections using GPU-accelerated tools. This skill uses the sandbox-as-tool pattern: the agent reasons on a CPU host and sends document-processing work to a GPU-equipped sandbox.

## When to Use This Skill

Use this skill when:
- Processing large PDF files (50+ pages)
- Analyzing collections of documents (10+ files)
- Extracting structured data from unstructured documents
- Performing bulk text extraction and chunking
- Generating embeddings for large document sets
- The user uploads or references large documents for analysis
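The two numeric triggers above (50+ pages, 10+ files) can be expressed as a simple size check. This is a minimal sketch that covers only those thresholds; `DocumentRef` and `exceeds_size_thresholds` are hypothetical names, and the non-size triggers (structured extraction, embedding requests) still need the agent's judgment.

```python
from dataclasses import dataclass

# Thresholds from the "When to Use" list: 50+ pages or 10+ files.
LARGE_PDF_PAGES = 50
LARGE_COLLECTION_FILES = 10


@dataclass
class DocumentRef:
    """Hypothetical lightweight handle for a user-supplied document."""
    path: str
    page_count: int


def exceeds_size_thresholds(docs: list[DocumentRef]) -> bool:
    """True if the collection is large or any single document is large."""
    if len(docs) >= LARGE_COLLECTION_FILES:
        return True
    return any(doc.page_count >= LARGE_PDF_PAGES for doc in docs)


print(exceeds_size_thresholds([DocumentRef("annual-report.pdf", 120)]))  # True
print(exceeds_size_thresholds([DocumentRef("memo.pdf", 3)]))  # False
```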

## Architecture: Sandbox as Tool

This skill follows the **sandbox-as-tool pattern** for GPU execution:

1. **Agent reasons on CPU**: planning, synthesis, report writing
2. **Processing sent to GPU sandbox**: document parsing, embedding, extraction
3. **Results returned to agent**: structured output for further analysis

This separation ensures:
- API keys stay outside the sandbox, which improves security
- Agent state persists independently of processing jobs
- Processing can be parallelized across documents
- The GPU is used only during processing, not during reasoning, which keeps costs down
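A minimal sketch of the agent-side half of this pattern, assuming the sandbox can be reached through a function that accepts a JSON job spec and returns a JSON result. `GpuDocumentTool`, `SandboxRunner`, and `local_stub_runner` are hypothetical names, not part of deepagents; the point shown is that only the document reference and task name cross the boundary, while credentials stay with the agent.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical runner signature: JSON job spec in, JSON result out.
SandboxRunner = Callable[[str], str]


def local_stub_runner(job_spec: str) -> str:
    """Stand-in for a real GPU sandbox; reports which job fields it received."""
    job = json.loads(job_spec)
    return json.dumps({"task": job["task"], "received_fields": sorted(job)})


@dataclass
class GpuDocumentTool:
    """Hypothetical agent-side tool: runs on the CPU host, delegates work to the sandbox."""
    run_in_sandbox: SandboxRunner
    # Credentials stay here with the agent and are never serialized into the job.
    secrets: dict[str, str] = field(default_factory=dict)

    def __call__(self, document_uri: str, task: str) -> dict:
        # Only the document reference and the task name cross the sandbox boundary.
        job = {"document_uri": document_uri, "task": task}
        return json.loads(self.run_in_sandbox(json.dumps(job)))


if __name__ == "__main__":
    tool = GpuDocumentTool(local_stub_runner, secrets={"MODEL_API_KEY": "not-a-real-key"})
    print(tool("docs/report.pdf", "extract_text"))
```

Swapping `local_stub_runner` for a real sandbox client changes nothing on the agent side, which is what lets agent state persist independently of processing jobs.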

## Capabilities

### PDF Text Extraction
Extract text content from PDF documents with layout preservation:
- Headers, paragraphs, lists, and tables detected separately
- Page numbers and section boundaries preserved
- Multi-column layout handling
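Preserving section boundaries and page numbers amounts to grouping layout blocks under their headers while recording the pages each section spans. The sketch below assumes a hypothetical extractor that emits `Block(page, kind, text)` records in reading order; the GPU-side parsing itself is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One layout element as a hypothetical extractor might return it."""
    page: int
    kind: str  # e.g. "header", "paragraph", "list", "table"
    text: str


@dataclass
class Section:
    title: str
    pages: list[int] = field(default_factory=list)
    body: list[str] = field(default_factory=list)


def group_into_sections(blocks: list[Block]) -> list[Section]:
    """Start a new section at each header block and track the pages it covers."""
    sections: list[Section] = []
    current = Section(title="(preamble)")
    for block in blocks:
        if block.kind == "header":
            # Keep the previous section unless it is an empty preamble.
            if current.body or current.title != "(preamble)":
                sections.append(current)
            current = Section(title=block.text, pages=[block.page])
            continue
        if block.page not in current.pages:
            current.pages.append(block.page)
        current.body.append(block.text)
    if current.body or current.title != "(preamble)":
        sections.append(current)
    return sections
```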

### Tabular Data Extraction
Extract tables from documents into structured formats:
- PDF tables to CSV/DataFrames using GPU-accelerated parsing
- Automatic column type detection
- Handles merged cells and multi-row headers
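Two of these steps, flattening multi-row headers and detecting column types, can be illustrated in plain Python. This is a CPU sketch of the logic only, not the GPU-accelerated path; it assumes horizontally merged header cells arrive as blanks to the right of their label, and that commas in numbers are thousands separators. Both function names are hypothetical.

```python
def flatten_headers(header_rows: list[list[str]]) -> list[str]:
    """Join multi-row headers column-wise into names like "Revenue / 2023"."""
    filled_rows = []
    for i, row in enumerate(header_rows):
        is_last = i == len(header_rows) - 1
        filled, last = [], ""
        for cell in row:
            cell = cell.strip()
            if not cell and not is_last:
                cell = last  # blank under a horizontal merge inherits its left neighbor
            filled.append(cell)
            last = cell
        filled_rows.append(filled)
    names = []
    for parts in zip(*filled_rows):
        kept = []
        for part in parts:
            # Skip blanks and consecutive repeats (vertically merged cells).
            if part and (not kept or kept[-1] != part):
                kept.append(part)
        names.append(" / ".join(kept))
    return names


def infer_column_type(values: list[str]) -> str:
    """Return "int", "float", or "str" for a column of extracted cell strings."""
    non_empty = [v.strip().replace(",", "") for v in values if v.strip()]
    if not non_empty:
        return "str"

    def all_parse(cast) -> bool:
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"
```

Once names and types are settled, the columns can be loaded into a DataFrame for downstream processing.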

### Document Chunking
Split large documents into meaningful chunks for analysis:
- Semantic chunking (by topic/section boundaries)
- Fixed-size chunking with overlap for embedding
- Configurable chunk sizes (default: 512 tokens)
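Fixed-size chunking with overlap slides a window of `size` tokens forward by `size - overlap` tokens each step, so neighboring chunks share context. A minimal sketch: the 512-token default comes from the list above, while the 64-token overlap is an assumption, and a whitespace split stands in for the embedding model's real tokenizer.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace tokens as a stand-in for a real tokenizer.
print(chunk_tokens("a b c d e f g h i j".split(), size=4, overlap=1))
```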

### Embedding Generation
Generate vector embeddings for document chunks:
- Uses NVIDIA NeMo Retriever NIM for GPU-accelerated embedding
- Supports batch processing for large document sets
- Compatible with standard vector stores (Milvus, ChromaDB)
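Batch processing means sending chunks to the embedding backend in fixed-size groups rather than one request per chunk. In this sketch the backend is a hypothetical `embed` callable (a real deployment would wrap its NeMo Retriever client there), and the batch size of 64 is an assumption; the returned vectors stay in chunk order so they can be written to a vector store alongside their chunks.

```python
from typing import Callable, Iterator, Sequence

# Hypothetical backend signature: a batch of texts in, one vector per text out.
EmbedFn = Callable[[list[str]], list[list[float]]]


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_chunks(chunks: Sequence[str], embed: EmbedFn, batch_size: int = 64) -> list[list[float]]:
    """Embed chunks batch by batch, preserving input order."""
    vectors: list[list[float]] = []
    for batch in batched(chunks, batch_size):
        batch_vectors = embed(batch)
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedding backend returned a different number of vectors")
        vectors.extend(batch_vectors)
    return vectors
```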

## Workflow

1. **Receive document reference** from the orchestrator
2. **Determine processing type** (extraction, analysis, embedding)
3. **Send to GPU sandbox** for processing
4. **Collect structured results** (text, tables, embeddings)
5. **Write findings** to `/shared/` for the orchestrator to synthesize
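Steps 2 through 5 can be sketched as a dispatch table keyed by processing type, with the result written as JSON into the shared directory. The handler functions, the output file naming, and the JSON layout are all assumptions for illustration; in practice each handler would submit its work to the GPU sandbox.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical handler signature: document reference in, structured result out.
Handler = Callable[[str], dict]


def process_document(doc_ref: str, processing_type: str,
                     handlers: dict[str, Handler],
                     shared_dir: Path = Path("/shared")) -> Path:
    """Route a document to its handler and write the result where the orchestrator reads it."""
    # Step 2: the processing type selects the handler.
    if processing_type not in handlers:
        raise ValueError(f"unknown processing type: {processing_type!r}")
    # Steps 3-4: run the job and collect its structured result.
    result = handlers[processing_type](doc_ref)
    # Step 5: persist findings for the orchestrator to synthesize.
    shared_dir.mkdir(parents=True, exist_ok=True)
    out = shared_dir / f"{Path(doc_ref).stem}.{processing_type}.json"
    payload = {"document": doc_ref, "type": processing_type, "result": result}
    out.write_text(json.dumps(payload, indent=2))
    return out
```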

## Processing Large Document Collections

For multiple documents:
1. Process documents in parallel batches (3-5 concurrent)
2. Extract key metadata first (title, date, author, page count)
3. Generate per-document summaries
4. Cross-reference findings across documents
5. Write consolidated findings with per-document citations
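A sketch of steps 1 through 4, assuming per-document work is I/O-bound (the agent mostly waits on the remote sandbox), so a thread pool capped at 4 workers keeps concurrency inside the 3-5 range. `get_metadata` and `summarize` are hypothetical callables, and cross-referencing is reduced here to grouping documents that share a metadata value so consolidated findings can cite each source.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class DocReport:
    filename: str
    metadata: dict
    summary: str


def process_collection(filenames: list[str],
                       get_metadata: Callable[[str], dict],
                       summarize: Callable[[str], str],
                       max_concurrent: int = 4) -> list[DocReport]:
    """Run per-document work with at most `max_concurrent` jobs in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Metadata pass over every document first; map() preserves input order.
        metadata = list(pool.map(get_metadata, filenames))
        # Then per-document summaries, bounded by the same pool.
        summaries = list(pool.map(summarize, filenames))
    return [DocReport(f, m, s) for f, m, s in zip(filenames, metadata, summaries)]


def cross_reference(reports: list[DocReport], field_name: str) -> dict:
    """Map each metadata value shared by 2+ documents to the filenames to cite."""
    groups = defaultdict(list)
    for report in reports:
        value = report.metadata.get(field_name)
        if value is not None:
            groups[value].append(report.filename)
    return {value: files for value, files in groups.items() if len(files) > 1}
```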

## Output Format

When reporting document processing results:
- Include document metadata (filename, pages, size)
- Structure extracted content by section/chapter
- Format tables as markdown tables
- Include page references for all extracted content
- Note any extraction quality issues (scanned images, corrupted pages)
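A minimal helper for the table and page-reference requirements, rendering an extracted table as a Markdown table followed by a source line. The function name and the `Source: <file>, p. <n>` citation format are assumptions; pipes in cell text are escaped so they cannot break the table layout.

```python
def to_markdown_table(header: list[str], rows: list[list[str]], source: str, page: int) -> str:
    """Render an extracted table as Markdown, followed by a page reference line."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the table layout.
        return str(value).replace("|", "\\|").replace("\n", " ")

    def line(cells: list[str]) -> str:
        return "| " + " | ".join(cells) + " |"

    out = [line([cell(h) for h in header]), line(["---"] * len(header))]
    for row in rows:
        padded = list(row) + [""] * (len(header) - len(row))  # pad short rows
        out.append(line([cell(c) for c in padded]))
    out.append("")
    out.append(f"Source: {source}, p. {page}")
    return "\n".join(out)


print(to_markdown_table(["Region", "Revenue"], [["EU", "1,200"]], "report.pdf", 12))
```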

## Integration with NVIDIA NIM

For production deployments, GPU document processing can leverage:
- **NVIDIA NeMo Retriever**: GPU-accelerated embedding and retrieval
- **NVIDIA RAPIDS cuDF**: Tabular data processing from extracted tables
- **NVIDIA Triton**: Scalable inference for document classification models

See NVIDIA's NIM documentation for self-hosted deployment options.