cudf-analytics
$
npx mdskill add langchain-ai/deepagents/cudf-analyticsAccelerate large-scale data analysis by performing computations directly on NVIDIA GPUs.
- Handles statistical summaries, grouping, and outlier detection on tabular data.
- Integrates with NVIDIA cuDF for GPU-accelerated data manipulation.
- Triggers when tasks require profiling or complex aggregations on datasets.
- Returns results via a pandas-compatible structure, optimized for downstream use.
SKILL.md
.github/skills/cudf-analyticsView on GitHub ↗
---
name: cudf-analytics
description: Use for GPU-accelerated data analysis on datasets, CSVs, or tabular data using NVIDIA cuDF. Triggers when tasks involve groupby aggregations, statistical summaries, anomaly detection, or large-scale data profiling.
---
# cuDF Analytics Skill
GPU-accelerated data analysis using NVIDIA RAPIDS cuDF. cuDF provides a pandas-like API that runs on NVIDIA GPUs, enabling massive speedups on large datasets.
## When to Use This Skill
Use this skill when:
- Analyzing CSV files, datasets, or tabular data
- Computing statistical summaries (mean, median, std, quartiles)
- Performing groupby aggregations
- Detecting anomalies or outliers in data
- Profiling datasets with millions of rows
- Computing correlation matrices
## Initialization (REQUIRED)
Always start every script with this boilerplate. It tests actual GPU operations, not just import.
```python
import pandas as pd
try:
import cudf
# Smoke-test: verify GPU compute AND host transfer both work
_test = cudf.Series([1, 2, 3])
assert _test.sum() == 6
assert _test.to_pandas().tolist() == [1, 2, 3]
GPU = True
except Exception as e:
print(f"[GPU] cudf unavailable, falling back to pandas: {e}")
GPU = False
def read_csv(path):
return cudf.read_csv(path) if GPU else pd.read_csv(path)
def to_pd(df):
"""Convert cuDF DataFrame/Series to pandas. Use this instead of .to_pandas() directly."""
if not GPU:
return df
try:
return df.to_pandas()
except Exception as e:
print(f"[GPU] .to_pandas() failed, using Arrow fallback: {e}")
return df.to_arrow().to_pandas()
```
## Quick Reference
cuDF mirrors the pandas API. Common operations:
### Read Data
```python
df = read_csv("data.csv")
```
### Statistical Summary
```python
# Use to_pd() when you need pandas output
summary = to_pd(df[["value", "score"]].describe())
# Scalar values work directly with float()
mean_val = float(df["value"].mean())
q1 = float(df["value"].quantile(0.25))
# Correlation
corr = float(df["value"].corr(df["score"]))
```
### Groupby Aggregation
```python
result = df.groupby("category").agg({
"revenue": ["sum", "mean", "count"],
"quantity": ["sum", "mean"],
})
result_pd = to_pd(result)
```
### Anomaly Detection (IQR Method)
```python
col = "value"
Q1 = float(df[col].quantile(0.25))
Q3 = float(df[col].quantile(0.75))
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
outliers = to_pd(df[(df[col] < lower) | (df[col] > upper)])
```
### Anomaly Detection (Z-Score Method)
```python
mean = float(df[col].mean())
std = float(df[col].std())
df["z_score"] = (df[col] - mean) / std
anomalies = to_pd(df[df["z_score"].abs() > 3])
```
### Filtering and Selection
```python
# Filter rows
filtered = df[df["status"] == "active"]
# Select columns
subset = df[["name", "revenue", "date"]]
# Sort
sorted_df = df.sort_values("revenue", ascending=False)
# Convert to pandas for final output / iteration
result_pd = to_pd(sorted_df)
```
## Data Type Requirements
cuDF requires explicit type specification for optimal performance:
- Use `float32` or `float64` for numeric data
- Use `int32` or `int64` for integer data
- String columns use cuDF's string dtype automatically
## Output Guidelines
When reporting analysis results:
- Include dataset dimensions (rows x columns)
- Show key statistics in formatted tables
- Highlight notable patterns, trends, or anomalies
- Provide both summary statistics and specific examples
- Note any data quality issues (missing values, outliers)
More from langchain-ai/deepagents
- analyze-market>-
- arxiv-searchSearches arXiv for preprints and academic papers, retrieves abstracts, and filters by topic. Use when the user asks to find research papers, search arXiv, look up preprints, find academic articles in physics, math, CS, biology, statistics, or related fields.
- blog-postWrites and structures long-form blog posts, creates tutorial outlines, and optimizes content for SEO with cover image generation. Use when the user asks to write a blog post, article, how-to guide, tutorial, technical writeup, thought leadership piece, or long-form content.
- code-reviewPerform a structured code review of changes, checking for correctness, style, tests, and potential issues.
- coding-prefsRead the user's coding preferences from /memory/coding-prefs.md before making non-trivial style decisions, and append new preferences when the user gives durable feedback.
- competitor-analysis>-
- cuml-machine-learningUse for GPU-accelerated machine learning on tabular data using NVIDIA cuML. Triggers when tasks involve classification, regression, clustering, dimensionality reduction, or model training on datasets.
- data-visualizationUse for creating publication-quality charts and multi-panel analysis summaries. Triggers when tasks involve visualizing data, plotting results, creating charts, or producing visual reports from analysis output.
- gpu-document-processingUse when processing large PDFs, document collections, or bulk text extraction tasks that benefit from GPU-accelerated processing. Triggers when the user provides large documents or needs bulk document analysis.
- langgraph-docsFetches and references LangGraph Python documentation to build stateful agents, create multi-agent workflows, and implement human-in-the-loop patterns. Use when the user asks about LangGraph, graph agents, state machines, agent orchestration, LangGraph API, or needs LangGraph implementation guidance.