datadog-apm
$ npx mdskill add automateyournetwork/netclaw/datadog-apm

Analyzes Datadog APM traces and service performance for troubleshooting and optimization
- Identifies slow or error-prone traces across distributed services
- Uses Datadog APM APIs to search, retrieve trace details, and list services
- Applies query filters and time ranges to isolate performance issues
- Returns structured trace data, service summaries, and actionable insights
SKILL.md
.github/skills/datadog-apm
---
name: datadog-apm
description: "Analyze distributed traces and service performance in Datadog APM."
version: 1.0.0
license: Apache-2.0
author: netclaw
tags: []
---

# Datadog APM Skill

Analyze distributed traces and service performance in Datadog APM.

## Tools

| Tool | Description |
|------|-------------|
| `search_traces` | Search traces with query syntax and filters |
| `get_trace_details` | Get detailed span information for a trace |
| `list_services` | List all instrumented services |
| `get_service_summary` | Get service health and performance summary |

## Example Queries

```
Search for slow traces in the network service
→ search_traces(query="service:network-api @duration:>1s", time_range="1h")

Find error traces
→ search_traces(query="status:error", time_range="30m")

Get details of a specific trace
→ get_trace_details(trace_id="abc123def456")

List all services
→ list_services()

Get service performance summary
→ get_service_summary(service_name="network-api")
```

## Workflows

### Latency Analysis

1. Search slow traces: `search_traces(query="@duration:>500ms")`
2. Identify common patterns (resource, operation)
3. Drill into trace waterfall: `get_trace_details(trace_id="...")`
4. Find bottleneck spans
5. Compare with baseline performance

### Error Investigation

1. Search error traces: `search_traces(query="status:error service:network")`
2. Group by error type and root cause
3. Review span-level errors in trace details
4. Correlate with recent deployments or changes

### Service Health Review

1. List all services: `list_services()`
2. Get summary for critical services: `get_service_summary(service_name="...")`
3. Check error rate, latency percentiles, throughput
4. Identify degraded or unhealthy services
5. Review service dependencies map

### Root Cause Analysis

1. Start from incident or alert
2. Find related traces: `search_traces(query="trace_id:abc123")`
3. Walk span tree to identify failure point
4. Check upstream/downstream service health
5. Correlate with infrastructure metrics

## Prerequisites

- `DD_API_KEY`: Datadog API key
- `DD_APP_KEY`: Datadog application key
- `DD_SITE`: Datadog site (optional, defaults to datadoghq.com)

## Server

This skill uses the `datadog-mcp` server via remote MCP transport.

## Notes

- APM requires instrumented services sending traces to Datadog
- Trace retention depends on your Datadog plan
- Large trace searches may be paginated
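
The variables listed under Prerequisites can be checked before any tool is called. The following is a minimal sketch, not part of the skill or of the `datadog-mcp` server: the names `DatadogConfig` and `load_datadog_config` are hypothetical, and the only behavior taken from this document is that `DD_API_KEY` and `DD_APP_KEY` are required while `DD_SITE` is optional and defaults to `datadoghq.com`.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DatadogConfig(NamedTuple):
    """Hypothetical container for the settings named under Prerequisites."""
    api_key: str
    app_key: str
    site: str


def load_datadog_config(env: Optional[Mapping[str, str]] = None) -> DatadogConfig:
    """Read the Datadog settings and fail fast if a required one is missing.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env

    # DD_API_KEY and DD_APP_KEY are both required; treat empty strings as unset.
    missing = [name for name in ("DD_API_KEY", "DD_APP_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    # DD_SITE is optional and falls back to the documented default.
    site = env.get("DD_SITE") or "datadoghq.com"
    return DatadogConfig(api_key=env["DD_API_KEY"], app_key=env["DD_APP_KEY"], site=site)
```

Calling `load_datadog_config()` with no arguments reads the real environment. Raising one error that names every missing variable, instead of stopping at the first, saves a round trip when both keys are unset.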
More from automateyournetwork/netclaw
- aap-automation: Red Hat Ansible Automation Platform - inventory management, job template execution, project SCM sync, ad-hoc commands, host management, Galaxy content discovery. Use when automating infrastructure with Ansible, running playbooks, managing inventories, or searching for Ansible collections and roles.
- aap-eda: Event-Driven Ansible (EDA) - activation lifecycle, rulebook management, decision environments, event stream monitoring. Use when managing event-driven automation triggers, enabling/disabling activations, or reviewing EDA rulebooks.
- aap-lint: ansible-lint playbook and role validation - syntax checking, best practice enforcement, project-wide analysis, rule filtering. Use when validating Ansible playbooks, checking code quality, or enforcing automation best practices before deployment.
- aci-change-deploy: Safe ACI policy change deployment - ServiceNow CR lifecycle, pre/post-change fault baselines, APIC policy application, automatic rollback on fault delta, and GAIT audit trail. Use when deploying ACI policy changes, creating tenants or EPGs, pushing config to APIC, or running a change window with rollback protection.
- aci-fabric-audit: Comprehensive Cisco ACI fabric health audit - node status, tenant/VRF/BD/EPG policy review, contract analysis, fault triage, and endpoint learning verification. Use when auditing ACI fabric health, checking for faults, reviewing tenant policies, or running pre/post-change baselines on APIC.
- arista-cvp: Arista CloudVision Portal (CVP) automation via REST API - device inventory, events, connectivity monitoring, tag management (4 tools). Use when managing Arista devices, checking CloudVision events, monitoring network connectivity probes, or tagging devices in CVP.
- aruba-cx-config: View and manage Aruba CX switch configurations, perform ISSU upgrades, and firmware operations
- aruba-cx-interfaces: Monitor Aruba CX switch interface status, LLDP neighbors, and optical transceiver health
- aruba-cx-switching: View and manage Aruba CX switch VLANs and MAC address tables for Layer 2 operations
- aruba-cx-system: Discover Aruba CX switch system information, firmware versions, and VSF topology