skill-analytics
$ npx mdskill add aaronjmars/aeon/skill-analytics
Generates weekly fleet analytics for Aeon skill performance
- Identifies top-running, failing, and silent skills across the fleet
- Leverages run logs and exit taxonomy data from skill executions
- Ranks skills by run count, success rate, and exit type distribution
- Displays results in a ranked, fleet-wide dashboard with anomaly flags
SKILL.md
.github/skills/skill-analytics
---
name: skill-analytics
description: Weekly fleet-level skill-run analytics — ranks skills by 7d run count, surfaces success rates, exit-taxonomy distribution, and anomaly flags (significance-gated)
var: ""
tags: [meta]
---
> **${var}** — Window in hours (default: 168 = 7 days). Pass an integer like "72" for a shorter window.
Today is ${today}. Generate a fleet-level performance view of every Aeon skill that has run in the window. **The point of this skill is to answer four questions in one report:** which skills run most, which fail most, which are silently skipping (the new exit taxonomy from the autoresearch-evolution rewrites), and which scheduled skills haven't fired at all. `heartbeat` gives a binary ok/not-ok per run; `skill-health` audits one skill at a time. This is the only place the operator can see the entire fleet ranked side by side.
## Why this exists
`heartbeat` runs three times daily and emits a per-skill ✓/✗. `skill-health` files issues for skills that breach degradation thresholds. Neither produces a ranked, fleet-wide view. The 80 autoresearch-evolution rewrites (aeon PRs #46–#136) introduced new exit taxonomies — `SKIP_UNCHANGED`, `NEW_INFO`, `SKIP_QUIET` — that classify quiet-but-correct runs separately from failures. Existing health checks treat any non-`*_OK` exit as worth attention; the analytics widget makes the actual distribution visible so a skill running mostly `SKIP_UNCHANGED` reads as healthy-quiet, not silently broken.
## Steps
### 1. Determine the window
- Default: 168 hours (7 days). If `${var}` parses as a positive integer, use that many hours instead. Cap at 720 (30 days), since anything longer slows `gh api` pagination.
- Compute `WINDOW_HOURS=N` and `WINDOW_LABEL` (e.g. `"last 7d"` or `"last 72h"`).
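
The window rules above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `resolve_window` is a hypothetical helper name, and the choice to label only the default window in days (and every other window in hours) is an assumption inferred from the two example labels.

```python
from __future__ import annotations

DEFAULT_HOURS = 168  # 7 days
MAX_HOURS = 720      # 30 days


def resolve_window(var: str) -> tuple[int, str]:
    """Return (WINDOW_HOURS, WINDOW_LABEL) for the raw ${var} string."""
    text = var.strip()
    # Only plain ASCII digit strings count as a positive integer; signs,
    # decimals, and empty input fall back to the default.
    if text.isascii() and text.isdigit() and int(text) > 0:
        hours = min(int(text), MAX_HOURS)
    else:
        hours = DEFAULT_HOURS
    # Assumption: the default window is labelled in days, any other in hours.
    label = "last 7d" if hours == DEFAULT_HOURS else f"last {hours}h"
    return hours, label
```

With this sketch, `resolve_window("72")` yields `(72, "last 72h")`, while an empty or non-numeric `${var}` falls back to the 7-day default.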
### 2. Pull the run snapshot
```bash
./scripts/skill-runs --json --hours "$WINDOW_HOURS" > .outputs/skill-analytics-runs.json 2>/dev/null
```
If the script fails (auth, rate limit, sandbox block) or the JSON is empty:
- Log `SKILL_ANALYTICS_NO_DATA — skill-runs returned empty (gh api / sandbox block?)` to `memory/logs/${today}.md` and stop with **no notification**. A silent fleet view is correct on data-fetch failure — fall back rather than guess.
The script's JSON shape (see `scripts/skill-runs`):
```json
{
  "period": {"since": "...", "until": "...", "hours": 168},
  "summary": {"total": N, "succeeded": N, "failed": N, "cancelled": N, "in_progress": N},
  "skills": [{"skill": "name", "total": N, "success": N, "failure": N, "cancelled": N, "in_progress": N, "last_run": "...", "last_conclusion": "..."}],
  "anomalies": {"duplicates": [...], "failing": [...]}
}
```
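The empty-or-broken check on the script output could look like the sketch below. `parse_snapshot` is a hypothetical helper, and treating a snapshot with an empty `skills` array as "empty" is an assumption; the source only says the JSON may be empty.

```python
from __future__ import annotations

import json


def parse_snapshot(raw: str) -> dict | None:
    """Return the parsed snapshot, or None when the run should stop with NO_DATA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # empty file or truncated script output
    # Assumption: a snapshot without any per-skill rows counts as empty.
    if not isinstance(data, dict) or not data.get("skills"):
        return None
    return data
```

A `None` result maps to the `SKILL_ANALYTICS_NO_DATA` path: log, send nothing, and leave any existing article untouched.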
### 3. Cross-reference with cron schedule
Read `aeon.yml` and build `SCHEDULED_SKILLS`: dict `{skill_name -> {enabled: bool, schedule: str}}` for every entry under `skills:`. Treat `schedule: "workflow_dispatch"` and `schedule: "reactive"` as exempt from the "no runs in window" anomaly — those are dispatched on demand, not by cron.
For every skill in `SCHEDULED_SKILLS` where `enabled: true` AND schedule is a valid cron expression AND the skill is **not** present in the snapshot's `skills` array, mark `silent_scheduled: true` (zero runs in window despite an active schedule).
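As a rough sketch of the silent-scheduled check, assuming `SCHEDULED_SKILLS` has already been built from `aeon.yml` as a plain dict: the five-field test for "valid cron expression" is a deliberately coarse assumption, and both function names are hypothetical.

```python
from __future__ import annotations

# Schedules that are dispatched on demand and can never be "silent".
EXEMPT = {"workflow_dispatch", "reactive"}


def looks_like_cron(schedule: str) -> bool:
    """Coarse check: a cron expression has exactly five whitespace-separated fields."""
    return schedule not in EXEMPT and len(schedule.split()) == 5


def silent_scheduled(scheduled: dict[str, dict], snapshot: dict) -> list[str]:
    """Names of enabled cron skills that are absent from the snapshot's skills array."""
    ran = {row["skill"] for row in snapshot.get("skills", [])}
    return sorted(
        name
        for name, cfg in scheduled.items()
        if cfg.get("enabled")
        and looks_like_cron(cfg.get("schedule", ""))
        and name not in ran
    )
```

In this sketch a disabled skill or one scheduled as `workflow_dispatch` is never reported, regardless of whether it ran.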
### 4. Cross-reference with cron-state.json
Load `memory/cron-state.json` if present (missing → empty dict, not failure). For each skill in the snapshot, attach:
- `consecutive_failures` (0 if missing)
- `last_status` (`"unknown"` if missing)
Used to compute the consecutive-failure anomaly without a second `gh api` round-trip.
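A minimal sketch of the enrichment step follows. It assumes `memory/cron-state.json` is a JSON object keyed by skill name, which the source does not spell out; `load_cron_state` and `attach_cron_state` are hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_cron_state(path: str = "memory/cron-state.json") -> dict:
    """A missing file yields an empty dict rather than a failure."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text())


def attach_cron_state(skills: list[dict], cron_state: dict) -> list[dict]:
    """Copy each snapshot row and add the two cron-state fields with their defaults."""
    enriched = []
    for row in skills:
        # Assumption: cron-state is keyed by skill name.
        state = cron_state.get(row["skill"], {})
        enriched.append({
            **row,
            "consecutive_failures": state.get("consecutive_failures", 0),
            "last_status": state.get("last_status", "unknown"),
        })
    return enriched
```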
### 5. Mine exit taxonomy from logs
For each daily log file `memory/logs/YYYY-MM-DD.md` whose date falls in the window, scan for these markers (one match per skill section):
- `_OK` → success (excluding `_OK_SILENT`)
- `_OK_SILENT` / `_QUIET` / `SKIP_QUIET` → quiet-success
- `SKIP_UNCHANGED` → skip-unchanged (autoresearch-evolution exit)
- `NEW_INFO` → new-info (autoresearch-evolution exit)
- `_SKIP*` (other) → skip-other
- `_ERROR` / `_FAILED` → error
- `_PARTIAL` → partial
- (no match) → uncategorized
Build `EXIT_DIST[skill]` = `{ok: N, quiet: N, skip_unchanged: N, new_info: N, skip_other: N, error: N, partial: N, uncategorized: N}`. The dominant bucket per skill is the one with the largest count; ties broken in the order listed above. If a skill has no log markers in the window, dominant bucket is `"uncategorized"`.
This step is best-effort: the markers are regex-grepped from human-written logs, not parsed from a contract. A miss rate of 10–20% is expected and acceptable; the GitHub Actions success/failure counts from step 2 remain the ground truth for pass/fail. The taxonomy distribution is a secondary signal.
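One way to express the marker table and the tie-break rule is an ordered list of regexes, sketched below. The rule order (more specific markers before broader ones) and the "first matching rule wins" behaviour for a section containing several markers are assumptions; the source only requires one match per skill section.

```python
from __future__ import annotations

import re

# Bucket order as listed in the step; it doubles as the tie-break order.
BUCKETS = ["ok", "quiet", "skip_unchanged", "new_info",
           "skip_other", "error", "partial", "uncategorized"]

# Specific markers are tested before broader ones so that, for example,
# FOO_OK_SILENT is never counted as a plain _OK.
RULES = [
    ("quiet", re.compile(r"_OK_SILENT|_QUIET")),
    ("skip_unchanged", re.compile(r"SKIP_UNCHANGED")),
    ("new_info", re.compile(r"NEW_INFO")),
    ("skip_other", re.compile(r"_SKIP")),
    ("error", re.compile(r"_ERROR|_FAILED")),
    ("partial", re.compile(r"_PARTIAL")),
    ("ok", re.compile(r"_OK(?![A-Z_])")),
]


def classify(section: str) -> str:
    """Map one skill section of a daily log to a single exit bucket."""
    for bucket, pattern in RULES:
        if pattern.search(section):
            return bucket
    return "uncategorized"


def dominant(dist: dict[str, int]) -> str:
    """Largest bucket; ties resolve to whichever bucket is listed first."""
    if not any(dist.get(b, 0) for b in BUCKETS):
        return "uncategorized"
    # max() returns the first maximal element, so BUCKETS order breaks ties.
    return max(BUCKETS, key=lambda b: dist.get(b, 0))
```

Under these rules `SKIP_QUIET` lands in `quiet` (it matches `_QUIET`), and a status such as `SKILL_ANALYTICS_NO_DATA` stays `uncategorized`.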
### 6. Anomaly classification
For each skill in the snapshot OR `silent_scheduled`, assign **at most one** anomaly flag, first match wins:
| Flag | Trigger |
|---|---|
| `🔴 SILENT` | `silent_scheduled: true` (enabled cron skill, zero runs in window) |
| `🔴 ALL_FAIL` | `total >= 2` AND `failure == total` |
| `🟠 CONSECUTIVE_FAILURES` | `consecutive_failures >= 3` (from cron-state) |
| `🟠 LOW_SUCCESS` | `total >= 3` AND `success / total < 0.80` |
| `🟡 ALL_SKIP` | `total >= 3` AND `EXIT_DIST.ok + EXIT_DIST.quiet + EXIT_DIST.new_info == 0` AND `EXIT_DIST.skip_unchanged + EXIT_DIST.skip_other > 0` (every run skipped — possibly correct, possibly stuck) |
| `🟡 DUPLICATE_RUNS` | `total > 2 × expected_runs(schedule, window)` (more runs than the cron should produce — manual reruns or scheduler glitch) |
`expected_runs(schedule, window)` is a coarse estimate — for a cron `"0 H * * *"` over 7 days, expect 7; for `"0 H,H,H * * *"`, expect 21; for weekly `"0 H * * D"`, expect 1. If the schedule string is unparseable, skip the duplicate check for that skill (do not flag false positives).
A skill with no flag is considered HEALTHY for analytics purposes.
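The first-match-wins table and the coarse `expected_runs` estimate can be sketched as follows. The cron parsing here is intentionally narrow and is an assumption: it handles only comma-separated numeric minute, hour, and day-of-week fields, rounds partial weeks up, and returns `None` for anything else so the duplicate check is skipped. `anomaly_flag` is a hypothetical name.

```python
from __future__ import annotations

import math


def expected_runs(schedule: str, window_hours: int) -> int | None:
    """Coarse run estimate, or None when the schedule is not understood."""
    fields = schedule.split()
    if len(fields) != 5:
        return None
    minute, hour, dom, month, dow = fields
    if dom != "*" or month != "*":
        return None

    def count(field: str) -> int | None:
        parts = field.split(",")
        return len(parts) if all(p.isdigit() for p in parts) else None

    minutes, hours = count(minute), count(hour)
    weekdays = 7 if dow == "*" else count(dow)
    if None in (minutes, hours, weekdays):
        return None
    # Runs per week scaled to the window; partial weeks round up.
    return math.ceil(minutes * hours * weekdays * window_hours / (24 * 7))


def anomaly_flag(row: dict, exit_dist: dict, schedule: str | None,
                 window_hours: int) -> str | None:
    """Return at most one flag; checks run in the table's order."""
    total = row.get("total", 0)
    if row.get("silent_scheduled"):
        return "🔴 SILENT"
    if total >= 2 and row.get("failure", 0) == total:
        return "🔴 ALL_FAIL"
    if row.get("consecutive_failures", 0) >= 3:
        return "🟠 CONSECUTIVE_FAILURES"
    if total >= 3 and row.get("success", 0) / total < 0.80:
        return "🟠 LOW_SUCCESS"
    productive = sum(exit_dist.get(k, 0) for k in ("ok", "quiet", "new_info"))
    skipped = sum(exit_dist.get(k, 0) for k in ("skip_unchanged", "skip_other"))
    if total >= 3 and productive == 0 and skipped > 0:
        return "🟡 ALL_SKIP"
    expected = expected_runs(schedule, window_hours) if schedule else None
    if expected is not None and total > 2 * expected:
        return "🟡 DUPLICATE_RUNS"
    return None  # HEALTHY for analytics purposes
```

Because the checks short-circuit, a skill that both fails every run and sits on a failure streak is reported once, as `🔴 ALL_FAIL`.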
### 7. Compute summary
```
total_runs: sum of every skill's total
distinct_skills: count of skills with total >= 1
overall_success_pct: snapshot.summary.succeeded / (succeeded + failed) × 100 (cancelled + in_progress excluded)
anomaly_count: count of skills with any flag in step 6
silent_scheduled_count: count of SILENT flags
exit_dominant: top 3 dominant exit buckets across the fleet, e.g. "ok (42), skip_unchanged (18), error (3)"
```
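The summary fields above might be computed roughly as in this sketch. Several details are assumptions: `flags` maps each skill to its flag string or `None`, `exit_dist` maps each skill to its bucket counts, the percentage is rounded to one decimal, and `exit_dominant` is read as the three largest buckets after summing counts across the fleet.

```python
from __future__ import annotations

from collections import Counter


def compute_summary(snapshot: dict, flags: dict, exit_dist: dict) -> dict:
    skills = snapshot["skills"]
    s = snapshot["summary"]
    # Cancelled and in-progress runs are left out of the denominator.
    decided = s["succeeded"] + s["failed"]
    fleet = Counter()
    for dist in exit_dist.values():
        fleet.update(dist)
    return {
        "total_runs": sum(r["total"] for r in skills),
        "distinct_skills": sum(1 for r in skills if r["total"] >= 1),
        "overall_success_pct": round(100 * s["succeeded"] / decided, 1) if decided else 0.0,
        "anomaly_count": sum(1 for f in flags.values() if f),
        "silent_scheduled_count": sum(1 for f in flags.values() if f and f.endswith("SILENT")),
        "exit_dominant": ", ".join(f"{b} ({n})" for b, n in fleet.most_common(3)),
    }
```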
### 8. Build the verdict line
Pick the strongest single claim, in priority:
1. Any `🔴 SILENT` exists → `"${N} scheduled skill(s) didn't run this window — ${first_skill}"`
2. Any `🔴 ALL_FAIL` exists → `"${first_skill} failed every run (${N}/${N}) — investigate"`
3. Any `🟠 CONSECUTIVE_FAILURES` exists → `"${first_skill} on ${N}-run failure streak"`
4. Any `🟠 LOW_SUCCESS` exists → `"${first_skill} ${pct}% success over ${total} runs — degraded"`
5. Any `🟡 ALL_SKIP` exists → `"${N} skill(s) only emitting skip-class exits this window — verify intent"`
6. Otherwise → `"All ${distinct_skills} active skills healthy — ${overall_success_pct}% success across ${total_runs} runs"`
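
The priority pick can be sketched as a walk over the flags in the order listed. `strongest_claim` is a hypothetical helper that returns only the winning flag, the first affected skill, and the count, leaving the sentence templates above to the caller. Taking "first skill" to mean the first in input order is an assumption.

```python
from __future__ import annotations

# Verdict priority as listed above. DUPLICATE_RUNS has no verdict of its
# own, so a fleet with only that flag falls through to the final branch.
PRIORITY = [
    "🔴 SILENT",
    "🔴 ALL_FAIL",
    "🟠 CONSECUTIVE_FAILURES",
    "🟠 LOW_SUCCESS",
    "🟡 ALL_SKIP",
]


def strongest_claim(rows: list[dict]) -> tuple[str, str, int] | None:
    """Return (flag, first_skill, count) for the highest-priority flag present."""
    for flag in PRIORITY:
        hits = [r["skill"] for r in rows if r.get("flag") == flag]
        if hits:
            return flag, hits[0], len(hits)
    return None  # caller falls back to the "Otherwise" verdict
```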
### 9. Significance gate
**Notify only if `anomaly_count >= 1`.** Silent run = correct (no anomalies in fleet) = no notification. Following the autoresearch-evolution / fork-skill-digest pattern: noisy skills break trust faster than missing pings.
If the gate says skip, still write the article and JSON spec, and log `SKILL_ANALYTICS_QUIET` (no anomalies). The dashboard widget refreshes regardless; only the push notification is gated.
### 10. Write the article
Path: `articles/skill-analytics-${today}.md`. Overwrite if it exists (idempotent same-day reruns).
```markdown
# Skill Analytics — ${today}
**Verdict:** ${verdict_line}
*Window: ${WINDOW_LABEL} · ${total_runs} runs across ${distinct_skills} skills · ${overall_success_pct}% success · ${anomaly_count} anomalies*
## Anomalies
| Flag | Skill | Detail | Action |
|------|-------|--------|--------|
| 🔴 SILENT | name | scheduled `<cron>` but zero runs in window | check workflow / scheduler |
| 🔴 ALL_FAIL | name | N/N failed | investigate root cause |
| 🟠 CONSECUTIVE_FAILURES | name | N-run streak (last_error: "...") | see skill-health for filed issue |
| 🟠 LOW_SUCCESS | name | N% over M runs | review failures |
| 🟡 ALL_SKIP | name | M runs, all skip-class | confirm SKIP_UNCHANGED is the intent |
| 🟡 DUPLICATE_RUNS | name | M runs, expected ~K | check for manual reruns |
(If `anomaly_count == 0`: write `No anomalies — fleet healthy across ${distinct_skills} skills.`)
## Top runners (by run count)
| # | Skill | Runs | Success | Last status | Dominant exit |
|---|-------|------|---------|-------------|---------------|
| 1 | name | N | XX% | success | ok |
| 2 | name | N | XX% | success | skip_unchanged |
...
(Top 15 by total runs desc. If fewer than 15 active skills, list all.)
## Failure rate (sorted, ≥1 failure)
| Skill | Runs | Failures | Success rate | Last conclusion |
|-------|------|----------|--------------|-----------------|
(All skills with `failure >= 1`, sorted by `failure / total` desc. If none: "Zero failures across ${distinct_skills} skills this window.")
## Exit taxonomy distribution
| Bucket | Count | % | Top skills |
|--------|-------|---|------------|
| ok | N | XX% | a, b, c |
| skip_unchanged | N | XX% | d, e |
| new_info | N | XX% | f |
| quiet | N | XX% | g |
| skip_other | N | XX% | |
| error | N | XX% | h |
| partial | N | XX% | |
| uncategorized | N | XX% | |
(Sourced from `memory/logs/*.md` — best-effort regex grep, see Step 5. Cell-aligns to summary cells above where available.)
## Silent scheduled skills (enabled, zero runs)
${list of {skill, schedule} pairs OR "none — every enabled cron skill ran at least once."}
## Source status
- skill-runs JSON: ${ok|empty|fetch_error}
- Window: ${WINDOW_HOURS}h (${period.since} → ${period.until})
- aeon.yml: ${ok|missing}
- cron-state.json: ${ok|missing — first run for this fork?}
- Daily logs scanned: ${N_LOG_FILES}/${expected_log_files} for exit taxonomy
---
*Companion to `skill-health` (per-skill issue filing) and `heartbeat` (per-run pulse). Fleet-wide observability is the gap this skill closes. Methodology: GitHub Actions run history is ground truth for pass/fail; daily-log markers are best-effort secondary signal for exit taxonomy.*
```
### 11. Write the dashboard JSON spec
Path: `dashboard/outputs/skill-analytics.json`. Use the catalog components (Card / Stack / Grid / Heading / Text / Badge / Table / Alert).
```json
{
  "version": "1",
  "generated_at": "${ISO timestamp}",
  "skill": "skill-analytics",
  "title": "Skill Analytics — ${today}",
  "spec": {
    "type": "Stack",
    "props": {"direction": "vertical", "gap": "md"},
    "children": [
      {"type": "Heading", "props": {"level": 2, "children": "Skill Analytics — ${today}"}},
      {"type": "Text", "props": {"variant": "muted", "children": "${verdict_line}"}},
      {"type": "Grid", "props": {"columns": 4, "gap": "sm"}, "children": [
        {"type": "Card", "props": {"children": [
          {"type": "Text", "props": {"variant": "muted", "children": "Total runs"}},
          {"type": "Heading", "props": {"level": 3, "children": "${total_runs}"}}
        ]}},
        {"type": "Card", "props": {"children": [
          {"type": "Text", "props": {"variant": "muted", "children": "Active skills"}},
          {"type": "Heading", "props": {"level": 3, "children": "${distinct_skills}"}}
        ]}},
        {"type": "Card", "props": {"children": [
          {"type": "Text", "props": {"variant": "muted", "children": "Success rate"}},
          {"type": "Heading", "props": {"level": 3, "children": "${overall_success_pct}%"}}
        ]}},
        {"type": "Card", "props": {"children": [
          {"type": "Text", "props": {"variant": "muted", "children": "Anomalies"}},
          {"type": "Heading", "props": {"level": 3, "children": "${anomaly_count}"}}
        ]}}
      ]},
      {"type": "Heading", "props": {"level": 3, "children": "Top runners"}},
      {"type": "Table", "props": {
        "columns": [
          {"key": "rank", "header": "#"},
          {"key": "skill", "header": "Skill"},
          {"key": "runs", "header": "Runs"},
          {"key": "success", "header": "Success"},
          {"key": "exit", "header": "Dominant exit"}
        ],
        "rows": [
          {"rank": "1", "skill": "name", "runs": "N", "success": "XX%", "exit": "ok"}
        ]
      }}
    ]
  }
}
```
If `anomaly_count >= 1`, prepend an `Alert` block before the verdict:
```json
{"type": "Alert", "props": {"variant": "destructive", "children": "${anomaly_count} anomaly flag(s) raised — see Anomalies section"}}
```
If the file write fails (filesystem read-only, missing directory), log a warning but do not abort — the article is the canonical artifact, the JSON spec is a dashboard convenience.
### 12. Send notification (only if gate from step 9 passed)
Via `./notify`:
```
*Skill Analytics — ${today}*
${verdict_line}
Window: ${WINDOW_LABEL} · ${total_runs} runs · ${distinct_skills} skills · ${overall_success_pct}% success
Anomalies: ${anomaly_count}
${If 🔴 flags (cap top 3):}
🔴 Critical:
- ${skill} — ${flag}: ${detail}
${If 🟠 flags (cap top 3):}
🟠 Degraded:
- ${skill} — ${flag}: ${detail}
${If 🟡 flags (top 3, only if no 🔴/🟠 already filled the slots):}
🟡 Watch:
- ${skill} — ${flag}: ${detail}
Top by runs: ${top_3_skills_by_run_count_with_counts}
Full: articles/skill-analytics-${today}.md
```
Cap the message body at ~3500 chars (Telegram safe limit). Drop the "Top by runs" line first if exceeded; flags are higher signal.
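The length cap can be sketched as below, assuming the message is assembled as a list of lines. `cap_message` is a hypothetical helper, and the hard truncation used as a last resort is an assumption; the source only specifies dropping the "Top by runs" line first.

```python
from __future__ import annotations

LIMIT = 3500  # approximate safe limit for the message body


def cap_message(lines: list[str], limit: int = LIMIT) -> str:
    body = "\n".join(lines)
    if len(body) <= limit:
        return body
    # Flags are higher signal, so the "Top by runs" line goes first.
    trimmed = [ln for ln in lines if not ln.startswith("Top by runs:")]
    body = "\n".join(trimmed)
    # Assumption: if it is still too long, truncate rather than fail.
    return body[:limit]
```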
### 13. Log to `memory/logs/${today}.md`
```
## Skill Analytics
- **Skill**: skill-analytics
- **Window**: ${WINDOW_LABEL} (${WINDOW_HOURS}h)
- **Total runs**: ${total_runs} across ${distinct_skills} skills
- **Overall success rate**: ${overall_success_pct}%
- **Anomalies**: ${anomaly_count} (🔴 ${red_count}, 🟠 ${orange_count}, 🟡 ${yellow_count})
- **Silent scheduled**: ${silent_scheduled_count} skills (${comma list capped at 5})
- **Top runner**: ${top_skill} (${top_runs} runs)
- **Exit dominant**: ${exit_dominant_summary}
- **Verdict**: ${verdict_line}
- **Article**: articles/skill-analytics-${today}.md
- **Dashboard**: dashboard/outputs/skill-analytics.json
- **Notification sent**: ${yes|no — quiet (no anomalies)}
- **Status**: SKILL_ANALYTICS_OK | SKILL_ANALYTICS_QUIET | SKILL_ANALYTICS_NO_DATA
```
## Exit taxonomy
| Status | Meaning | Notify? |
|--------|---------|---------|
| `SKILL_ANALYTICS_OK` | snapshot fetched, ≥1 anomaly flagged | Yes |
| `SKILL_ANALYTICS_QUIET` | snapshot fetched, zero anomalies | No (article + JSON written, log only) |
| `SKILL_ANALYTICS_NO_DATA` | skill-runs returned empty / fetch failed | No (log only, no article overwrite) |
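
Read together with the significance gate in step 9, the table reduces to a small decision function. The sketch below uses a hypothetical `resolve_exit` helper that returns the status string and whether `./notify` should be called.

```python
from __future__ import annotations


def resolve_exit(snapshot_ok: bool, anomaly_count: int) -> tuple[str, bool]:
    """Return (status, notify) following the exit taxonomy table."""
    if not snapshot_ok:
        return "SKILL_ANALYTICS_NO_DATA", False  # log only, keep the old article
    if anomaly_count >= 1:
        return "SKILL_ANALYTICS_OK", True
    return "SKILL_ANALYTICS_QUIET", False  # article and JSON still written
```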
## Sandbox note
`./scripts/skill-runs` uses `gh api` internally — auth comes from `GITHUB_TOKEN`, no curl/env-var-in-header issue. No outbound HTTP from this skill itself. If `gh api` is rate-limited or the runner's network is degraded, the script exits non-zero; this skill catches that and falls through to `SKILL_ANALYTICS_NO_DATA` rather than emitting a partial fleet view that would mislead.
## Constraints
- **Significance-gated.** A clean fleet must produce zero notifications. Article and JSON spec still write so the dashboard reflects the latest state, but `./notify` is silent.
- **Never invent runs.** If `skill-runs` returns empty, exit `SKILL_ANALYTICS_NO_DATA` — do not synthesise data from cron-state alone (cron-state's view is per-skill, not chronologically ordered, and would produce a misleading "top runners" table).
- **Best-effort exit-taxonomy parsing.** Log markers are human-written; expect a 10–20% miss rate. Do not block the article on parse failures — drop the affected skill into `uncategorized` and continue.
- **Idempotent.** Same-day reruns overwrite the article and JSON spec. The log entry is appended (one block per run, lets the operator see analytic drift across reruns).
- **No issue filing.** This skill does not write to `memory/issues/` — that contract belongs to `skill-health`. Anomalies surface here as flags; persistence and resolution live in skill-health's domain.
- **Respect workflow_dispatch / reactive.** Skills with non-cron schedules cannot be SILENT — they fire only on demand. Excluding them from the silent-scheduled check prevents permanent false positives.