ai-session-watch
$
npx mdskill add arcasilesgroup/ai-engineering/ai-session-watchProject-local instinct learning for `ai-engineering`. Two modes: passive observation during a session, and active consolidation on demand. No daemons, no background workers -- the LLM itself is the observer.
SKILL.md
.github/skills/ai-session-watchView on GitHub ↗
---
name: ai-session-watch
description: Observes session corrections, recoveries, and workflow patterns silently, then consolidates them into project-local observations on demand. Trigger for 'start observing', 'learn from this session', 'consolidate observations', 'review what was learned', 'observe this session'. Listening mode is passive; review mode extracts and writes. Not for cross-project learning; use /ai-learn instead. Not for skill rewrites; use /ai-skill-improve instead.
effort: mid
argument-hint: "[--review]"
mode: agent
tags: [meta, learning, continuous-improvement, observe]
model_tier: sonnet
mirror_family: copilot-skills
generated_by: ai-eng sync
canonical_source: .claude/skills/ai-session-watch/SKILL.md
edit_policy: generated-do-not-edit
---
# ai-session-watch
Project-local instinct learning for `ai-engineering`. Two modes: passive observation during a session, and active consolidation on demand. No daemons, no background workers -- the LLM itself is the observer.
## Quick start
```
/ai-session-watch # passive observation mode (silent until --review)
/ai-session-watch --review # consolidate observations into observations.yml
```
## Workflow
1. Observation is **always-on** via the `instinct-observe.py` hooks (PreToolUse + PostToolUse) — no manual activation, and `/ai-start` does NOT start it. Invoking bare `/ai-session-watch` is optional and only emits a one-line acknowledgement.
2. As work happens, the model passively notes corrections, error recoveries, and skill-invocation sequences.
3. Before a commit or PR, invoke `/ai-session-watch --review` to run the 5-step consolidation: extract → enrich → write → evaluate → create work items.
4. High-confidence proposals (>= 0.7, evidence >= 3) auto-generate work items via `gh issue create` or `az boards work-item create`.
## Artifact Set
| Artifact | Purpose |
| ---------------------------------------------------- | --------------------------------------------------------------- |
| `.ai-engineering/state/observation-events.ndjson` | Append-only observation stream from hooks. Retain last 30 days. |
| `.ai-engineering/observations/observations.yml` | Canonical project-local instinct store (v2 schema). |
| `.ai-engineering/observations/meta.json` | Checkpoints and thresholds for consolidation. |
| `.ai-engineering/observations/proposals.md` | Actionable proposals generated when thresholds are met. |
## Supported Families (v2)
The canonical store supports only these sections:
- `corrections` -- user corrections to AI behavior (LLM-extracted during --review)
- `recoveries` -- error recovery patterns (hook-detected + LLM-enriched)
- `workflows` -- skill invocation sequences (hook-detected + LLM-enriched)
Anything outside those families is out of scope.
## Commands
### `/ai-session-watch` (listening mode)
Enter passive observation mode for the session. Output ONLY this single line, then go silent:
> observing the session...
Do nothing else. Do not read files. Do not produce analysis. The LLM passively observes corrections, error recoveries, and workflow patterns as the session continues. Observations are consolidated only when `--review` is invoked.
### `/ai-session-watch --review` (consolidation)
5-step consolidation (extract → enrich → write → evaluate → create work items). Run this before `/ai-commit` or `/ai-pr` to capture learnings.
#### Step 1: EXTRACT
Review the current conversation for:
- **Corrections**: moments where the user corrected AI behavior, rejected an approach, or redirected a decision.
- **Recoveries**: errors encountered and how they were resolved.
- **Workflows**: skill invocation sequences and tool usage patterns.
For each observation, identify:
- `relatedSkill` -- which skill was active (e.g., `ai-code`, `ai-test`)
- `diagnostic` -- the error message or correction signal
- `skillIssue` -- what the skill got wrong or could improve
#### Step 2: ENRICH
1. Read `.ai-engineering/state/observation-events.ndjson` for hook-detected recoveries.
2. Read `.ai-engineering/state/framework-events.ndjson` for `skill_invoked` events to detect workflow patterns.
3. For each extracted observation, add semantic fields:
- `trigger` -- what condition causes this pattern (e.g., "user says 'no, do X instead'")
- `action` -- what the AI should do differently (e.g., "prefer X over Y in this context")
#### Step 3: WRITE
Upsert entries into `.ai-engineering/observations/observations.yml` using the v2 schema. Each family entry shares: `pattern`, `trigger`, `action`, `relatedSkill`, `confidence` (0.0-1.0), `evidenceCount`, `domain` (project|stack|team), `lastSeen` (ISO 8601). `corrections` and `recoveries` add `diagnostic` + `skillIssue`.
```yaml
schemaVersion: "2.0"
corrections:
- pattern: "<description>"
trigger: "<what triggers this>"
action: "<what to do>"
relatedSkill: "<skill-name>"
diagnostic: "<error or correction signal>"
skillIssue: "<what the skill got wrong>"
confidence: 0.7
evidenceCount: 3
domain: "project"
lastSeen: "2026-04-27T00:00:00Z"
recoveries:
- pattern: "<description>"
# ...same fields; trigger/action/diagnostic describe error pattern + recovery steps + error message
workflows:
- pattern: "<description>"
# ...same fields except diagnostic/skillIssue; trigger/action describe the sequence
```
Merge rules:
- If an existing entry matches the same pattern (fuzzy match on trigger + action), increment `evidenceCount` and update `lastSeen`. Match if the trigger and action describe the same behavioral pattern using different wording. When in doubt, increment the existing entry rather than creating a duplicate.
- Apply confidence scoring: `confidence_for_count(evidenceCount)` yields 0.3/0.5/0.7/0.85 at thresholds 1/2/3/5+.
- Drop entries with confidence below 0.2 (decay threshold).
- Update `.ai-engineering/observations/meta.json` with new checkpoint.
#### Step 4: EVALUATE
Cross-reference the updated instincts with project knowledge to produce actionable proposals:
1. Read `.ai-engineering/LESSONS.md` to check for already-captured patterns.
2. Read project context: `CONSTITUTION.md`, `.ai-engineering/manifest.yml`, and the target artifact (e.g., `.github/skills/ai-<skill>/SKILL.md` or `.github/agents/<agent>.agent.md`) to understand the improvement surface. If only `.ai-engineering/CONSTITUTION.md` exists, use it as a compatibility fallback.
3. Filter instincts: only those with `confidence >= 0.7` AND `evidenceCount >= 3` qualify as proposals.
4. For each qualifying instinct, check if the pattern is already captured in LESSONS.md -- if so, skip.
5. Append a new `PROP-NNN` entry to `.ai-engineering/observations/proposals.md`:
```markdown
## PROP-NNN: <title>
- **Status**: proposed
- **Source**: <family> instinct, confidence <N>, evidence <N>
- **Pattern**: <what was observed>
- **Diagnostic**: <error message or correction signal from the instinct>
- **Proposed fix**: <specific change: update SKILL.md procedure, add LESSONS.md entry, adjust manifest config, etc.>
- **Target**: LESSONS.md | SKILL.md (<which skill>) | agent.md (<which agent>) | manifest.yml | hook (<which hook>)
- **LESSONS.md cross-ref**: <"none" or the matching lesson heading if partial overlap exists>
```
Number proposals sequentially (PROP-001, PROP-002, ...). Check existing entries in proposals.md before assigning the next number.
#### Step 5: CREATE WORK ITEMS
If `.ai-engineering/manifest.yml` has a `work_items` section, create trackable work items for each new proposal. Follow the same fail-open protocol as `/ai-board sync`: never block the calling workflow.
1. Read `work_items.provider` from `.ai-engineering/manifest.yml`.
2. **Check for duplicates** before creating:
- **GitHub**: `gh issue list --label "ai-engineering,instinct" --state open --json title`
- **Azure DevOps**: `az boards query --wiql "SELECT [System.Title] FROM WorkItems WHERE [System.Tags] CONTAINS 'instinct' AND [System.State] <> 'Closed'" -o json`
- If a work item with a matching title already exists, skip and note "duplicate" in the output.
3. Create the work item:
- **GitHub** (`work_items.provider: github`): `gh issue create --title "instinct: [target] - [diagnostic]" --body "<proposal body>" --label "ai-engineering,instinct"`
- **Azure DevOps** (`work_items.provider: azure_devops`): `az boards work-item create --title "instinct: [target] - [diagnostic]" --description "<proposal body>" --type "Task"`
4. Update the proposal entry in `.ai-engineering/observations/proposals.md`: set `Status` to `work-item-created` and append `- **Work item**: <ref>` (e.g., `#45` or `AB#100`).
5. If no `work_items` section in manifest, skip silently.
6. Fail-open: if CLI is not authenticated, project is not found, or command fails -- log a warning with remediation hint (e.g., `gh auth login`) but do not block the review.
## Review-Mode Output
Structured summary: observations extracted (count per family), entries upserted (new vs. updated), proposals generated (count, titles), work items created (count, links). If no meaningful observations: "No consolidation needed -- session had no corrections, recoveries, or notable workflow patterns."
## Boundaries
- Project-local only. No global instinct scope.
- One canonical `observations.yml`, not one file per instinct.
- Never store transcripts, prompts, responses, or raw tool payloads.
- Do not create instincts outside `.ai-engineering/observations/`.
- Do not invent unsupported pattern types beyond corrections/recoveries/workflows.
- Do not claim the system supports promotion, evolution, or global libraries.
## Examples
### Example 1 — start a passive observation session
User: "begin observing this session for pattern learning"
```
/ai-session-watch
```
Outputs `observing the session...` and goes silent. The model passively notes corrections, error recoveries, and workflow sequences as the session continues. No further output until `--review`.
### Example 2 — consolidate before committing
User: "review what was learned in this session before I commit"
```
/ai-session-watch --review
```
Runs the 5-step consolidation: extract observations from the conversation, enrich with hook events, upsert into `observations.yml`, evaluate against LESSONS.md, and create work items for high-confidence proposals.
## Integration
Called by: user directly at session start. Calls: `gh issue create` / `az boards work-item create` (Step 5 work-item creation). Reads: `decision-store.json`, `LESSONS.md`, `observations.yml`, `proposals.md`. See also: `/ai-learn` (cross-session retro), `/ai-skill-improve` (acts on high-confidence proposals).
$ARGUMENTS
More from arcasilesgroup/ai-engineering
- ai-adviseProactive governance advisor — checks standards, decisions, and quality trends during development. Always advisory, NEVER blocks. Three modes: `advise` (post-edit), `gate` (pre-dispatch), `drift` (on-demand decision audit). Trigger for 'governance check', 'advise on this change', 'check for drift', 'is this aligned with active decisions', 'shift-left advisory'. Not for blocking gates — use /ai-verify. Not for narrative code review — use /ai-review.
- ai-analyze-permissionsUse when Claude Code keeps asking to approve commands you have already approved, when settings.local.json has grown large, or when you want to consolidate permission grants into wildcard patterns. Trigger for 'too many permission prompts', 'clean up permissions', 'audit my settings', 'consolidate allow rules'. Claude Code only — not available in GitHub Copilot, Antigravity, or Codex.
- ai-animationDesigns motion, transitions, and micro-interactions for UI components: spring animations, gestures, easing, staggers — taste-driven detail compounding. Trigger for 'animate this', 'add transitions', 'micro-interactions for', 'gesture design', 'swipe to dismiss', 'easing for this', 'stagger the'. Not for design systems; use /ai-design instead. Not for visual art; use /ai-visual instead. Not for testing animation code; use /ai-test instead.
- ai-autopilotDelivers large multi-concern specs and backlog runs autonomously: decomposes specs into sub-specs (or normalizes work items into a backlog DAG), deep-plans with parallel agents, builds a dependency DAG, implements in waves, runs a single final quality loop with one bounded quality-remediation pass (verify+guard+review on full changeset), delivers via PR. Trigger for 'implement spec-NNN end to end', 'autopilot this', 'autonomous delivery', 'decompose and ship', 'run the backlog', 'execute these GitHub issues', 'process the sprint backlog'. Invocation is the approval gate. Not for small or single-concern tasks; use /ai-build instead. Not for ambiguous requirements; use /ai-brainstorm first.
- ai-boardOperates the project board (GitHub Projects v2 or Azure DevOps): discovers configuration after install (fields, state mappings, process templates) and syncs work-item state at lifecycle transitions. Trigger for 'set up the board', 'configure our ADO board', 'discover board fields', 'move this issue to in-review', 'update the board', 'mark as in progress', 'sync the work item state'. Two subcommands: `discover` (post-install configuration write) and `sync` (lifecycle state transitions). Auto-invoked via `sync` by /ai-brainstorm, /ai-build, and /ai-pr; fail-open. Not for backlog execution; use /ai-autopilot --backlog instead.
- ai-brainstormForces rigorous design interrogation BEFORE any code: explores approaches, surfaces ambiguity, gathers evidence, produces an approved spec that becomes the contract for /ai-plan. Trigger for 'lets add X', 'how should we handle Y', 'whats the best approach', 'I am thinking about', 'what should we build for'. Not for existing approved specs; use /ai-plan instead. Not for execution; use /ai-build instead.
- ai-branch-cleanupCleans branches safely: switches to the default branch, prunes merged and squash-merged branches, syncs to remote, sweeps stale specs, rotates `.ai-engineering/runtime/` per retention policy. Trigger for 'tidy up', 'tidy branches', 'sync to main', 'delete old branches', 'start fresh', 'rotate runtime'. Auto-invoked by /ai-pr after merge. Not for committing changes; use /ai-commit instead. Not for code-level dead-code removal; use /ai-simplify instead.
- ai-buildCanonical implementation gateway: reads approved plan.md, resolves stack from manifest, deterministic-routes each task to its adapter, dispatches the build agent in an isolated worktree, runs TDD self-validation per task, then a single final quality loop with one bounded quality-remediation pass on the full changeset before /ai-pr. Trigger for 'go', 'start building', 'execute the plan', 'implement it', 'lets do this', 'build the plan', 'resume', 'continue'. Not without an approved plan; run /ai-plan first. Not for multi-concern specs needing decomposition; use /ai-autopilot instead. Not for a single function or subcomponent; use /ai-code.
- ai-codeWrites production code that satisfies stack-context standards on the first pass: interface-first design, backward-compatibility checks, lightweight self-review. Trigger for 'implement this', 'write the code for', 'add X to Y', 'build this function', 'make this work'. Not for tests; use /ai-test instead. Not for debugging; use /ai-debug instead. Not for refactoring; use /ai-simplify instead. Not for executing an approved plan end-to-end; use /ai-build (the gateway).
- ai-commitRuns the governed commit pipeline: auto-branches from protected, stages selectively, formats and lints, scans for secrets, gates docs, composes a conventional message, pushes. Trigger for 'commit my changes', 'save my work', 'push this to remote', 'stage these files', 'ship it'. Not for opening a PR; use /ai-pr instead. Not for branch hygiene; use /ai-branch-cleanup instead.