training-check
$ npx mdskill add wanshuiyin/Auto-claude-code-research-in-sleep/training-check
SKILL.md
.github/skills/training-check
---
name: training-check
description: "Interactively monitor training metrics from the current Codex session, periodically checking WandB or fallback logs for NaN, divergence, plateaus, and broken runs."
argument-hint: [wandb-run-or-monitoring-brief]
allowed-tools: Bash(*), Read, Write, Edit, Grep, Glob
---

# Training Check

You are now in **interactive watch** mode (interactive training monitoring).

Keep the current session open and report directly in the current terminal. The user is watching this terminal for updates.

By default, run a training health check every 30 minutes, output a concise but complete analysis report after each check, state the next check time, then continue monitoring.

This skill checks training **quality**, not basic process health. Process health checks, such as whether a tmux session exists or whether the GPU is idle, can be handled by watchdog-style tooling; this skill focuses on whether the run is still worth continuing.

## Inputs To Establish First

Before the first check, identify or ask for the minimum monitoring context:

- WandB run path or URL, if available.
- Fallback log path, SSH command, or local command for reading recent training logs.
- Training target, expected baseline, and key metrics that define success.
- How the training was launched, so it can be stopped if needed.
- Project notes path for recording decisions and evidence.

If a source is unavailable, say so clearly and continue with the available source. If both WandB and fallback logs are unreachable, report the connectivity issue, classify the round as `WAIT`, and check again later. Do not infer that training is bad only because data is unreachable.

## Per-Round Check

Every round, read WandB first when configured. If WandB is unreachable, read the fallback logs.

Inspect at least:

- Training loss trend over recent checkpoints or steps.
- Eval metrics and whether they improve, flatten, or degrade against baseline.
- NaN or Inf in loss, gradients, activations, or logged metrics.
- Sudden loss spikes, divergence, or repeated failed evaluations.
- Learning rate schedule behavior.
- Gradient norm, if logged.
- Plateau patterns that suggest the run is no longer useful.

Output one report in the current terminal with this structure:

```text
## Training Check - <local timestamp>

- Data source: wandb_ok | log_fallback | unreachable
- Run: <wandb run or training identifier>
- Recent metrics: <loss/eval/lr/grad summary>
- Anomalies: <NaN/Inf/spike/divergence/plateau findings>
- Evidence: <WandB URL, log lines, metric values, or files inspected>
- Decision: CONTINUE | WAIT | STOP
- Reason: <why this decision is justified>
- Next check: <local timestamp, normally 30 minutes later unless ending>
```

Use the decisions as follows:

| Decision | Meaning | Action |
|----------|---------|--------|
| `CONTINUE` | Run looks healthy enough to keep training. | Keep monitoring and check again in 30 minutes. |
| `WAIT` | Evidence is inconclusive, noisy, too early, or temporarily unreachable. | Do not stop training; keep monitoring and check again later. |
| `STOP` | Training is clearly problematic or no longer worth continuing. | Stop the training task, save evidence, write notes, output final summary, and end monitoring. |

## Stop Behavior

When the decision is `STOP`:

- Stop the training task.
- If the context contains `stop_command`, run `stop_command` first.
- If no `stop_command` is available, choose the appropriate stop action from how the training was launched, such as stopping the relevant tmux session, local process, remote process, scheduler job, or notebook job.
- Save evidence: WandB URL, key metrics, relevant log snippets, files inspected, and the reason for stopping.
- Append a project note for debugging and future analysis.
- Output `FINAL_SUMMARY` in the terminal.
- End the interactive monitoring loop.

Never stop on the first sign of ordinary metric noise. Look for sustained trends, hard failures, or clear divergence. Always preserve enough evidence for a later agent or human to understand why the run was stopped.

## Interactive Loop Guidance

- The normal interval is 30 minutes.
- If a round is `CONTINUE`, announce the next check time and wait until then.
- If a round is `WAIT`, explain what evidence is missing or noisy and check again later. Use a shorter interval only when the run looks suspicious but not yet stop-worthy.
- If an anomaly recovers, say so explicitly and continue monitoring.
- Keep the user-facing report short enough to read in a terminal, but include concrete metric values and evidence paths.
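
As an illustration only, the decision rules above could be sketched in Python roughly as follows. Nothing here comes from the skill itself: the function names `classify_round` and `next_check_time`, the thresholds (`spike_factor`, `improve_tol`, `min_points`), the 10-minute short interval, and the choice to treat NaN/Inf as a hard failure are all assumptions made for the sketch. It also assumes positive loss values listed oldest first.

```python
import math
from datetime import datetime, timedelta


def classify_round(losses, reachable=True, spike_factor=3.0,
                   improve_tol=1e-3, min_points=6):
    """Map recent positive loss values (oldest first) to (decision, reason).

    Every threshold is a hypothetical default, not a value from the skill.
    """
    if not reachable:
        # Unreachable data is never treated as evidence of a bad run.
        return "WAIT", "WandB and fallback logs unreachable"
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        # Assumed here to be a hard failure rather than ordinary noise.
        return "STOP", "NaN or Inf in logged loss"
    if len(losses) < min_points:
        return "WAIT", "too few points to judge a trend"

    # Compare the mean of the older half with the mean of the newer half.
    half = len(losses) // 2
    early = sum(losses[:half]) / half
    late = sum(losses[half:]) / (len(losses) - half)

    if late > spike_factor * early:
        # The whole recent window is elevated: sustained divergence.
        return "STOP", f"recent mean loss {late:.4g} vs earlier {early:.4g}"
    if max(losses[half:]) > spike_factor * early:
        # A single outlier may still recover, so keep watching.
        return "WAIT", "isolated loss spike"
    if late < early * (1.0 - improve_tol):
        return "CONTINUE", f"loss improving: {early:.4g} -> {late:.4g}"
    return "WAIT", "loss flat or drifting upward"


def next_check_time(now, suspicious=False, normal_minutes=30, short_minutes=10):
    """Return the next check time; the shorter interval is an assumed value."""
    minutes = short_minutes if suspicious else normal_minutes
    return now + timedelta(minutes=minutes)


if __name__ == "__main__":
    decision, reason = classify_round([2.0, 1.8, 1.6, 1.4, 1.2, 1.0])
    print(f"- Decision: {decision}")
    print(f"- Reason: {reason}")
    print(f"- Next check: {next_check_time(datetime.now()):%Y-%m-%d %H:%M}")
```

A single outlier maps to `WAIT` and only an elevated recent window maps to `STOP`, which mirrors the rule against stopping on ordinary metric noise. A real check would also weigh eval metrics, learning rate, and gradient norm before deciding.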
More from wanshuiyin/Auto-claude-code-research-in-sleep
- ablation-planner: Use when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements.
- alphaxiv: Quick single-paper lookup via AlphaXiv LLM-optimized summaries with tiered source fallback. Use when user says "explain this paper", "summarize paper", pastes an arXiv/AlphaXiv URL, or provides a bare arXiv ID for quick understanding - not for broad literature search.
- analyze-results: Analyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says "analyze results", "compare", or needs to interpret experimental data.
- auto-paper-improvement-loop: Autonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper.
- auto-review-loop: Autonomous multi-round research review loop. Repeatedly reviews via external reviewer backend (Codex or manual), implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.
- auto-review-loop-llm: Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
- auto-review-loop-minimax: Autonomous multi-round research review loop using MiniMax API. Use when you want to use MiniMax instead of Codex MCP for external review. Trigger with "auto review loop minimax" or "minimax review".
- citation-audit: Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.
- claims-drafting: Draft patent claims for an invention. Use when user says "撰写权利要求", "draft claims", "写权利要求书", "claim drafting", or wants to create patent claims. The core skill of the patent pipeline.
- comm-lit-review-claude-single: Communications-domain literature review with Claude-style knowledge-base-first retrieval. Use when the task is about communications, wireless, networking, satellite/NTN, Wi-Fi, cellular, transport protocols, congestion control, routing, scheduling, MAC/PHY, rate adaptation, channel estimation, beamforming, or communication-system research and the user wants papers, related work, a survey, or a landscape summary. Search Zotero, Obsidian, and local paper folders first when available, then search IEEE Xplore, ScienceDirect, ACM Digital Library, and broader web in that order.