qzcli
$
npx mdskill add wanshuiyin/Auto-claude-code-research-in-sleep/qzcliA kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.
SKILL.md
.github/skills/qzcliView on GitHub ↗
---
name: qzcli
description: Manage GPU compute jobs on the Qizhi (启智) platform using qzcli — a kubectl-style CLI tool. Use when user says "qzcli", "启智平台", "submit job", "stop job", "查计算组", "avail", "list jobs", "batch submit", or needs to manage distributed training jobs on a Qizhi instance.
argument-hint: [login|avail|list|create|stop <job-id>|batch|status|watch]
allowed-tools: Bash(*), Read, Write
---
# qzcli — 启智平台任务管理
A kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.
**GitHub:** [tianyilt/qzcli_tool](https://github.com/tianyilt/qzcli_tool)
## Installation
```bash
pip install rich requests prompt_toolkit mcp
git clone https://github.com/tianyilt/qzcli_tool
cd qzcli_tool && pip install -e .
```
### MCP Integration (optional)
To use qzcli as an MCP tool directly from Claude Code or Codex:
```bash
# Claude Code
claude mcp add qzcli -- qzcli-mcp
# Codex
codex mcp add qzcli -- qzcli-mcp
```
---
## Configuration
Credentials are read in this priority order:
`CLI args > --password-stdin > env vars > QZCLI_ENV_FILE (.env) > ~/.qzcli/config.json > interactive input`
```bash
# Option A: env file (recommended)
mkdir -p ~/.qzcli
cat > ~/.qzcli/.env <<'EOF'
QZCLI_USERNAME="your_username"
QZCLI_PASSWORD="your_password"
EOF
# Option B: environment variables
export QZCLI_USERNAME="your_username"
export QZCLI_PASSWORD="your_password"
export QZCLI_API_URL="https://qz.yourorg.edu.cn"
```
Config files are stored in `~/.qzcli/`: `config.json`, `.cookie`, `resources.json`, `jobs.json`.
---
## Quick Start
```bash
# 1. Login
qzcli login
# 2. Discover and cache workspaces/compute groups (run once, re-run after joining new workspaces)
qzcli res -u
# 3. Check available nodes
qzcli avail
# 4. List running jobs
qzcli ls -c -r
```
---
## Authentication
```bash
# Interactive login
qzcli login
# With credentials
qzcli login -u YOUR_USERNAME -p 'YOUR_PASSWORD'
# Read password from stdin (for scripts)
echo 'YOUR_PASSWORD' | qzcli login -u YOUR_USERNAME --password-stdin
# Check current cookie
qzcli cookie --show
# Clear cookie
qzcli cookie --clear
```
**Note:** `qzcli avail` auto-refreshes the cookie if it expires and credentials are configured.
---
## Resource Discovery
```bash
# List cached workspaces
qzcli res --list
# Refresh all workspace resource cache (run this first!)
qzcli res -u
# Refresh a specific workspace
qzcli res -w MY_WORKSPACE -u
# Set a human-readable alias for a workspace
qzcli res -w ws-xxxxxxxx --name "My Workspace"
```
---
## Check Available Nodes
```bash
# All workspaces
qzcli avail
# Including low-priority task nodes (slower but more accurate)
qzcli avail --lp
# Specific workspace
qzcli avail -w MY_WORKSPACE
# Find compute groups with N free nodes
qzcli avail -n 4
# Export IDs for scripting
qzcli avail -n 4 -e
# Show idle node names
qzcli avail -w MY_WORKSPACE -v
```
---
## Job Submission
### Interactive (recommended for first-time use)
```bash
# Full interactive selection: workspace → project → compute group → spec
qzcli create -i
# Interactive for a specific workspace only
qzcli create -i -w "My Workspace"
```
The TUI shows GPU type, availability, and spec status at each level. Press `Enter/→` to go deeper, `←` to go back.
### Non-interactive
```bash
# Using names (resolved from qzcli res cache)
qzcli create \
--name "my-training-job" \
--command "bash /path/to/train.sh" \
--workspace "My Workspace" \
--compute-group "My Compute Group" \
--image YOUR_REGISTRY/team/image:tag \
--instances 4 \
--priority 10
# Using IDs directly
qzcli create \
--name "my-job" \
--command "bash /path/to/train.sh" \
--workspace ws-YOUR_WORKSPACE_ID \
--compute-group lcg-YOUR_LCG_ID \
--spec YOUR_SPEC_ID \
--image YOUR_REGISTRY/team/image:tag \
--instances 4
```
**Key parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--name` / `-n` | required | Job name |
| `--command` / `-c` | required | Command to run |
| `--workspace` / `-w` | | Workspace name or ID (`ws-...`) |
| `--compute-group` / `-g` | auto | Compute group name or ID (`lcg-...`) |
| `--spec` / `-s` | auto | Resource spec ID |
| `--image` / `-m` | | Docker image |
| `--instances` | 1 | Number of instances |
| `--shm` | 1200 | Shared memory (GiB) |
| `--priority` | 10 | Priority (1–10) |
| `--dry-run` | | Preview only, don't submit |
| `--json` | | JSON output for scripting |
```bash
# Preview before submitting
qzcli create --name test --command "echo hi" --workspace "My Workspace" \
--image YOUR_IMAGE --dry-run
```
### Env-var passthrough (for existing submission scripts)
```bash
# Pass vars directly — do NOT use "export VAR; bash script.sh"
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" \
LCG_ID="lcg-YOUR_LCG_ID" \
SPEC_ID="YOUR_SPEC_ID" \
CHECKPOINT_DIR="/path/to/checkpoint" \
bash YOUR_SUBMIT_SCRIPT.sh
```
### HPC / CPU jobs (Slurm)
```bash
qzcli hpc \
--name "my-cpu-job" \
--workspace ws-YOUR_WORKSPACE_ID \
--compute-group lcg-YOUR_LCG_ID \
--predef-quota-id YOUR_QUOTA_ID \
--cpu 55 --mem-gi 300 --instances 30 \
--image YOUR_REGISTRY/team/cpu-image:tag \
--entrypoint "cd /path/to/dir && bash run.sh"
```
---
## Batch Submission
```bash
# Submit from config file
qzcli batch batch_config.json --delay 3
# Preview all jobs
qzcli batch batch_config.json --dry-run
# Continue on error
qzcli batch batch_config.json --continue-on-error
```
**Config format** (`batch_config.json`):
```json
{
"defaults": {
"workspace": "ws-YOUR_WORKSPACE_ID",
"compute_group": "lcg-YOUR_LCG_ID",
"spec": "YOUR_SPEC_ID",
"image": "YOUR_REGISTRY/team/image:tag",
"instances": 4,
"priority": 10
},
"matrix": {
"checkpoint": ["/path/to/ckpt1", "/path/to/ckpt2"],
"step": [50000, 100000]
},
"name_template": "eval-{checkpoint_basename}-step{step}",
"command_template": "bash eval.sh --checkpoint {checkpoint} --step {step}"
}
```
Matrix keys are Cartesian-producted (2×2 = 4 jobs above). Use `{key_basename}` for path basenames.
### Shell loop (alternative)
```bash
for step in 040000 050000 060000; do
qzcli create \
--name "eval-step${step}" \
--command "bash eval.sh --step $step" \
--workspace "My Workspace" \
--compute-group "My Compute Group" \
--instances 4
sleep 3
done
```
---
## Job Management
```bash
# List jobs
qzcli ls -c -w MY_WORKSPACE # specific workspace
qzcli ls -c --all-ws # all workspaces
qzcli ls -c -w MY_WORKSPACE -r # running only
qzcli ls -c -w MY_WORKSPACE -n 50 # show 50
# Stop a job
qzcli stop JOB_ID
# Job status / details
qzcli status JOB_ID
# Watch all running jobs (refresh every 10s)
qzcli watch -i 10
# Workspace view with GPU utilization
qzcli ws
qzcli ws -a # all projects
qzcli ws -p "My Project"
```
---
## Troubleshooting
| Problem | Cause | Fix |
|---------|-------|-----|
| Cookie expired | Session gap | Re-run `qzcli login` |
| `未找到名称为 'xxx' 的工作空间` | Stale cache | Run `qzcli res -u` |
| No resources in `create -i` | Cache empty | Run `qzcli login && qzcli res -u` |
| `qzcli-mcp` not found | Not installed | `cd qzcli_tool && pip install -e .` |
| Spec not in workspace | ID mismatch | Match spec ID to the correct workspace |
| Silent job failure | Script `sys.exit(0)` | Check job logs directly |
| zsh glob errors | Remote shell is zsh | Wrap commands in `bash -c` or use Python |
More from wanshuiyin/Auto-claude-code-research-in-sleep
- ablation-plannerUse when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements.
- alphaxivQuick single-paper lookup via AlphaXiv LLM-optimized summaries with tiered source fallback. Use when user says "explain this paper", "summarize paper", pastes an arXiv/AlphaXiv URL, or provides a bare arXiv ID for quick understanding - not for broad literature search.
- analyze-resultsAnalyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says "analyze results", "compare", or needs to interpret experimental data.
- auto-paper-improvement-loopAutonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says \"改论文\", \"improve paper\", \"论文润色循环\", \"auto improve\", or wants to iteratively polish a generated paper.
- auto-review-loopAutonomous multi-round research review loop. Repeatedly reviews via external reviewer backend (Codex or manual), implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.
- auto-review-loop-llmAutonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
- auto-review-loop-minimaxAutonomous multi-round research review loop using MiniMax API. Use when you want to use MiniMax instead of Codex MCP for external review. Trigger with "auto review loop minimax" or "minimax review".
- citation-auditZero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says \"审查引用\", \"check citations\", \"citation audit\", \"verify references\", \"引用核对\", or before submission to ensure bibliography integrity.
- claims-draftingDraft patent claims for an invention. Use when user says \"撰写权利要求\", \"draft claims\", \"写权利要求书\", \"claim drafting\", or wants to create patent claims. The core skill of the patent pipeline.
- comm-lit-review-claude-singleCommunications-domain literature review with Claude-style knowledge-base-first retrieval. Use when the task is about communications, wireless, networking, satellite/NTN, Wi-Fi, cellular, transport protocols, congestion control, routing, scheduling, MAC/PHY, rate adaptation, channel estimation, beamforming, or communication-system research and the user wants papers, related work, a survey, or a landscape summary. Search Zotero, Obsidian, and local paper folders first when available, then search IEEE Xplore, ScienceDirect, ACM Digital Library, and broader web in that order.