examples-auto-run

$ npx mdskill add openai/openai-agents-python/examples-auto-run

Execute Python examples automatically with logging and rerun support.

  • Enables agents to run example scripts without manual intervention.
  • Turns on optional dependency extras by default, including litellm, sqlalchemy, redis, and temporal.
  • Uses auto-input mode to approve actions, and writes a rerun list of failed examples.
  • Delivers logs and status updates through dedicated shell helpers.
SKILL.md
.github/skills/examples-auto-run
---
name: examples-auto-run
description: Run python examples in auto mode with logging, rerun helpers, and background control.
---

# examples-auto-run

## What it does

- Runs `uv run examples/run_examples.py` with:
  - Optional dependency extras enabled by default:
    `litellm`, `any-llm`, `sqlalchemy`, `redis`, `blaxel`, `modal`, `runloop`, and `temporal`.
  - `EXAMPLES_INTERACTIVE_MODE=auto` (auto-input/auto-approve).
  - Per-example logs under `.tmp/examples-start-logs/`.
  - Main summary log path passed via `--main-log` (also under `.tmp/examples-start-logs/`).
  - A rerun list of failures at `.tmp/examples-rerun.txt` when `--write-rerun` is set.
- Provides start/stop/status/logs/tail/collect/rerun helpers via `run.sh`.
- Background option keeps the process running with a pidfile; `stop` cleans it up.
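
The pidfile-based background control described above can be sketched as follows. This is a minimal illustration of the general pattern, not the contents of `run.sh`; the function names `start_bg`, `is_running`, and `stop_bg` and the pidfile argument are hypothetical.

```bash
# Hypothetical sketch of pidfile-based start/status/stop; not the actual run.sh.

# start_bg PIDFILE CMD [ARGS...]: run CMD in the background and record its PID.
start_bg() {
  pidfile=$1
  shift
  "$@" >/dev/null 2>&1 &
  echo "$!" > "$pidfile"
}

# is_running PIDFILE: succeed if the pidfile is non-empty and its PID is alive.
is_running() {
  [ -s "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# stop_bg PIDFILE: terminate the recorded process (if any) and remove the pidfile.
stop_bg() {
  if is_running "$1"; then
    kill "$(cat "$1")" 2>/dev/null
  fi
  rm -f "$1"
}
```

Checking liveness with `kill -0` before trusting the pidfile avoids reporting a job as running after it has already exited.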

## Usage

```bash
# Start (auto mode; interactive included by default)
.agents/skills/examples-auto-run/scripts/run.sh start [extra args to run_examples.py]
# Examples:
.agents/skills/examples-auto-run/scripts/run.sh start --filter basic
.agents/skills/examples-auto-run/scripts/run.sh start --include-server --include-audio

# Check status
.agents/skills/examples-auto-run/scripts/run.sh status

# Stop running job
.agents/skills/examples-auto-run/scripts/run.sh stop

# List logs
.agents/skills/examples-auto-run/scripts/run.sh logs

# Tail latest log (or specify one)
.agents/skills/examples-auto-run/scripts/run.sh tail
.agents/skills/examples-auto-run/scripts/run.sh tail main_20260113-123000.log

# Collect rerun list from a main log (defaults to latest main_*.log)
.agents/skills/examples-auto-run/scripts/run.sh collect

# Rerun only failed entries from rerun file (auto mode)
.agents/skills/examples-auto-run/scripts/run.sh rerun
```

## Codex execution environment

- When Codex invokes `start` or `rerun`, run the command outside the Codex sandbox by default
  (`sandbox_permissions=require_escalated`). Many examples start nested sandboxes, browsers, npm
  helpers, or cloud/local service processes; running from inside the Codex sandbox can produce
  environment-only failures such as `sandbox-exec: sandbox_apply: Operation not permitted`,
  Playwright cache permission errors, or npm cache permission errors.
- Use sandboxed execution only when the user explicitly asks for it or when running a narrow dry-run
  / log inspection command that does not execute examples.
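
To tell environment-only failures apart from real example failures, it may help to search the logs for the sandbox signature quoted above. The sketch below assumes the default log directory from this document; `sandbox_failures` is a hypothetical helper name, and only the `sandbox-exec` message is matched because it is the only exact error string given here.

```bash
# Hypothetical helper: list log files that contain the sandbox-exec failure
# signature, which suggests the run happened inside the Codex sandbox.
sandbox_failures() {
  grep -rlF 'sandbox-exec: sandbox_apply: Operation not permitted' \
    "${1:-.tmp/examples-start-logs}"
}
```

If this prints any paths, rerunning those examples outside the sandbox is likely more useful than debugging the examples themselves.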

## Defaults (overridable via env)

- `EXAMPLES_INTERACTIVE_MODE=auto`
- `EXAMPLES_INCLUDE_INTERACTIVE=1`
- `EXAMPLES_INCLUDE_SERVER=0`
- `EXAMPLES_INCLUDE_AUDIO=0`
- `EXAMPLES_INCLUDE_EXTERNAL=0`
- `EXAMPLES_UV_EXTRAS="litellm any-llm sqlalchemy redis blaxel modal runloop temporal"` (set to an empty string to disable extras)
- Auto-approvals in auto mode: `APPLY_PATCH_AUTO_APPROVE=1`, `SHELL_AUTO_APPROVE=1`, `AUTO_APPROVE_MCP=1`
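
One way a wrapper could apply these defaults while still honoring caller overrides is sketched below. Whether `run.sh` uses this exact idiom is an assumption, and `apply_example_defaults` and `uv_extra_args` are hypothetical names. The detail worth noting is the extras variable: an explicitly empty value has to survive so that it can disable extras.

```bash
# Sketch only: apply the documented defaults unless the caller set a value.
# Function names are hypothetical, not run.sh internals.

apply_example_defaults() {
  # ":=" assigns the default only when the variable is unset or empty.
  : "${EXAMPLES_INTERACTIVE_MODE:=auto}"
  : "${EXAMPLES_INCLUDE_INTERACTIVE:=1}"
  : "${EXAMPLES_INCLUDE_SERVER:=0}"
  : "${EXAMPLES_INCLUDE_AUDIO:=0}"
  : "${EXAMPLES_INCLUDE_EXTERNAL:=0}"

  # "-" without a colon keeps an explicitly empty value, so
  # EXAMPLES_UV_EXTRAS="" disables extras instead of restoring the default.
  EXAMPLES_UV_EXTRAS="${EXAMPLES_UV_EXTRAS-litellm any-llm sqlalchemy redis blaxel modal runloop temporal}"

  # Auto-approvals are applied in auto mode only.
  if [ "$EXAMPLES_INTERACTIVE_MODE" = "auto" ]; then
    : "${APPLY_PATCH_AUTO_APPROVE:=1}"
    : "${SHELL_AUTO_APPROVE:=1}"
    : "${AUTO_APPROVE_MCP:=1}"
    export APPLY_PATCH_AUTO_APPROVE SHELL_AUTO_APPROVE AUTO_APPROVE_MCP
  fi

  export EXAMPLES_INTERACTIVE_MODE EXAMPLES_INCLUDE_INTERACTIVE \
    EXAMPLES_INCLUDE_SERVER EXAMPLES_INCLUDE_AUDIO \
    EXAMPLES_INCLUDE_EXTERNAL EXAMPLES_UV_EXTRAS
}

# Print one "--extra NAME" pair per enabled extra (nothing when disabled).
uv_extra_args() {
  for extra in $EXAMPLES_UV_EXTRAS; do
    printf '%s %s ' --extra "$extra"
  done
}
```

With a scheme like this, a one-off override is just an environment prefix on the command, for example `EXAMPLES_INCLUDE_SERVER=1 .agents/skills/examples-auto-run/scripts/run.sh start`.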

## Log locations

- Main logs: `.tmp/examples-start-logs/main_*.log`
- Per-example logs (from `run_examples.py`): `.tmp/examples-start-logs/<module_path>.log`
- Rerun list: `.tmp/examples-rerun.txt`
- Stdout logs: `.tmp/examples-start-logs/stdout_*.log`
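
Commands that default to the latest main log need some way to pick it. A minimal sketch, assuming the timestamped names (such as `main_20260113-123000.log`) sort chronologically; `latest_main_log` is a hypothetical helper, not necessarily how `run.sh` resolves the file:

```bash
# Hypothetical helper: print the newest main_*.log in LOG_DIR.
# Glob results are sorted by name, so the last match is the latest timestamp.
latest_main_log() {
  latest=""
  for f in "${1:-.tmp/examples-start-logs}"/main_*.log; do
    # Skip the unexpanded pattern when no main log exists yet.
    [ -e "$f" ] && latest=$f
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

The function returns a non-zero status when no main log exists, so callers can distinguish "no runs yet" from an empty path.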

## Notes

- The runner delegates to `uv run --extra ... examples/run_examples.py`, which already writes per-example logs and supports `--collect`, `--rerun-file`, and `--print-auto-skip`.
- `start` uses `--write-rerun` so failures are captured automatically.
- If `.tmp/examples-rerun.txt` exists and is non-empty, invoking the skill with no args runs `rerun` by default.
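
The no-argument default in the last note can be expressed with a single file test. In this sketch, `default_command` is a hypothetical name, and falling back to `start` when there is nothing to rerun is an assumption, since the note above only specifies the `rerun` case.

```bash
# Hypothetical sketch: choose the subcommand when the skill gets no arguments.
# "-s" is true only when the file exists and is non-empty.
default_command() {
  if [ -s "${1:-.tmp/examples-rerun.txt}" ]; then
    echo rerun
  else
    echo start   # assumed fallback; not stated in this document
  fi
}
```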

## Behavioral validation (Codex/LLM responsibility)

The runner does not perform any automated behavioral validation. After every foreground `start` or `rerun`, **Codex must manually validate** all exit-0 entries:

1. Read the example source (and comments) to infer intended flow, tools used, and expected key outputs.
2. Open the matching per-example log under `.tmp/examples-start-logs/`.
3. Confirm the intended actions/results occurred; flag omissions or divergences.
4. Do this for **all passed examples**, not just a sample.
5. Report immediately after the run with concise citations to the exact log lines that justify the validation.
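
Two small helpers can make steps 2 and 5 less error-prone. Both are hypothetical sketches: `example_logs` assumes that every `*.log` file not prefixed with `main_` or `stdout_` is a per-example log, and `cite_lines` simply prints matching lines with their line numbers so they can be quoted in the report.

```bash
# Hypothetical helper: list per-example logs, skipping main and stdout logs.
example_logs() {
  find "${1:-.tmp/examples-start-logs}" -type f -name '*.log' \
    ! -name 'main_*' ! -name 'stdout_*' | sort
}

# Hypothetical helper: print "LINE_NUMBER:text" for each line of LOG that
# contains the fixed string PATTERN, for use as a citation.
cite_lines() {
  grep -nF -- "$2" "$1"
}
```

These only locate evidence; judging whether the logged behavior matches the example's intent remains a manual step.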