ray-dependencies

$npx mdskill add ray-project/ray/ray-dependencies

Expert skill for managing Python dependencies across the Ray repository: the monorepo `requirements_compiled*.txt` lock files, the raydepsets DAG-based lock file manager, modular `python/requirements/**` source files, and Docker image dependency chains.

SKILL.md
.github/skills/ray-dependenciesView on GitHub ↗
---
name: ray-dependencies
description: Manage Python dependencies in Ray — add/remove/upgrade packages, work with raydepsets lock files, debug dependency conflicts, and regenerate compiled requirements. Covers `python/requirements*`, `python/requirements/**`, `python/deplocks/**`, and `ci/raydepsets/configs/*.depsets.yaml`.
user-invocable: true
argument-hint: <question or task>
---

# Ray Dependencies

Expert skill for managing Python dependencies across the Ray repository: the monorepo `requirements_compiled*.txt` lock files, the raydepsets DAG-based lock file manager, modular `python/requirements/**` source files, and Docker image dependency chains.

---

## When to use this skill

**Use this skill when the user wants to:**

- Add, remove, or upgrade a Python dependency in Ray
- Create or modify a `.depsets.yaml` config file
- Understand how dependencies flow between Ray Docker images
- Debug dependency conflicts or version resolution failures
- Regenerate lock files after requirement changes
- Create a new CI test environment with locked dependencies
- Understand why a specific package version was chosen

**Not for:**

- General Python packaging questions unrelated to Ray
- Ray runtime/API questions (use Ray docs)
- Docker image build issues unrelated to dependencies

---

## Workflow rules (project preferences)

These take precedence over generic packaging advice:

1. **Source-file pins only.** Resolve conflicts by editing `python/requirements.txt`, `python/requirements/**/*.txt`, or `python/requirements/**/*.in`. Do **not** hand-edit `python/requirements_compiled*.txt` — it is a build artifact and gets overwritten on the next compile.
2. **No `ci/ci.sh` post-process hooks** for individual package fixes. Only touch `ci/ci.sh` for genuinely structural changes (new functions, new input files passed to the compiler). Never as an after-the-fact lock mutator.
3. **Use the dual-exact-pin-with-markers pattern** when a package needs different versions on different Python versions and the lock is consumed as a constraint across multiple Python targets. See *Marker preservation* below.
4. **`python/requirements.txt` ↔ `python/setup.py` must stay in sync.** The packages under the `## setup.py install_requires` block in `python/requirements.txt` are mirrored into `setup_spec.install_requires` in `python/setup.py` (around line 405). When you add, remove, or change a version bound for any package in that block, edit **both** files in the same change. The list in `setup.py` is what end users actually install via `pip install ray`; `requirements.txt` is the dev-side source of truth. Drift between them ships broken installs.
5. **`python/requirements/llm/llm-requirements.txt` ↔ `setup_spec.extras["llm"]` must stay in sync.** The `ray[llm]` extra in `python/setup.py` (around line 377) and `python/requirements/llm/llm-requirements.txt` are paired source-of-truth files for the LLM install set — both have explicit "keep in sync" comments. When you add, remove, or change a permanent version bound for any package in the LLM extra, edit **both** files. **Exception:** temporary upper-bound workaround pins (`<=X.Y.Z` to dodge a bug) may live only in `llm-requirements.txt`. They must NOT be added to `setup.py`, since `setup.py` is the public install constraint and should not advertise short-lived workarounds as future API. Strip the `<=` pin from both files once the upstream fix lands.
6. **When presenting options, list source-file fixes first.** Mention lock or `ci.sh` edits only as fallbacks with downsides called out.

---

## Ray Dependency Architecture

Ray uses a two-tier dependency management system:

1. **`requirements_compiled*.txt`** — monorepo-wide pinned dependency files, compiled via `ci/ci.sh compile_pip_dependencies`. One per Python version: `requirements_compiled.txt` (default), `requirements_compiled_py3.10.txt`, `requirements_compiled_py3.11.txt`, etc.
2. **raydepsets** — a DAG-based lock file manager (`ci/raydepsets/`) that generates per-image, per-environment lock files from `.depsets.yaml` configs. Lock files live in `python/deplocks/` and `release/ray_release/byod/`.

### Key file locations

| Path | Purpose |
|---|---|
| `python/requirements.txt` | Base Ray installation requirements |
| `python/requirements_compiled.txt` | Monorepo pinned dependencies (per-Python-version variants exist) |
| `python/requirements/` | Modular requirement files by component (test, ml, data, train, tune, serve, rllib, llm) |
| `python/requirements/ml/py313/` | ML requirements organized by Python version |
| `python/requirements/data/` | Ray Data variant requirements (pyarrow versions, mongo, etc.) |
| `python/requirements/llm/` | LLM requirements (`llm-requirements.txt`, `llm-test-requirements.txt`) |
| `python/requirements/serve/` | Serve requirements and overrides |
| `python/deplocks/` | Generated lock files (output of raydepsets) |
| `docker/base-deps/requirements.in` | Docker base-deps layer requirements |
| `docker/base-extra/requirements.in` | Docker base-extra layer requirements |
| `docker/base-slim/requirements.in` | Docker slim image requirements |
| `release/ray_release/byod/` | BYOD (Bring Your Own Dependencies) files for release tests |

---

## Generating the constraints (`requirements_compiled*.txt`)

The two compiled lock files are produced by two `bash` functions in `ci/ci.sh`:

| Function | Output | Source set |
|---|---|---|
| `compile_pip_dependencies` (`ci/ci.sh:16`) | `python/requirements_compiled.txt` | shared `python/requirements/**` files (test, cloud, docker, `ml/*`, security) |
| `compile_313_pip_dependencies` (`ci/ci.sh:82`) | `python/requirements_compiled_py3.13.txt` | py313 overrides under `python/requirements/py313/**` and `python/requirements/ml/py313/**`, falling back to shared files |

### How to invoke

```bash
# Default-Python lock (currently py3.10/3.11/3.12 generic)
ci/ci.sh compile_pip_dependencies

# py3.13 lock (uses py313 override directories)
ci/ci.sh compile_313_pip_dependencies

# Custom output filename (rare)
ci/ci.sh compile_pip_dependencies my_custom_lock.txt
```

Run inside an environment matching the target Python version (the function `pip install`s `pip-tools==7.4.1` and `wheel==0.45.1` itself). aarch64/arm64 is a no-op — the function returns early because not all pinned packages have aarch64 wheels.

### What it does

Both functions wrap `pip-compile` (pip-tools 7.4.1) with the same flags:

```
pip-compile --verbose --resolver=backtracking \
  --pip-args --no-deps --strip-extras --no-header \
  --unsafe-package ray --unsafe-package pip --unsafe-package setuptools \
  -o "python/$TARGET" \
  <list of source requirement files>
```

Then two post-process `sed` passes on the output:
1. `sed -i "/@ file/d"` — strips local file:// install lines.
2. `sed -i -E 's/==([\.0-9]+)\+[^\b]*cpu/==\1/g'` — strips `+cpu` / `+pt20cpu` device-tag suffixes that the resolver inserts (otherwise `pip install` later complains about irresolvable constraints).

The function `pip install`s `numpy` and `torch` before compilation. **Why:** `pip-compile` runs with `--pip-args --no-deps`, but it still has to extract each candidate's own `install_requires` to feed the resolver. For legacy sdists (no wheel, no PEP 517 `pyproject.toml`), the only way to get metadata is to execute `setup.py egg_info` — which runs the package's `setup.py` as Python. If that `setup.py` does `import numpy` / `import torch` at module level (common for packages using `numpy.distutils` or `torch.utils.cpp_extension` for C-extension config), the import has to succeed or pip-compile aborts. pip-compile doesn't sandbox these in an isolated PEP 517 build env, so the calling Python must already have those modules installed. `dragonfly-opt` was the canonical offender (no longer in the tree); the preinstall stays because the same shape of failure recurs whenever a new sdist-only dep with imperative `setup.py` imports gets added.

### Source file lists

If you add a brand-new source requirement file under `python/requirements/**`, it will NOT be picked up automatically — you must extend the `pip-compile` source list inside the relevant function in `ci/ci.sh`. This is one of the few legitimate reasons to edit `ci/ci.sh` (see workflow rule 2). Update both `compile_pip_dependencies` and `compile_313_pip_dependencies` if the new file applies to both Python tracks; if py3.13 needs an override, place it under `python/requirements/py313/` or `python/requirements/ml/py313/` and reference the override in `compile_313_pip_dependencies` only.

### Recompile + relock everything

After editing source requirements, the full refresh is:

```bash
ci/ci.sh compile_pip_dependencies && \
ci/ci.sh compile_313_pip_dependencies && \
bazelisk run //ci/raydepsets:raydepsets -- build --all-configs
```

The compiled `requirements_compiled*.txt` files feed raydepsets as constraints (via the `remove-compiled-headers.sh` pre-hook, which copies them to `/tmp/ray-deps/` with GPU index URLs stripped).

---

## raydepsets System

### What it does

raydepsets models relationships between lock files as a directed acyclic graph (DAG), so when a dependency changes, downstream lock files regenerate in the correct order. It wraps `uv pip compile` with cross-file consistency guarantees.

### Architecture

```
YAML configs --> Config parser (workspace.py) --> Template expansion --> DAG (NetworkX DiGraph) --> Topological execution --> uv pip compile --> Lock files
```

**Source files:**
- `ci/raydepsets/raydepsets.py` — entry point
- `ci/raydepsets/cli.py` — CLI (`build` command) and `DependencySetManager` class
- `ci/raydepsets/workspace.py` — config parsing, `Depset` dataclass, `Workspace` class, template substitution

### Four operations

#### 1. `compile`
Runs `uv pip compile` to resolve dependencies from `.in`/`.txt` requirement files into a hash-verified lock file.

**Fields:** `requirements` (input files), `constraints` (version constraint files), `packages` (inline package specs via stdin), `output` (lock file path)

```yaml
- name: ray_img_depset_${PYTHON_SHORT}
  operation: compile
  requirements:
    - python/deplocks/ray_img/ray_dev.in
  constraints:
    - /tmp/ray-deps/requirements_compiled_py${PYTHON_VERSION}.txt
  output: python/deplocks/ray_img/ray_img_py${PYTHON_SHORT}.lock
  append_flags:
    - --python-version=${PYTHON_VERSION}
    - --unsafe-package ray
    - --python-platform=linux
  build_arg_sets: [py310, py311, py312, py313]
  pre_hooks:
    - ci/raydepsets/pre_hooks/build-placeholder-wheel.sh
    - ci/raydepsets/pre_hooks/remove-compiled-headers.sh ${PYTHON_VERSION}
```

#### 2. `subset`
Extracts a subset of already-resolved packages from another depset's lock file. Validates all requested requirements exist in the source.

**Fields:** `source_depset`, `requirements`, `output`

```yaml
- name: ray_base_deps_${PYTHON_SHORT}
  operation: subset
  source_depset: ray_base_extra_testdeps_${PYTHON_SHORT}
  requirements:
    - docker/base-deps/requirements.in
  output: python/deplocks/base_deps/ray_base_deps_py${PYTHON_VERSION}.lock
```

#### 3. `expand`
Combines multiple depsets into one, optionally adding new requirements. Recursively collects all transitive requirements from referenced depsets and recompiles.

**Fields:** `depsets` (depset names to combine), `requirements`, `constraints`, `output`

```yaml
- name: compiled_ray_llm_test_depset_${PYTHON_VERSION}_${CUDA_CODE}
  operation: expand
  depsets:
    - ray_base_test_depset_${PYTHON_VERSION}_${CUDA_CODE}
  requirements:
    - python/requirements/llm/llm-requirements.txt
    - python/requirements/llm/llm-test-requirements.txt
  constraints:
    - python/deplocks/llm/ray_test_${PYTHON_VERSION}_${CUDA_CODE}.lock
  output: python/deplocks/llm/rayllm_test_${PYTHON_VERSION}_${CUDA_CODE}.lock
```

#### 4. `relax`
Removes specified packages from another depset's lock file. Does NOT re-resolve — simple removal. Use with caution; can create inconsistent environments.

**Fields:** `source_depset`, `packages`, `output`

```yaml
- name: relaxed_data_ci_depset_${PYTHON_SHORT}
  operation: relax
  source_depset: data_base_ci_depset_${PYTHON_SHORT}
  packages:
    - pyarrow
    - numpy
    - datasets
  output: python/deplocks/ci/relaxed_data-ci_depset_py${PYTHON_VERSION}.lock
```

### Config file format

Configs use the `.depsets.yaml` extension and live in `ci/raydepsets/configs/`.

**Top-level keys:**
- `build_arg_sets` (optional) — template variable sets for matrix expansion
- `depsets` — list of dependency set definitions

**Template variables** use `${VARIABLE_NAME}`. When a depset lists multiple `build_arg_sets`, it expands into one concrete depset per set.

| Field | Type | Description |
|---|---|---|
| `name` | string | Unique identifier (supports `${VAR}` substitution) |
| `operation` | string | `compile`, `subset`, `expand`, or `relax` |
| `output` | string | Output lock file path relative to workspace root |
| `build_arg_sets` | list | Which build arg sets to expand with |
| `append_flags` | list | Additional flags passed to `uv pip compile` |
| `override_flags` | list | Flags that replace matching defaults |
| `pre_hooks` | list | Shell commands to run before execution |
| `include_setuptools` | bool | Allow setuptools in output (default: `false`) |

**YAML anchors** are supported for DRY config:
```yaml
.common_settings: &common_settings
  append_flags:
    - --python-version=${PYTHON_VERSION_STR}
    - --unsafe-package ray
  build_arg_sets: [cpu, cu128]

depsets:
  - name: my_depset
    <<: *common_settings
    operation: compile
    requirements: [requirements.txt]
    output: output.lock
```

### Default `uv pip compile` flags

Applied automatically to every `compile` call:
```
--no-header
--generate-hashes
--index-strategy unsafe-best-match
--no-strip-markers
--emit-index-url
--emit-find-links
--quiet
--unsafe-package setuptools  (unless include_setuptools: true)
```

### Pre-hooks

Shell scripts that run before a depset executes. Modeled as nodes in the DAG.

- `ci/raydepsets/pre_hooks/build-placeholder-wheel.sh` — builds a placeholder Ray wheel so `uv pip compile` can resolve Ray as a dependency
- `ci/raydepsets/pre_hooks/remove-compiled-headers.sh ${PYTHON_VERSION}` — copies `requirements_compiled` to `/tmp/ray-deps/` and strips GPU index URLs (`--extra-index-url`, `--find-links`) so they don't leak into CPU lock files

### Existing config files

| Config | Purpose |
|---|---|
| `rayimg.depsets.yaml` | Core Ray Docker images (`ray_img_depset_*`, `ray_base_deps_*`, `ray_base_extra_*`, `ray_base_slim_*`, CPU/GPU/ML/LLM variants) |
| `rayllm.depsets.yaml` | RayLLM dependencies |
| `ci_data.depsets.yaml` | Ray Data CI tests (pyarrow latest/v9/nightly variants, mongo) |
| `ci_serve.depsets.yaml` | Ray Serve CI tests |
| `data_test.depsets.yaml` | Data test dependencies |
| `docs.depsets.yaml` | Documentation build dependencies |
| `llm_release_tests.depsets.yaml` | LLM release benchmarks |
| `release_compiled_graph_gpu_cu130.depsets.yaml` | Compiled graph GPU tests |
| `release_multimodal_inference_benchmarks_tests.depsets.yaml` | Multimodal benchmarks (audio, embedding, image, video) |

### Docker image hierarchy

The Ray Docker image layer structure (from `rayimg.depsets.yaml`):

```
ray_img_depset_* (compile: core Ray image deps)
  ├── ray_base_slim_* (expand: minimal slim image)
  ├── ray_base_extra_testdeps_* (expand: full test deps, CPU)
  │     ├── ray_base_deps_* (subset: first Docker layer)
  │     └── ray_base_extra_* (subset: second Docker layer)
  ├── ray_base_extra_testdeps_gpu_* (expand: GPU test deps with CUDA)
  ├── ray_base_extra_testdeps_llm_cuda_* (expand: LLM+CUDA test deps)
  ├── ray_ml_base_extra_testdeps_cuda_* (expand: ML+CUDA test deps)
  └── ray_base_deps_tpu_* (expand: TPU deps)
```

Cross-config dependencies are supported — e.g., `ci_data.depsets.yaml` references `ray_img_depset_*` from `rayimg.depsets.yaml`.

---

## Marker preservation (pip-compile, py313 / multi-Python locks)

`pip-compile` (pip-tools 7.4.1, the version `ci/ci.sh` uses) **strips `python_version` markers** from output pins UNLESS the source declaration is **an exact-version pin with the marker attached**. This matters because `requirements_compiled_py3.13.txt` is consumed as a constraint at multiple `--python-version` targets via uv.

**What works (markers survive):**
```
onnxruntime==1.18.0 ; ... and python_version <= '3.10'
onnxruntime==1.24.4 ; ... and python_version > '3.10'
```
Both exact pins, each with a marker. pip-compile at py3.11 evaluates the first as false → dropped; second as true → kept. Output retains the marker. When uv consumes the lock as constraint at `--python-version=3.10`, the marker is false → constraint skipped → resolver picks an older version from the source `.in` / `.txt`.

**What does NOT work (markers get stripped):**
```
scipy<1.16 ; python_version < '3.11'
scipy ; python_version >= '3.11'
```
A range cap + an unconstrained entry. Because scipy is also pulled transitively by many other packages with no marker, the consolidated pin loses the marker. Output: bare `scipy==1.17.1`, which then forces py3.10 depset compiles to fail (no cp310 wheel).

**Fix pattern — two exact pins:**
```
scipy==1.15.3 ; python_version < '3.11'
scipy==1.17.1 ; python_version >= '3.11'
```
Even though only one branch wins at compile time, the winning exact pin survives the transitive merge because its `==A.B.C` is more specific than any transitive range.

**Where to put the pins:**
- Broad cross-cutting compat → `python/requirements.txt`
- ML-specific compat → `python/requirements/ml/py313/ml-requirements.txt`
- Test-stack compat → `python/requirements/py313/test-requirements.txt`
- Don't pin in `python/requirements/ml/py313/dl-cpu-requirements.txt` for things unrelated to CPU torch.

**Classes of cliff to watch for when bumping py313 lock:**
1. Dropped cp310 wheels (most common; detectable via PyPI `Requires-Python` ≥ 3.11).
2. Transitive upper bounds from py<3.11-only deps (e.g. `tensorflow-metadata==1.17.3` caps `protobuf<=6.32` on py<3.11; not auto-detectable from `Requires-Python`).
3. Transitive extras that pull new deps in newer versions (e.g. `jsonschema==4.25+` adds `rfc3987-syntax` to its `format-nongpl` extra, which pulls `lark==1.3.1` and clashes with vllm's `lark==1.2.2`).
4. CPU-compile transitive pulls that conflict with GPU depsets (e.g. xgboost's unpinned `nvidia-nccl-cu12` can resolve to a version that conflicts with cu128 torch's pinned nccl — fix by pinning `nvidia-nccl-cu12==<cu128-matching-version>` in `dl-cpu-requirements.txt`).

---

## Commands

### Build all depsets from a config
```bash
bazelisk run //ci/raydepsets:raydepsets -- build ci/raydepsets/configs/rayimg.depsets.yaml
```

### Build a single named depset (and its dependencies)
```bash
bazelisk run //ci/raydepsets:raydepsets -- build ci/raydepsets/configs/rayimg.depsets.yaml --name ray_img_depset_313
```

### Build all configs at once
```bash
bazelisk run //ci/raydepsets:raydepsets -- build --all-configs
```

### Validate lock files are up-to-date (CI check mode)
```bash
bazelisk run //ci/raydepsets:raydepsets -- build ci/raydepsets/configs/rayimg.depsets.yaml --check
```

### Recompile everything (compiled requirements + all lock files)
```bash
ci/ci.sh compile_pip_dependencies && bazelisk run //ci/raydepsets:raydepsets -- build --all-configs
```

### Run raydepsets tests
```bash
bazel test //ci/raydepsets:test_cli
bazel test //ci/raydepsets:test_workspace
```

---

## Common workflows

### Adding a new dependency to a Ray component

1. Identify the correct source requirements file (e.g., `python/requirements/ml/py313/data-requirements.txt` for Ray Data on py313).
2. Add the package with appropriate version bounds (`>=min,<max`). Use the dual-exact-pin-with-markers pattern if the version differs across Python versions.
3. Recompile monorepo deps: `ci/ci.sh compile_pip_dependencies`.
4. Rebuild affected lock files: `bazelisk run //ci/raydepsets:raydepsets -- build ci/raydepsets/configs/<relevant>.depsets.yaml`.
5. Verify with `--check`.

### Creating a new CI test environment

1. Create a `.in` file listing the extra packages.
2. Create or edit a `.depsets.yaml` config in `ci/raydepsets/configs/`.
3. Choose the right base depset to expand from (usually `ray_img_depset_*` for CPU, or a GPU variant).
4. Define the depset with `operation: expand`, referencing the base and your new requirements.
5. Output to `python/deplocks/ci/<name>.lock` or appropriate path.
6. Build and verify with `bazelisk run //ci/raydepsets:raydepsets -- build ci/raydepsets/configs/<your-config>.depsets.yaml`.

### Debugging a dependency conflict

1. Read the failing lock file and the error from `uv pip compile`.
2. Check constraints — the constraints file pins versions; conflicts arise when a new requirement is incompatible.
3. Decide whether the right fix is a source-file pin (preferred), a marker-gated dual pin, or — as a last resort — a `relax` on the base depset.
4. Check cross-config dependencies — a depset in one config may depend on depsets from another config.
5. Use `--name` to rebuild just the failing depset for faster iteration.

### Upgrading a package across all lock files

1. Update the version in the source requirements file.
2. Recompile: `ci/ci.sh compile_pip_dependencies`.
3. Rebuild: `bazelisk run //ci/raydepsets:raydepsets -- build --all-configs`.
4. Review diffs in the generated lock files to verify the upgrade propagated.

---

## Patterns and gotchas

- **CUDA variants:** GPU depsets typically use `--index https://download.pytorch.org/whl/<cuda_code>` to pull PyTorch from the correct CUDA index.
- **Platform targeting:** Use `--python-platform=linux` or `--python-platform=x86_64-manylinux_2_31` for Linux-specific resolution.
- **`--unsafe-package ray`:** Always include so `uv` doesn't try to resolve Ray from PyPI (we use a local build).
- **Constraint files at `/tmp/ray-deps/`:** Created by `remove-compiled-headers.sh`; GPU-stripped versions of `requirements_compiled`.
- **`include_setuptools: true`:** Only set for base-deps and TPU depsets that need setuptools at runtime.
- **Cross-config references:** A depset in `ci_data.depsets.yaml` can reference `ray_img_depset_*` defined in `rayimg.depsets.yaml` because all configs are loaded under `--all-configs` (or pulled in transitively when one config's depset depends on a node from another).
- **Lock file output paths:** Docker image deps go to `python/deplocks/`; release test deps go to `release/ray_release/byod/`.
More from ray-project/ray