backtesting-py-oracle

Name: backtesting-py-oracle
Author: terrylica/cc-skills
$npx mdskill add terrylica/cc-skills/backtesting-py-oracle
Validates SQL oracle results and backtests range bar patterns with backtesting.py
Ensures SQL and Python trade evaluations produce identical results
Uses backtesting.py, ClickHouse, and Python dataframes for validation
Requires hedging and exclusive_orders settings for accurate signal replication
Delivers bitwise-identical trade counts and performance metrics between systems
SKILL.md
.github/skills/backtesting-py-oracleView on GitHub ↗
---
name: backtesting-py-oracle
description: backtesting.py configuration for SQL oracle validation and range bar pattern backtesting. Use when running backtesting.py against.
allowed-tools: Read, Bash, Grep, Glob
---

# backtesting.py Oracle Validation for Range Bar Patterns

Configuration and anti-patterns for using backtesting.py to validate ClickHouse SQL sweep results. Ensures bit-atomic replicability between SQL and Python trade evaluation.

**Companion skills**: `clickhouse-antipatterns` (SQL correctness, AP-16) | `sweep-methodology` (sweep design) | `rangebar-eval-metrics` (evaluation metrics)

**Validated**: Gen600 oracle verification (2026-02-12) — 3 assets, 5 gates, ALL PASS.

---

> **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

## Critical Configuration (NEVER omit)

```python
from backtesting import Backtest

bt = Backtest(
    df,
    Strategy,
    cash=100_000,
    commission=0,
    hedging=True,           # REQUIRED: Multiple concurrent positions
    exclusive_orders=False,  # REQUIRED: Don't auto-close on new signal
)
```

**Why**: SQL evaluates each signal independently (overlapping trades allowed). Without `hedging=True`, backtesting.py skips signals while a position is open, producing fewer trades than SQL. This was discovered when SOLUSDT produced 105 Python trades vs 121 SQL trades — 16 signals were silently skipped.

---

## Anti-Patterns (Ordered by Severity)

### BP-01: Missing Multi-Position Mode (CRITICAL)

**Symptom**: Python produces fewer trades than SQL. Gate 1 (signal count) fails.

**Root Cause**: Default `exclusive_orders=True` prevents opening new positions while one is active.

**Fix**: Always use `hedging=True, exclusive_orders=False`.

### BP-02: ExitTime Sort Order (CRITICAL)

**Symptom**: Entry prices appear mismatched (Gate 3 fails) even though both SQL and Python use the same price source.

**Root Cause**: `stats._trades` is sorted by ExitTime, not EntryTime. When overlapping trades exit in a different order than they entered, trade[i] no longer maps to signal[i].

**Fix**:

```python
trades = stats._trades.sort_values("EntryTime").reset_index(drop=True)
```

### BP-03: NaN Poisoning in Rolling Quantile (CRITICAL)

**Symptom**: Cross-asset tests fail with far fewer Python trades. Feature quantile becomes NaN and propagates forward indefinitely.

**Root Cause**: `np.percentile` with NaN inputs returns NaN. If even one NaN feature value enters the rolling window, all subsequent quantiles become NaN, making all subsequent filter comparisons fail.

**Fix**: Skip NaN values when building the signal window:

```python
def _rolling_quantile_on_signals(feature_arr, is_signal_arr, quantile_pct, window=1000):
    result = np.full(len(feature_arr), np.nan)
    signal_values = []
    for i in range(len(feature_arr)):
        if is_signal_arr[i]:
            if len(signal_values) > 0:
                window_data = signal_values[-window:]
                result[i] = np.percentile(window_data, quantile_pct * 100)
            # Only append non-NaN values (matches SQL quantileExactExclusive NULL handling)
            if not np.isnan(feature_arr[i]):
                signal_values.append(feature_arr[i])
    return result
```

### BP-04: Data Range Mismatch (MODERATE)

**Symptom**: Different signal counts between SQL and Python for assets with early data (BNB, XRP).

**Root Cause**: `load_range_bars()` defaults to `start='2020-01-01'` but SQL has no lower bound.

**Fix**: Always pass `start='2017-01-01'` to cover all available data.

### BP-05: Margin Exhaustion with Overlapping Positions (MODERATE)

**Symptom**: Orders canceled with insufficient margin. Fewer trades than expected.

**Root Cause**: With `hedging=True` and default full-equity sizing, overlapping positions exhaust available margin.

**Fix**: Use fixed fractional sizing:

```python
self.buy(size=0.01)  # 1% equity per trade
```

### BP-06: Signal Timestamp vs Entry Timestamp (LOW)

**Symptom**: Gate 2 (timestamp match) fails because SQL uses signal bar timestamps while Python uses entry bar timestamps.

**Root Cause**: SQL outputs the signal detection bar's `timestamp_ms`. Python's `EntryTime` is the fill bar (next bar after signal). These differ by 1 bar.

**Fix**: Record signal bar timestamps in the strategy's `next()` method:

```python
# Before calling self.buy()
self._signal_timestamps.append(int(self.data.index[-1].timestamp() * 1000))
```

---

## 5-Gate Oracle Validation Framework

| Gate | Metric          | Threshold | What it catches                      |
| ---- | --------------- | --------- | ------------------------------------ |
| 1    | Signal Count    | <5% diff  | Missing signals, filter misalignment |
| 2    | Timestamp Match | >95%      | Timing offset, warmup differences    |
| 3    | Entry Price     | >95%      | Price source mismatch, sort ordering |
| 4    | Exit Type       | >90%      | Barrier logic differences            |
| 5    | Kelly Fraction  | <0.02     | Aggregate outcome alignment          |

**Expected residual**: 1-2 exit type mismatches per asset at TIME barrier boundary (bar 50). SQL uses `fwd_closes[max_bars]`, backtesting.py closes at current bar price. Impact on Kelly < 0.006.

---

## Strategy Architecture: Single vs Multi-Position

| Mode            | Constructor                            | Use Case              | Position Sizing                |
| --------------- | -------------------------------------- | --------------------- | ------------------------------ |
| Single-position | `hedging=False` (default)              | Champion 1-bar hold   | Full equity                    |
| Multi-position  | `hedging=True, exclusive_orders=False` | SQL oracle validation | Fixed fractional (`size=0.01`) |

### Multi-Position Strategy Template

```python
class Gen600Strategy(Strategy):
    def next(self):
        current_bar = len(self.data) - 1

        # 1. Register newly filled trades and set barriers
        for trade in self.trades:
            tid = id(trade)
            if tid not in self._known_trades:
                self._known_trades.add(tid)
                self._trade_entry_bar[tid] = current_bar
                actual_entry = trade.entry_price
                if self.tp_mult > 0:
                    trade.tp = actual_entry * (1.0 + self.tp_mult * self.threshold_pct)
                if self.sl_mult > 0:
                    trade.sl = actual_entry * (1.0 - self.sl_mult * self.threshold_pct)

        # 2. Check time barrier for each open trade
        for trade in list(self.trades):
            tid = id(trade)
            entry_bar = self._trade_entry_bar.get(tid, current_bar)
            if self.max_bars > 0 and (current_bar - entry_bar) >= self.max_bars:
                trade.close()
                self._trade_entry_bar.pop(tid, None)

        # 3. Check for new signal (no position guard — overlapping allowed)
        if self._is_signal[current_bar]:
            self.buy(size=0.01)
```

---

## Data Loading

```python
from data_loader import load_range_bars

df = load_range_bars(
    symbol="SOLUSDT",
    threshold=1000,
    start="2017-01-01",      # Cover all available data
    end="2025-02-05",        # Match SQL cutoff
    extra_columns=["volume_per_trade", "lookback_price_range"],  # Gen600 features
)
```

---

## HTML Equity Plot: Y-Axis Auto-Fit on Zoom (BP-07 to BP-10)

backtesting.py generates Bokeh HTML plots via `bt.plot()`. By default, the y-axis is fixed — zooming on the x-axis does NOT rescale the y-axis to fit visible data. This makes it impossible to inspect zoomed-in regions of equity curves that span 4+ orders of magnitude.

**Reference implementation**: `scripts/gen800/plotting.py` in opendeviationbar-patterns.

### BP-07: NEVER Use LogScale for Equity (CRITICAL)

**Symptom**: Equity panel y-axis doesn't auto-fit on zoom. JS callbacks fail silently.

**Root Cause**: Bokeh's `LogScale` breaks `CustomJS` y-range callbacks in multi-panel linked x_range layouts. The `js_on_change` callback fires but `y_range.start`/`y_range.end` assignments are ignored by the LogScale renderer.

**Proven via POC**: LogScale + JS callback works in single-panel mode but fails when 5 panels share a linked x_range.

**Fix**: Transform equity data to `log10()` in Python, display on **linear** scale with a `CustomJSTickFormatter` that shows `10^tick` as readable values:

```python
import numpy as np
from bokeh.models import CustomJSTickFormatter, Range1d

# Transform data
raw = np.asarray(src.data["equity"], dtype=float)
raw = np.where(raw > 0, raw, 1e-10)
src.data["equity"] = np.log10(raw).tolist()

# Custom tick formatter (shows 1%, 100%, 10K%, 1M%)
child.yaxis[0].formatter = CustomJSTickFormatter(code="""
    const v = Math.pow(10, tick);
    if (v >= 1e6) return (v/1e6).toFixed(1) + 'M%';
    if (v >= 1e3) return (v/1e3).toFixed(0) + 'K%';
    if (v >= 1) return v.toFixed(0) + '%';
    return v.toFixed(2) + '%';
""")

# MUST set initial y_range explicitly (backtesting.py leaves it as NaN)
valid = eq[np.isfinite(eq)]
pad = (valid.max() - valid.min()) * 0.05
child.y_range = Range1d(start=valid.min() - pad, end=valid.max() + pad)
```

### BP-08: Panel-Aware Column Matching (CRITICAL)

**Symptom**: Drawdown/P&L panels go blank when zooming. Shows millions of percent.

**Root Cause**: Multiple panels share the same `ColumnDataSource` (backtesting.py optimization). The Equity panel's Line renderer and the Drawdown panel's Line renderer both reference a source containing `equity`, `drawdown`, `High`, `Low`, etc. A naive "find first y column" approach picks `equity` for the Drawdown panel.

**Fix**: Match by `panel_label` FIRST, then by column name:

```python
panel_label = child.yaxis[0].axis_label or ""

# Label-first matching (panels share data sources!)
if "Drawdown" in panel_label and "drawdown" in d:
    hi_col = lo_col = "drawdown"
elif "Equity" in panel_label and "equity" in d:
    hi_col = lo_col = "equity"
elif "High" in d and "Low" in d:  # OHLC
    hi_col, lo_col = "High", "Low"
```

### BP-09: JS Callback Pattern for Y-Axis Auto-Fit (PROVEN)

Attach a `CustomJS` callback to `x_range.js_on_change` for each panel. The callback scans visible data and sets `y_range` directly:

```python
from bokeh.models import CustomJS

cb = CustomJS(
    args={"source": src, "yr": child.y_range,
          "x_col": "index", "y_col": "equity"},
    code="""
    const xs = source.data[x_col];
    const ys = source.data[y_col];
    const x0 = cb_obj.start, x1 = cb_obj.end;
    let lo = Infinity, hi = -Infinity;
    for (let i = 0; i < xs.length; i++) {
        if (xs[i] >= x0 && xs[i] <= x1 && isFinite(ys[i])) {
            if (ys[i] < lo) lo = ys[i];
            if (ys[i] > hi) hi = ys[i];
        }
    }
    if (isFinite(lo) && isFinite(hi) && lo < hi) {
        const pad = (hi - lo) * 0.05 || 0.001;
        yr.start = lo - pad;
        yr.end = hi + pad;
    }
    """,
)
child.x_range.js_on_change("start", cb)
child.x_range.js_on_change("end", cb)
```

### BP-10: DataRange1d(only_visible=True) Is Unreliable

**Symptom**: `DataRange1d(only_visible=True)` works for simple cases but fails with:

- LogScale panels (y-range computed in wrong space)
- VBar renderers (candlesticks — bounds not tracked correctly)
- Shared data sources across panels

**Fix**: Always use the JS callback pattern (BP-09) instead of `DataRange1d(only_visible=True)`.

### Panel Data Source Reference (backtesting.py internals)

| Panel    | Label             | Renderer                             | Y Column         | Source                |
| -------- | ----------------- | ------------------------------------ | ---------------- | --------------------- |
| Equity   | `"Equity"`        | Patch (`equity_dd`), Line (`equity`) | `equity`         | Shared OHLC source    |
| Drawdown | `"Drawdown"`      | Line (`drawdown`), Scatter (peak)    | `drawdown`       | Shared OHLC source    |
| P/L      | `"Profit / Loss"` | Scatter (`returns`), MultiLine       | `y` or `returns` | Separate trade source |
| OHLC     | `""` (no label)   | Segment, VBar                        | `High`/`Low`     | Shared OHLC source    |
| Volume   | `"Volume"`        | VBar                                 | `Volume` (top)   | Shared OHLC source    |

---

## Project Artifacts (rangebar-patterns repo)

| Artifact                    | Path                                              |
| --------------------------- | ------------------------------------------------- |
| Oracle comparison script    | `scripts/gen600_oracle_compare.py`                |
| Gen600 strategy (reference) | `backtest/backtesting_py/gen600_strategy.py`      |
| SQL oracle query template   | `sql/gen600_oracle_trades.sql`                    |
| Oracle validation findings  | `findings/2026-02-12-gen600-oracle-validation.md` |
| Backtest CLAUDE.md          | `backtest/CLAUDE.md`                              |
| ClickHouse AP-16            | `.claude/skills/clickhouse-antipatterns/SKILL.md` |
| Fork source                 | `~/fork-tools/backtesting.py/`                    |

## Post-Execution Reflection

After this skill completes, check before closing:

1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure.
2. **Did parameters or output change?** — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.