symmetry-validation-suite
$ npx mdskill add lyndonkl/claude/symmetry-validation-suite
Validate symmetry assumptions before committing to equivariant architecture.
- Prevents performance loss from incorrect symmetry hypotheses.
- Requires prior symmetry discovery work or domain analysis.
- Executes empirical tests on invariance, equivariance, and group structure.
- Outputs a structured checklist with progress tracking and results.
SKILL.md
.github/skills/symmetry-validation-suite
---
name: symmetry-validation-suite
description: Provides empirical test protocols and metrics to validate whether hypothesized symmetries actually hold in data or models before committing to equivariant architecture. Includes invariance/equivariance testing, group structure verification, and distribution analysis under transforms. Use when testing invariance, validating equivariance, checking symmetry assumptions, debugging symmetry-related model failures, or needing data-driven validation before architecture decisions.
---
# Symmetry Validation Suite
Wrong symmetry assumptions hurt model performance -- too much symmetry over-constrains, while missing symmetry wastes capacity. Validate before committing to equivariant architecture.
## Workflow
Copy this checklist and track your progress:
```
Symmetry Validation Progress:
- [ ] Step 1: List symmetry hypotheses to test
- [ ] Step 2: Design transformation test sets
- [ ] Step 3: Run invariance/equivariance tests
- [ ] Step 4: Verify group structure
- [ ] Step 5: Analyze data distribution under transforms
- [ ] Step 6: Document validation results
```
**Step 1: List symmetry hypotheses to test**
Gather candidate symmetries from previous discovery work. For each one, document the transformation type, whether invariance or equivariance is expected, and your confidence level. Test low-confidence hypotheses first. If no hypotheses exist, work with the user through domain analysis to identify candidate symmetries before continuing.
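As a minimal sketch of how such a hypothesis list could be recorded and ordered, the snippet below uses a small dataclass. The class name `SymmetryHypothesis`, the numeric confidence scale in [0, 1], and the three example entries are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class SymmetryHypothesis:
    transformation: str  # e.g. "rotation (SO(2))"
    expectation: str     # "invariance" or "equivariance"
    confidence: float    # subjective confidence in [0, 1] (assumed scale)


def prioritize(hypotheses):
    """Order hypotheses so the least certain ones are tested first."""
    return sorted(hypotheses, key=lambda h: h.confidence)


# Hypothetical entries, for illustration only
hypotheses = [
    SymmetryHypothesis("translation", "equivariance", 0.9),
    SymmetryHypothesis("rotation (SO(2))", "invariance", 0.4),
    SymmetryHypothesis("reflection", "invariance", 0.6),
]
test_order = [h.transformation for h in prioritize(hypotheses)]
```

Sorting by ascending confidence puts the hypotheses most likely to be wrong at the front of the queue.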
**Step 2: Design transformation test sets**
For each symmetry, create a test protocol: sample representative inputs from the data distribution, define a transformation sampling strategy (random rotations, all permutations, etc.), choose sample sizes large enough for statistical significance, and include edge cases and boundary conditions. See [Transformation Sampling](#transformation-sampling) for guidance. For detailed methodology, consult [Methodology Details](./resources/methodology.md).
**Step 3: Run invariance/equivariance tests**
For invariance testing: apply a transformation T to input x, compute the outputs f(x) and f(T(x)), and measure the error ||f(T(x)) - f(x)||. For equivariance testing: compute f(T(x)) and T'(f(x)), where T' is the action of the same group element on the output space, and measure the error ||f(T(x)) - T'(f(x))||. Use [Testing Protocols](#testing-protocols) for implementation details. Aggregate the errors across samples and compute summary statistics. For complete code examples, see [Test Implementation Examples](./resources/test-examples.md).
**Step 4: Verify group structure**
Check that the claimed transformations form a valid group: test closure (the composition of two transforms is again a transform in the set), test associativity, verify that an identity element exists, and verify that every element has an inverse. For Lie groups, check that the generators close under the commutator. See [Group Structure Tests](#group-structure-tests).
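The Lie-group check can be illustrated with so(3), whose standard generators satisfy [Lx, Ly] = Lz. The sketch below is an assumption-laden example rather than a general tool: the helper names are hypothetical, and the coefficient read-off in `closure_residual` only works for this particular basis of antisymmetric 3×3 matrices.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


# Standard basis of so(3): infinitesimal rotations about x, y, z
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
GENERATORS = {"LX": LX, "LY": LY, "LZ": LZ}


def closure_residual(m):
    """Largest entry of m left over after removing its LX/LY/LZ components.

    The coefficients are read directly from m; this shortcut is specific
    to the basis above. A residual of 0 means m lies in the span.
    """
    cx, cy, cz = m[2][1], m[0][2], m[1][0]
    return max(
        abs(m[i][j] - (cx * LX[i][j] + cy * LY[i][j] + cz * LZ[i][j]))
        for i in range(3) for j in range(3)
    )


residuals = {
    (a, b): closure_residual(commutator(GENERATORS[a], GENERATORS[b]))
    for a in GENERATORS for b in GENERATORS
}
generators_close = all(r < 1e-12 for r in residuals.values())
```

For a group with a different basis, the span test would typically be replaced by a least-squares fit of the commutator onto the generators.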
**Step 5: Analyze data distribution under transforms**
Check whether transformed data stays in-distribution: apply the transforms to training data, compare statistics of the original and transformed data, look for distributional shift that might break the symmetry assumption, and identify the transformation ranges over which the data remains valid. This catches "approximate symmetry" cases where the symmetry holds only within bounds.
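One way to sketch this analysis is to sweep a transformation parameter and record how far a summary statistic drifts. The example below uses synthetic, deliberately anisotropic 2D points and the relative change in the standard deviation of the first coordinate; the statistic, the 5% tolerance, and the angle grid are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def std_shift(original, transformed, axis=0):
    """Relative change in the standard deviation of one coordinate."""
    s0 = statistics.pstdev([p[axis] for p in original])
    s1 = statistics.pstdev([p[axis] for p in transformed])
    return abs(s1 - s0) / s0


# Synthetic data: wide along x, narrow along y, so large rotations
# visibly change the per-coordinate statistics.
rng = random.Random(0)
points = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(2000)]

tolerance = 0.05  # hypothetical cut-off for "still in distribution"
angles_deg = list(range(0, 91, 5))
shifts = {
    a: std_shift(points, rotate(points, math.radians(a))) for a in angles_deg
}
valid_angles = [a for a in angles_deg if shifts[a] <= tolerance]
```

The angles in `valid_angles` give a rough estimate of the transform range to report; a real analysis would likely compare several statistics or use a two-sample test.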
**Step 6: Document validation results**
Create a validation report using the [Output Template](#output-template). For each symmetry, state the hypothesis, the test methodology, the quantitative results, and the pass/fail decision. Recommend whether to use a hard equivariance constraint, a soft constraint (regularization), data augmentation, or no symmetry at all. Quality criteria for this output are defined in [Quality Rubric](./resources/evaluators/rubric_validation.json).
## Testing Protocols
### Invariance Test Protocol
```python
import numpy as np


def test_invariance(model, data_samples, transform_fn, n_transforms=100,
                    threshold=1e-3):
    """
    Test if model output is invariant to transformations.

    transform_fn(x) should draw a fresh random transformation on each call.
    The default threshold is an assumption; set it from your numerical
    precision expectations.

    Returns:
        mean_error: Average ||f(T(x)) - f(x)||
        max_error: Maximum error observed
        std_error: Standard deviation of the errors
        pass_rate: Fraction with error < threshold
    """
    errors = []
    for x in data_samples:
        y_orig = np.asarray(model(x))
        for _ in range(n_transforms):
            x_transformed = transform_fn(x)
            y_transformed = np.asarray(model(x_transformed))
            errors.append(np.linalg.norm(y_transformed - y_orig))
    errors = np.asarray(errors)
    return {
        'mean_error': float(errors.mean()),
        'max_error': float(errors.max()),
        'std_error': float(errors.std()),
        'pass_rate': float((errors < threshold).mean()),
    }
```
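To show what passing and failing results look like, here is a dependency-free toy version of the protocol above. The two "models" and the helper names are hypothetical stand-ins: one depends only on the distance from the origin (rotation invariant), the other reads the first coordinate (not invariant).

```python
import math
import random


def random_rotation(point, rng):
    """Rotate a 2D point by a freshly sampled angle."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y, s * x + c * y)


def invariance_errors(model, samples, rng, n_transforms=50):
    """Collect |f(T(x)) - f(x)| over samples and random rotations."""
    errors = []
    for p in samples:
        y_orig = model(p)
        for _ in range(n_transforms):
            errors.append(abs(model(random_rotation(p, rng)) - y_orig))
    return errors


def radial_model(p):
    # Depends only on the distance from the origin
    return math.hypot(p[0], p[1])


def first_coordinate_model(p):
    # Depends on orientation
    return p[0]


rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
radial_max = max(invariance_errors(radial_model, samples, rng))
coord_max = max(invariance_errors(first_coordinate_model, samples, rng))
```

The radial model's maximum error stays at floating-point noise, while the coordinate model's error is of the same order as its outputs.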
### Equivariance Test Protocol
```python
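import numpy as np


def test_equivariance(model, data_samples, input_transform, output_transform,
                      eps=1e-12):
    """
    Test if f(T(x)) = T'(f(x)) for equivariance.

    input_transform (T) and output_transform (T') must apply the same
    group element to the input and output spaces respectively.

    Returns:
        mean_error: Average ||f(T(x)) - T'(f(x))||
        max_error: Maximum absolute error observed
        relative_error: Error normalized by output magnitude
    """
    absolute, relative = [], []
    for x in data_samples:
        # Path 1: transform, then model
        y1 = np.asarray(model(input_transform(x)))
        # Path 2: model, then transform
        y2 = np.asarray(output_transform(model(x)))
        error = np.linalg.norm(y1 - y2)
        absolute.append(error)
        relative.append(error / (np.linalg.norm(y2) + eps))
    return {
        'mean_error': float(np.mean(absolute)),
        'max_error': float(np.max(absolute)),
        'relative_error': float(np.mean(relative)),
    }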
```
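A common pitfall is sampling the input and output transformations independently; both paths must use the same group element. The toy sketch below, with hypothetical helper names, draws one angle per trial and applies it on both sides. Uniform scaling commutes with rotation, whereas a coordinate-wise ReLU does not.

```python
import math
import random


def rotate(p, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def equivariance_errors(model, samples, rng, n_transforms=50):
    """Collect ||f(T(x)) - T(f(x))|| with one shared angle per trial."""
    errors = []
    for p in samples:
        y = model(p)
        for _ in range(n_transforms):
            theta = rng.uniform(0.0, 2.0 * math.pi)  # shared group element
            y1 = model(rotate(p, theta))  # transform, then model
            y2 = rotate(y, theta)         # model, then transform
            errors.append(math.dist(y1, y2))
    return errors


def scale_model(p):
    # Uniform scaling commutes with rotation
    return (2.0 * p[0], 2.0 * p[1])


def relu_model(p):
    # Coordinate-wise ReLU is tied to the axes, so it breaks equivariance
    return (max(p[0], 0.0), max(p[1], 0.0))


rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
scale_max = max(equivariance_errors(scale_model, samples, rng))
relu_max = max(equivariance_errors(relu_model, samples, rng))
```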
### Statistical Significance
For reliable results:
- Use at least 100 data samples
- Test at least 50 random transformations per sample
- Report mean, std, and percentiles (95th, 99th)
- Set threshold based on numerical precision expectations
- Use hypothesis testing if comparing methods
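The reporting items in this list can be computed with the standard library alone. This is a sketch under assumptions: `summarize` is a hypothetical helper, the error list is a synthetic placeholder rather than measured data, and the "inclusive" quantile method is one reasonable choice among several.

```python
import statistics


def summarize(errors, threshold):
    """Mean, std, tail percentiles, and pass rate for a list of errors."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    cuts = statistics.quantiles(errors, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(errors),
        "std": statistics.stdev(errors),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(errors),
        "pass_rate": sum(e < threshold for e in errors) / len(errors),
    }


# Synthetic placeholder errors 0.000, 0.001, ..., 0.100 (not real results)
errors = [i / 1000 for i in range(101)]
summary = summarize(errors, threshold=0.05)
```

Reporting the 95th and 99th percentiles alongside the mean shows whether a few transformations produce large errors that the average hides.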
## Transformation Sampling
### Continuous Groups
| Group | Sampling Strategy |
|-------|-------------------|
| SO(2) | Uniform random angles θ ∈ [0, 2π) |
| SO(3) | Uniform random unit quaternions (e.g. a normalized 4D Gaussian); a uniform axis with a uniform angle is not uniform on SO(3) |
| SE(3) | Combine SO(3) rotation + uniform translation |
| Translations | Uniform within expected data range |
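A minimal sketch of these sampling strategies, using only the standard library. The function names and the symmetric `translation_range` parameter are assumptions; normalizing a 4D standard Gaussian is one well-known way to draw a uniformly distributed unit quaternion.

```python
import math
import random


def sample_so2(rng):
    """Uniform rotation angle in [0, 2*pi)."""
    return rng.uniform(0.0, 2.0 * math.pi)


def sample_unit_quaternion(rng):
    """Normalized 4D Gaussian: uniform on the 3-sphere, hence on SO(3)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def quaternion_to_matrix(q):
    """Rotation matrix for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def sample_se3(rng, translation_range=1.0):
    """SO(3) rotation plus a translation uniform in a cube (assumed range)."""
    rotation = quaternion_to_matrix(sample_unit_quaternion(rng))
    translation = [
        rng.uniform(-translation_range, translation_range) for _ in range(3)
    ]
    return rotation, translation


rng = random.Random(0)
rotation, translation = sample_se3(rng, translation_range=2.0)
```

The translation range should come from the spread of the actual data rather than the placeholder value used here.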
### Discrete Groups
| Group | Sampling Strategy |
|-------|-------------------|
| Cₙ | All n rotations |
| Dₙ | All 2n elements (rotations + reflections) |
| Sₙ | Random permutations (full enumeration if n ≤ 6) |
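The discrete cases can be sketched as follows. The element encodings (angles for Cₙ, `(rotation index, reflect)` pairs for Dₙ, index tuples for Sₙ) and the function names are illustrative assumptions; the enumeration cut-off of 6 follows the table.

```python
import itertools
import math
import random


def cyclic_group(n):
    """Rotation angles of C_n."""
    return [2.0 * math.pi * k / n for k in range(n)]


def dihedral_group(n):
    """Elements of D_n as (rotation index, reflect) pairs: 2n in total."""
    return [(k, reflect) for reflect in (False, True) for k in range(n)]


def symmetric_group_samples(n, n_samples, rng, enumerate_up_to=6):
    """All permutations for small n, otherwise random permutations."""
    if n <= enumerate_up_to:
        return list(itertools.permutations(range(n)))
    samples = []
    for _ in range(n_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        samples.append(tuple(perm))
    return samples


rng = random.Random(0)
c4 = cyclic_group(4)
d4 = dihedral_group(4)
s4 = symmetric_group_samples(4, n_samples=10, rng=rng)   # enumerated
s8 = symmetric_group_samples(8, n_samples=10, rng=rng)   # sampled
```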
## Group Structure Tests
### Closure Test
```
For random g₁, g₂ ∈ G:
  Compute g₃ = g₁ · g₂
  Verify g₃ ∈ G (within numerical tolerance)
```
### Associativity Test
```
For random g₁, g₂, g₃ ∈ G:
  Compute (g₁ · g₂) · g₃
  Compute g₁ · (g₂ · g₃)
  Verify equality (within tolerance)
```
### Identity and Inverse Test
```
For random g ∈ G:
  Verify g · e = e · g = g
  Find g⁻¹, verify g⁻¹ ∈ G, and verify g · g⁻¹ = g⁻¹ · g = e
```
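The three tests above can be sketched for 2×2 rotation matrices. This is an illustrative example with hypothetical helper names; it assumes elements are orthogonal matrices, so the transpose serves as the inverse candidate. A bounded set of rotations (angles in [0, π/4]) is included to show what a failure looks like.

```python
import math
import random

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(2) for j in range(2))


def in_so2(m, tol=1e-9):
    """Membership test: orthogonal with determinant +1."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return close(matmul(m, transpose(m)), IDENTITY, tol) and abs(det - 1) <= tol


def in_small_rotations(m, max_angle=math.pi / 4, tol=1e-9):
    """Membership test for rotations with angle in [0, max_angle] only."""
    angle = math.atan2(m[1][0], m[0][0])
    return in_so2(m, tol) and -tol <= angle <= max_angle + tol


def check_group_structure(sample, membership, n_trials=200, seed=0):
    rng = random.Random(seed)
    ok = {"closure": True, "associativity": True,
          "identity": True, "inverse": True}
    for _ in range(n_trials):
        g1, g2, g3 = sample(rng), sample(rng), sample(rng)
        ok["closure"] &= membership(matmul(g1, g2))
        ok["associativity"] &= close(matmul(matmul(g1, g2), g3),
                                     matmul(g1, matmul(g2, g3)))
        ok["identity"] &= (close(matmul(g1, IDENTITY), g1)
                           and close(matmul(IDENTITY, g1), g1))
        inv = transpose(g1)  # inverse candidate for orthogonal matrices
        ok["inverse"] &= (membership(inv)
                          and close(matmul(g1, inv), IDENTITY)
                          and close(matmul(inv, g1), IDENTITY))
    return ok


so2_report = check_group_structure(
    lambda rng: rot(rng.uniform(0.0, 2.0 * math.pi)), in_so2)
bounded_report = check_group_structure(
    lambda rng: rot(rng.uniform(0.0, math.pi / 4)), in_small_rotations)
```

The bounded set fails closure and inverse because composing or inverting small rotations leaves the allowed range, which is why restricted transformation ranges are better handled as approximate symmetries than as groups.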
## Interpretation Guide
### Error Thresholds
| Error Level | Interpretation |
|-------------|----------------|
| < 1e-6 | Exact symmetry (numerical precision) |
| 1e-6 to 1e-3 | Strong approximate symmetry |
| 1e-3 to 0.01 | Weak approximate symmetry |
| > 0.01 | Symmetry likely doesn't hold |
### Decision Matrix
| Validation Result | Recommendation |
|-------------------|----------------|
| Exact symmetry confirmed | Use hard equivariant constraint |
| Strong approximate | Use equivariant architecture |
| Weak approximate | Consider soft constraint or augmentation |
| Symmetry broken | Don't enforce this symmetry |
| Partial symmetry | Use conditional/local equivariance |
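The two tables can be combined into a small lookup, sketched below. How the boundary values are assigned (for example, whether exactly 1e-3 counts as strong or weak) is not specified by the tables and is an assumption here, as are the function names. "Partial symmetry" is omitted because it cannot be inferred from a single error value.

```python
def interpret_error(error):
    """Map an error level to the interpretation categories in the table."""
    if error < 1e-6:
        return "exact"
    if error < 1e-3:
        return "strong approximate"
    if error <= 0.01:
        return "weak approximate"
    return "broken"


RECOMMENDATION = {
    "exact": "Use hard equivariant constraint",
    "strong approximate": "Use equivariant architecture",
    "weak approximate": "Consider soft constraint or augmentation",
    "broken": "Don't enforce this symmetry",
}


def recommend(error):
    return RECOMMENDATION[interpret_error(error)]


example = recommend(5e-4)
```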
## Output Template
```
SYMMETRY VALIDATION REPORT
==========================
Tested Symmetries:
1. [Transformation]: [Invariance/Equivariance]
- Sample size: [N samples × M transforms]
- Mean error: [value]
- Max error: [value]
- Pass rate: [%] at threshold [value]
- RESULT: [PASS/FAIL/PARTIAL]
- Recommendation: [Hard constraint/Soft/Augmentation/None]
2. [Transformation]: [Invariance/Equivariance]
...
Group Structure:
- Closure: [PASS/FAIL]
- Associativity: [PASS/FAIL]
- Identity/Inverse: [PASS/FAIL]
Distribution Analysis:
- Transform range where symmetry holds: [bounds]
- Detected breaking factors: [list]
SUMMARY:
- Confirmed symmetries: [list]
- Rejected symmetries: [list]
- Proceed to architecture design with: [group specification]
```
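If the report is generated programmatically, one entry of the "Tested Symmetries" section could be rendered as in the sketch below. `format_entry` is a hypothetical helper, the number formatting is an assumption, and the values passed in are made-up placeholders rather than measured results.

```python
def format_entry(index, transformation, kind, n_samples, n_transforms,
                 mean_error, max_error, pass_rate, threshold,
                 result, recommendation):
    """Render one 'Tested Symmetries' entry in the template's layout."""
    return "\n".join([
        f"{index}. {transformation}: {kind}",
        f"- Sample size: {n_samples} samples × {n_transforms} transforms",
        f"- Mean error: {mean_error:.2e}",
        f"- Max error: {max_error:.2e}",
        f"- Pass rate: {pass_rate:.1%} at threshold {threshold:g}",
        f"- RESULT: {result}",
        f"- Recommendation: {recommendation}",
    ])


# Placeholder values for illustration only
entry = format_entry(
    1, "Rotation (SO(2))", "Invariance", 100, 50,
    3.2e-7, 9.1e-7, 1.0, 1e-6, "PASS", "Hard constraint",
)
```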