pca-decomposition

Name: pca-decomposition
Author: elizaOS/eliza

$npx mdskill add elizaOS/eliza/pca-decomposition

Principal Component Analysis (PCA) reduces many correlated variables into fewer uncorrelated components. Varimax rotation makes components more interpretable by maximizing variance.

SKILL.md

.github/skills/pca-decompositionView on GitHub ↗

---
name: pca-decomposition
description: Reduce dimensionality of multivariate data using PCA with varimax rotation. Use when you have many correlated variables and need to identify underlying factors or reduce collinearity.
license: MIT
---

# PCA Decomposition Guide

## Overview

Principal Component Analysis (PCA) reduces many correlated variables into fewer uncorrelated components. Varimax rotation makes components more interpretable by maximizing variance.

## When to Use PCA

- Many correlated predictor variables
- Need to identify underlying factor groups
- Reduce multicollinearity before regression
- Exploratory data analysis

## Basic PCA with Varimax Rotation
```python
from sklearn.preprocessing import StandardScaler
from factor_analyzer import FactorAnalyzer

# Standardize data first
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA with varimax rotation
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(X_scaled)

# Get factor loadings
loadings = fa.loadings_

# Get component scores for each observation
scores = fa.transform(X_scaled)
```

## Workflow for Attribution Analysis

When using PCA for contribution analysis with predefined categories:

1. **Combine ALL variables first**, then do PCA together:
```python
# Include all variables from all categories in one matrix
all_vars = ['AirTemp', 'NetRadiation', 'Precip', 'Inflow', 'Outflow',
            'WindSpeed', 'DevelopedArea', 'AgricultureArea']
X = df[all_vars].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA on ALL variables together
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(X_scaled)
scores = fa.transform(X_scaled)
```

2. **Interpret loadings** to map factors to categories (optional for understanding)

3. **Use factor scores directly** for R² decomposition

**Important**: Do NOT run separate PCA for each category. Run one global PCA on all variables, then use the resulting factor scores for contribution analysis.

## Interpreting Factor Loadings

Loadings show correlation between original variables and components:

| Loading | Interpretation |
|---------|----------------|
| > 0.7 | Strong association |
| 0.4 - 0.7 | Moderate association |
| < 0.4 | Weak association |

## Example: Economic Indicators
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from factor_analyzer import FactorAnalyzer

# Variables: gdp, unemployment, inflation, interest_rate, exports, imports
df = pd.read_csv('economic_data.csv')
variables = ['gdp', 'unemployment', 'inflation',
             'interest_rate', 'exports', 'imports']

X = df[variables].values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

fa = FactorAnalyzer(n_factors=3, rotation='varimax')
fa.fit(X_scaled)

# View loadings
loadings_df = pd.DataFrame(
    fa.loadings_,
    index=variables,
    columns=['RC1', 'RC2', 'RC3']
)
print(loadings_df.round(2))
```

## Choosing Number of Factors

### Option 1: Kaiser Criterion
```python
# Check eigenvalues
eigenvalues, _ = fa.get_eigenvalues()

# Keep factors with eigenvalue > 1
n_factors = sum(eigenvalues > 1)
```

### Option 2: Domain Knowledge

If you know how many categories your variables should group into, specify directly:
```python
# Example: health data with 3 expected categories (lifestyle, genetics, environment)
fa = FactorAnalyzer(n_factors=3, rotation='varimax')
```

## Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Loadings all similar | Too few factors | Increase n_factors |
| Negative loadings | Inverse relationship | Normal, interpret direction |
| Low variance explained | Data not suitable for PCA | Check correlations first |

## Best Practices

- Always standardize data before PCA
- Use varimax rotation for interpretability
- Check factor loadings to name components
- Use Kaiser criterion or domain knowledge for n_factors
- For attribution analysis, run ONE global PCA on all variables

More from elizaOS/eliza

Skill	Description
ac-branch-pi-model	AC branch pi-model power flow equations (P/Q and \|S\|) with transformer tap ratio and phase shift, matching `acopf-math-model.md` and MATPOWER branch fields. Use when computing branch flows in either direction, aggregating bus injections for nodal balance, checking MVA (rateA) limits, computing branch loading %, or debugging sign/units issues in AC power flow.
academic-pdf-redaction	Redact text from PDF documents for blind review anonymization
ada-plan-view-accessibility	Use when checking simplified ADA-derived plan-view bathroom accessibility constraints such as turning space, door clear width, toilet centerline, grab bars, and lavatory knee/toe clearance.
analyze-ci	Analyze failed GitHub Action jobs for a pull request.
architectural-dxf-extraction	Use when extracting plan-view architectural geometry from DXF files with semantic CAD layers, especially when outputs must normalize rooms, doors, fixtures, clearances, and grab bars into machine-checkable JSON.
attitude-controller-planner	Use this skill when implementing the inner control loop for a quadrotor — attitude (roll/pitch/yaw) PID control and attitude planning (converting desired acceleration to desired Euler angles). Covers gain layout, integral reset pattern, and the attitude planner inverse kinematics.
azure-bgp	Analyze and resolve BGP oscillation and BGP route leaks in Azure Virtual WAN–style hub-and-spoke topologies (and similar cloud-managed BGP environments). Detect preference cycles, identify valley-free violations, and propose allowed policy-level mitigations while rejecting prohibited fixes.
box-least-squares	Box Least Squares (BLS) periodogram for detecting transiting exoplanets and eclipsing binaries. Use when searching for periodic box-shaped dips in light curves. Alternative to Transit Least Squares, available in astropy.timeseries. Based on Kovács et al. (2002).
browser-testing	VERIFY your changes work. Measure CLS, detect theme flicker, test visual stability, check performance. Use BEFORE and AFTER making changes to confirm fixes. Includes ready-to-run scripts: measure-cls.ts, detect-flicker.ts
cache-policy-comparison	Compare and implement eviction policies (LRU, LFU, FIFO, S3FIFO, ARC) for bounded-capacity caches. Use when choosing or implementing an eviction policy for a buffer pool, page cache, CDN edge, or LLM KV cache, or when writing a replay simulator that supports multiple policies. Clarifies recency vs frequency semantics, queue topology, saturating counters, ghost buffers, and the second-chance rule that distinguishes modern FIFO-family policies from classic LRU.