cache-policy-comparison
$
npx mdskill add elizaOS/eliza/cache-policy-comparisonAn eviction policy decides which resident entry a cache removes when a new entry is admitted beyond capacity. Four policies cover almost every replay-and-measure task:
SKILL.md
.github/skills/cache-policy-comparisonView on GitHub ↗
---
name: cache-policy-comparison
description: Compare and implement eviction policies (LRU, LFU, FIFO, S3FIFO, ARC) for bounded-capacity caches. Use when choosing or implementing an eviction policy for a buffer pool, page cache, CDN edge, or LLM KV cache, or when writing a replay simulator that supports multiple policies. Clarifies recency vs frequency semantics, queue topology, saturating counters, ghost buffers, and the second-chance rule that distinguishes modern FIFO-family policies from classic LRU.
---
# Overview
An eviction policy decides which resident entry a cache removes when a new entry is admitted beyond capacity. Four policies cover almost every replay-and-measure task:
| Policy | Data structure | On hit | On admit | Eviction choice |
|----------|---------------------------------|------------------------------|--------------------------------------|---------------------------------------------------|
| LRU | OrderedDict | Move to tail | Append at tail | Pop head |
| LFU | `{key: freq}` + insertion order | `freq[k] += 1` | `freq[k] = 1` | Min `freq`, tiebreak by insertion order |
| FIFO | OrderedDict | Nothing | Append at tail | Pop head |
| S3FIFO | Three FIFO queues + `freq[k]` | `freq[k] = min(freq+1, cap)` | Admit to small; ghost-hit admits to main | Second-chance on main; small drains to main/ghost |
Each has subtleties that trip naive implementations.
# LRU
Use an `OrderedDict` where the tail is the most-recently-accessed key. On hit, `move_to_end`. On miss + insert, append; pop from head if over capacity.
Most common bug: **forgetting to update recency on a hit**. Without the refresh, LRU degenerates to FIFO — hit rate drops substantially on any workload with recency structure.
```python
from collections import OrderedDict
class LRU:
def __init__(self, capacity):
self.capacity = capacity
self._d = OrderedDict()
def contains(self, k): return k in self._d
def access(self, k):
if k in self._d:
self._d.move_to_end(k)
else:
self._d[k] = None
if len(self._d) > self.capacity:
self._d.popitem(last=False)
```
# LFU
Keep `freq: dict[key, int]` and a tie-breaker — an insertion counter is simplest and deterministic. On hit, increment `freq[k]`. On miss at capacity, evict `min(freq)` with ties broken by insertion order (oldest first).
Typical bugs:
- **No tie-breaker.** `min(freq.items(), key=lambda x: x[1])[0]` has implementation-defined behaviour across interpreters and distributions. Always include a secondary key.
- **Frequency pollution.** A block that was hot once and then went cold can linger forever because its freq is permanently above newcomers. Production systems add aging (periodic decay of freq) or combine with a recency signal (W-TinyLFU). Pure LFU is correct for the task as specified but fragile in practice.
# FIFO
One queue, insertion order, no hit-time update. Useful as a lower-bound baseline.
Do NOT call it "LRU without hit update" — conceptually different even when implementations overlap. Hit on a FIFO cache is still a hit for accounting; the block just does not change rank.
# S3FIFO
A modern FIFO-family policy (Yang et al., SOSP 2023) that matches or beats LRU on typical web and LLM workloads with a fraction of the bookkeeping cost — which is why recent production systems (Twitter, Google) have been switching to it. The full algorithm — three queues, saturating frequency counter, second-chance eviction on the main queue — is implemented in the `prefix-cache-replay` skill. Consult that skill if your task uses S3FIFO.
# Workload implications
- Strong **recency** → LRU wins slightly.
- Stable **hot set** with long tail (Zipf) → LFU or S3FIFO.
- Nearly uniform random → all converge toward `capacity / working_set` hit rate.
- **Prefix-shared LLM workloads** are mixed — shared prefixes are both recent and frequent, so LRU/LFU/S3FIFO typically sit within a few percent of each other at the same capacity, but they differ in which blocks remain resident at end-of-trace, and their miss-handling costs diverge. Measure, don't assume.
# Comparing hit rates on a trace
Replay the same trace through each policy at identical capacity, record `total_hit_tokens / total_prompt_tokens` and the final resident set. Do not compare hit rate alone — also compare:
- Final residency — how many unique blocks are resident at the end. Under S3FIFO this is often strictly less than capacity because ghost entries absorb the admission pressure.
- Per-request hit-token distribution — two policies can have similar overall hit rate but very different per-request variance.
- Admission effort — under policies with ghost structures, the bookkeeping cost per access is non-trivial.
# Common mistakes
- Reusing an LRU implementation when the task specifies S3FIFO (or vice versa). The final hit rate and residency will both differ; no partial credit for "close enough".
- Making ghost count as resident, or treating a ghost hit as a hit for token accounting.
- Forgetting to saturate `freq` — unbounded counters turn the main-queue second-chance loop into a spin.
- Under LFU, using Python `min(d.items(), key=d.get)` without an explicit insertion-order tiebreaker.
- Misordering admission and residency check. Always check `h ∈ cache` BEFORE applying the admission side effects of the current request, otherwise every request self-hits.
- Final cache size off by small constants because you forgot to exclude ghost or you forgot to subtract the S-cap vs M-cap split.
More from elizaOS/eliza
- ac-branch-pi-modelAC branch pi-model power flow equations (P/Q and |S|) with transformer tap ratio and phase shift, matching `acopf-math-model.md` and MATPOWER branch fields. Use when computing branch flows in either direction, aggregating bus injections for nodal balance, checking MVA (rateA) limits, computing branch loading %, or debugging sign/units issues in AC power flow.
- academic-pdf-redactionRedact text from PDF documents for blind review anonymization
- ada-plan-view-accessibilityUse when checking simplified ADA-derived plan-view bathroom accessibility constraints such as turning space, door clear width, toilet centerline, grab bars, and lavatory knee/toe clearance.
- analyze-ciAnalyze failed GitHub Action jobs for a pull request.
- architectural-dxf-extractionUse when extracting plan-view architectural geometry from DXF files with semantic CAD layers, especially when outputs must normalize rooms, doors, fixtures, clearances, and grab bars into machine-checkable JSON.
- attitude-controller-plannerUse this skill when implementing the inner control loop for a quadrotor — attitude (roll/pitch/yaw) PID control and attitude planning (converting desired acceleration to desired Euler angles). Covers gain layout, integral reset pattern, and the attitude planner inverse kinematics.
- azure-bgpAnalyze and resolve BGP oscillation and BGP route leaks in Azure Virtual WAN–style hub-and-spoke topologies (and similar cloud-managed BGP environments). Detect preference cycles, identify valley-free violations, and propose allowed policy-level mitigations while rejecting prohibited fixes.
- box-least-squaresBox Least Squares (BLS) periodogram for detecting transiting exoplanets and eclipsing binaries. Use when searching for periodic box-shaped dips in light curves. Alternative to Transit Least Squares, available in astropy.timeseries. Based on Kovács et al. (2002).
- browser-testingVERIFY your changes work. Measure CLS, detect theme flicker, test visual stability, check performance. Use BEFORE and AFTER making changes to confirm fixes. Includes ready-to-run scripts: measure-cls.ts, detect-flicker.ts
- casadi-ipopt-nlpNonlinear optimization with CasADi and IPOPT solver. Use when building and solving NLP problems: defining symbolic variables, adding nonlinear constraints, setting solver options, handling multiple initializations, and extracting solutions. Covers power systems optimization patterns including per-unit scaling and complex number formulations.