entity-model-auditor
$
npx mdskill add hylarucoder/hai-stack/entity-model-auditorAudits entity data models into field-by-field markdown reports
- Solves data model design, audit, and migration planning tasks
- Uses PRDs, codebases, and schema definitions as input sources
- Analyzes fields through five design dimensions and classification rules
- Delivers structured markdown reports with audit tables and change lists
SKILL.md
.github/skills/entity-model-auditorView on GitHub ↗
--- name: entity-model-auditor description: | Audit and design entity data models into a field-by-field markdown report: a target-vs-current audit table per entity, a classification of every field (table column / config blob / runtime-computed / remove), a grouped migration change list, and design-decision justifications — each field run through five cuts (exist? store or compute? column or config? system or content? what form to store?). Use this whenever the user reasons about what fields an entity should have or where they should live, even if they never say "audit": designing a data model, checking PRD fields against a DB schema or backend types, deciding store-vs-compute or column-vs-jsonb, judging whether a field is redundant, spotting a bloated/over-modeled entity, or planning a schema migration. Trigger even when they just paste a PRD, schema, or Prisma/SQLAlchemy model and ask "are the fields right", "what's missing", "should this be a column or jsonb", "is this field even needed". Also on 实体建模, 数据模型审核, 字段审查, 字段太多了, 这个表设计得对吗, PRD 对齐, 模型设计, 该存还是该算, 列还是配置, 帮我看下这个模型. --- # Entity Model Auditor Given a PRD (product requirements document) and an optional codebase, produce a structured data model audit for each entity. The audit answers: what fields should exist, where should they live, what do we have now, and what needs to change. Two stances run through the whole process: - **Start from the PRD, not the codebase.** The PRD defines what should exist; the codebase shows what does exist. The delta between them is the work. - **Every storage classification is really a "cost to change later" judgment.** A table column costs a schema migration to change; a config blob field is free to add or remove; a system field costs a code change; a content field changes by editing data alone. Cuts 3, 4, and 5 are all pricing this same question — what does it cost to change this field after ship — so reason from that one idea instead of memorizing three separate tables. The final artifact is both a specification (the target design) and a migration guide (the work to get there). Anyone reading it should know the target, the current state, and the change. ## Output shape The whole workflow fills in this one document. Build the sections in order; each step below produces a named slot. ``` # [Entity Name] Data Model Audit > Design principle: [one-line summary of the storage philosophy] ## 1. [Entity] Table — audit table for table-level fields ## 2. Config Fields — audit table for config fields, grouped by logical section ## 3. Runtime Fields — computed/derived fields with derivation rules ## 4. Related Entities — junction tables, version tables, etc. ## 5. Change List — grouped by change type ## 6. Design Decisions — non-obvious justifications ``` Sections 1 and 2 use the 8-column audit table defined in Step 3. Read `references/output-template.md` for a fill-in-the-blanks instance of this exact shape before finalizing. ## Workflow ### Step 1: Extract fields from the PRD Read the PRD the user provides. For each entity mentioned, extract every field or attribute — explicit or implied. Group them by the PRD's own sections if it has them. If the user also points to codebase files (DB models, API types, schemas), read those too. If no codebase is provided, the "current state" columns will be marked as unknown. ### Step 2: Apply the Five Cuts to each field Every field goes through five classification questions, in order. Each cut narrows down where and how the field should exist. Explain non-obvious decisions in the table rather than just recording the answer. #### Cut 1 — Should this field exist at all? A field earns its place only if it meets at least one condition: | Condition | Test | |-----------|------| | System necessary | The backend cannot function without this value (e.g., a permission level that determines access control logic) | | Non-redundant | The information it carries is not already covered by another field | | User-perceptible | When the user changes this value, they can observe a difference in behavior or output | If a field fails all three, it should be removed or merged into another field. Flag it in the table as "existence questioned" with the reasoning. Common patterns to watch for: - Fields that look independent but are highly correlated (e.g., a "relationship" field and a "communication style" field that always move together) - Numeric sliders with false precision (a 0-100 slider where the system only distinguishes 3 behaviors) - Descriptive fields that could be absorbed into a free-text prompt field instead of existing as separate structured inputs #### Cut 2 — Store or compute? | If... | Then... | |-------|---------| | The value can be derived from other persisted data | Do not store. Compute at runtime. | | The value is a pure function of other fields | Do not store. Compute when needed. | | The value changes automatically based on system events, not user actions | Do not store. Derive from the source of truth. | | None of the above | Store it. | Runtime-computed fields still appear in the audit table, marked as "runtime" in the storage column. Document the derivation rule (e.g., "status = derived from active task states in the task table"). Also watch for **storage vs presentation** splits: a field may store a raw value (like an object storage key) but the API returns a transformed version (like a full URL). In this case, the stored field and the API field are different things — name them accordingly (e.g., `photoKey` in storage, `photoUrl` in API response). #### Cut 3 — Table column or config blob? For fields that should be stored, decide where: | If... | Then... | |-------|---------| | You need to filter, sort, or query by this value | Table column | | It's a user-facing "tuning knob" that doesn't affect queries | Config blob (jsonb / json) | The benefit of config blob: adding a new field requires no database migration (config avoids a migration per change — the cost-to-change lens from the intro). Fields promoted to table columns should be ones the product actually filters on (check the PRD's list/search/filter requirements), not ones that "might" be useful to query. #### Cut 4 — System field or content field? This only applies to fields stored in the config blob: | Type | Definition | Implication | |------|-----------|-------------| | System field | Backend logic branches on this value (if/switch statements, query parameters, API behavior changes) | Validate against an enum. Backend must understand every possible value. | | Content field | Flows into prompts, display, or descriptions without the backend interpreting it | Free text. No enum validation. Template/seed data provides initial values. | Why this matters (same cost-to-change lens): a system field costs a code change every time you add an option, because the backend branches on it; a content field gets new options by updating template data alone. Cuts 3, 4, and 5 are all pricing the same thing — Cut 3 prices a schema migration, Cut 4 prices a code change, Cut 5 prices a data/transform change. #### Cut 5 — What form is the stored value? For each stored field, clarify what exactly gets persisted: | Pattern | Example | |---------|---------| | The stored value IS the display value | A "one-line description" field stores the actual text users see | | The stored value is a key that maps to display text | `priority: "high"` → displayed as "紧急" or "High Priority" via i18n | | The stored value is raw material, the API transforms it | `photoKey: "users/abc.png"` → API returns `photoUrl: "https://cdn.example.com/users/abc.png"` | | The stored value is an input, a computed output is derived | `template` (stored) → `renderedContent` (computed by merging template with context data) | Content fields sourced from templates typically store the actual display value, not an abstract key — because the template IS the source of truth for what options exist. ### Step 3: Build the audit table (fills sections 1 and 2) For each entity, produce a table with these 8 columns. This is the canonical audit table used by both **section 1 (table-level fields)** and **section 2 (config fields)** of the output — split the rows between the two sections by their Classification. | Column | What it answers | |--------|----------------| | **Target field** | What this field should be called in the ideal design | | **Type** | Data type (string, int, enum, jsonb, etc.) | | **Classification** | Where it belongs after the five cuts (table column / config / runtime / remove) | | **Field nature** | System or content (for config fields only) | | **Existence justification** | Why this field needs to exist (system necessary / non-redundant / user-perceptible), or "questioned" if dubious | | **Current field** | What it's called in the current codebase, or "—" if it doesn't exist | | **Current source** | Where it currently lives (DB column, config json, not implemented, etc.) | | **Change** | What needs to happen (—, rename, new, move, remove, connect API) | Group fields by logical section (identity, behavior, parameters, access control, etc.) based on what makes sense for the entity. Use the PRD's own grouping as a starting point but don't follow it blindly — regroup if the PRD's sections mix concerns. ### Step 4: Produce the change list (fills section 5) After the table, summarize all changes needed in a flat list grouped by change type: - **Renames** — field X → field Y (list the full scope: DB migration, API, contract, frontend) - **New fields** — fields to add, with default values - **Removals** — fields to drop (moved to runtime / merged / unnecessary) - **Moves** — fields changing storage location (column → config, config → runtime) - **API gaps** — fields that exist in storage but have no update/read path through the API ### Step 5: Document runtime-computed fields (fills section 3) List all fields that are not stored but need to be available at the API or runtime level. For each, document: - The derivation rule (how to compute it) - Where the source data lives - When the computation happens (API response time? request execution time?) Junction tables, version tables, and other related entities go in **section 4** — apply the same five cuts to each related entity's fields. ### Step 6: List design decisions (fills section 6) For any non-obvious classification decision, write a one-row justification: | Decision | Reasoning | |----------|-----------| | `summary` is a column, not config | Card list displays it directly; avoiding jsonb parse on every list query | | `theme` moved from column to config | Not a query/filter dimension; keeping columns minimal | ## Output Assemble the six sections from the "Output shape" skeleton at the top of this file: sections 1 and 2 hold the 8-column audit table (split by Classification), section 3 the runtime fields, section 4 related entities, section 5 the change list, section 6 the design decisions. Before finalizing, read `references/output-template.md` — it is a fill-in-the-blanks instance of exactly that shape, with the same section headings and the same column set. Two ordering rules to keep the result honest: - Question every field's existence (Cut 1) before deciding where to put it. A field that shouldn't exist doesn't need a storage decision. - Name each field for what it is, not where it came from or how it's used — and surface the stored form, not the presentation form (an object storage key, not a full URL). For deeper naming disputes, hand off to hai-naming. ## When to hand off This skill assumes the PRD is roughly settled and focuses on field placement and the storage/migration delta. Route elsewhere when the real problem is upstream: - **hai-naming** — the dispute is purely what to call a field/concept, not where it should live. - **hai-prd** — the PRD itself is the problem: scope is wrong, requirements conflict, or the fields are undecided because the product intent is undecided. Fix the PRD first, then audit. - **clean-code-reviewer / react-component-diagnosis** — the user wants general code-quality or component-design review, not a field-level data-model audit.
More from hylarucoder/hai-stack
- ast-grep-rule-crafter|
- clean-code-reviewerProduces a severity-rated (高/中/低) Clean Code findings report across 7 dimensions (naming, function size/SRP, duplication/DRY, over-engineering/YAGNI, magic numbers, structural clarity, project conventions), each with a location and a behavior-preserving refactor suggestion — never changing functionality. Use whenever the user asks for a code review, quality check, refactor advice, or code-smell / Clean Code analysis, OR points at a file/function/diff and asks if it is well-written, too long, too repetitive, over-engineered, or poorly named — even casually, and even if they never say "review" ("I just wrote this, look it over", "does this look good before I commit"). Trigger on 代码体检, 代码质量, 重构检查, 代码审查, 这段代码写得怎么样, 帮我看看代码有没有问题, 有没有坏味道, 这函数是不是太长了, 命名规范吗, 魔法数字, 重复代码, 过度设计, and English like "is this code clean", "any code smells", "check this file".
- create-visual-card|
- geju|
- goudi|
- hai-architecture|
- hai-audit-docs-against-code|
- hai-audit-docs-internally|
- hai-goal|
- hai-idea|