data-schema-knowledge-modeling
$
npx mdskill add lyndonkl/claude/data-schema-knowledge-modelingFormalizes entities, constraints, and relationships into rigorous data models.
- Prevents data inconsistencies by defining cardinality and business rules.
- Supports SQL, NoSQL, graph, and ontology formalization patterns.
- Activates when users request schema design or knowledge graph creation.
- Delivers unambiguous models ready for database or API implementation.
SKILL.md
.github/skills/data-schema-knowledge-modelingView on GitHub ↗
--- name: data-schema-knowledge-modeling description: Creates rigorous, validated models of entities, relationships, and constraints for database schemas (SQL, NoSQL, graph), knowledge graphs, ontologies, API data models, and taxonomies. Covers relational, document, graph, event/time-series, and dimensional schema patterns with lifecycle modeling, soft deletes, polymorphic associations, and hierarchies. Use when user mentions "schema", "data model", "entities", "relationships", "ontology", "knowledge graph", or when data structures need formalization. --- # Data Schema & Knowledge Modeling ## Table of Contents - [Overview](#overview) - [Workflow](#workflow) - [Schema Types](#schema-types) - [Common Patterns](#common-patterns) - [Guardrails](#guardrails) - [Quick Reference](#quick-reference) ## Overview This skill formally defines entities, attributes, relationships, constraints, and cardinality to produce unambiguous data models that prevent inconsistencies and enable correct implementation. **Quick example:** E-commerce schema: - **Entities**: User, Product, Order, Cart, Payment - **Relationships**: User has many Orders, Order contains many Products (via OrderItems), User has one Cart - **Constraints**: Email must be unique, Order total matches sum of OrderItems, Payment amount equals Order total - **Result**: Unambiguous model that prevents data inconsistencies ## Workflow Copy this checklist and track your progress: ``` Data Schema & Knowledge Modeling Progress: - [ ] Step 1: Gather domain requirements and scope - [ ] Step 2: Identify entities and attributes - [ ] Step 3: Define relationships and cardinality - [ ] Step 4: Specify constraints and invariants - [ ] Step 5: Validate and document the model ``` **Step 1: Gather domain requirements and scope** Ask user for domain description, core use cases (what queries/operations will this support), existing data (if migration/integration), performance/scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Understanding use cases shapes the model - OLTP vs OLAP vs graph traversal require different designs. See [Schema Types](#schema-types) for guidance. **Step 2: Identify entities and attributes** Extract nouns from requirements (those are candidate entities). For each entity, list attributes with types and nullability. Use [resources/template.md](resources/template.md) for systematic entity identification. Verify each entity represents a distinct concept with independent lifecycle. Document entity purpose and examples. **Step 3: Define relationships and cardinality** Map connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many, identify junction tables/entities. Specify relationship directionality and optionality (can X exist without Y?). Use [resources/methodology.md](resources/methodology.md) for complex relationship patterns like hierarchies, polymorphic associations, and temporal relationships. **Step 4: Specify constraints and invariants** Define uniqueness constraints, foreign key relationships, check constraints, and business rules. Document domain invariants (rules that must hold true at all times). Identify derived/computed attributes vs stored. Use [resources/methodology.md](resources/methodology.md) for advanced constraint patterns and validation strategies. **Step 5: Validate and document the model** Create `data-schema-knowledge-modeling.md` file with complete schema definition. Validate against use cases - can the schema support required queries/operations? Check for normalization (eliminate redundancy) or denormalization (optimize for specific queries). Self-assess using [resources/evaluators/rubric_data_schema_knowledge_modeling.json](resources/evaluators/rubric_data_schema_knowledge_modeling.json). Minimum standard: Average score ≥ 3.5. ## Schema Types Choose based on use case and technology: **Relational (SQL) Schema** - **Best for:** Transactional systems (OLTP), strong consistency, complex queries with joins - **Pattern:** Normalized tables, foreign keys, ACID transactions - **Example use cases:** E-commerce orders, banking transactions, HR systems - **Key decision:** Normalization level (3NF for consistency vs denormalized for read performance) **Document/NoSQL Schema** - **Best for:** Flexible/evolving structure, high write throughput, denormalized reads - **Pattern:** Nested documents, embedded relationships, no joins - **Example use cases:** Content management, user profiles, event logs - **Key decision:** Embed vs reference (embed for 1-to-few, reference for 1-to-many) **Graph Schema (Ontology)** - **Best for:** Complex relationships, traversal queries, semantic reasoning, knowledge graphs - **Pattern:** Nodes (entities), edges (relationships), properties on both - **Example use cases:** Social networks, fraud detection, recommendation engines, scientific research - **Key decision:** Property graph vs RDF triples **Event/Time-Series Schema** - **Best for:** Audit logs, metrics, IoT data, append-only data - **Pattern:** Immutable events, time-based partitioning, aggregation tables - **Example use cases:** User activity tracking, monitoring, financial transactions - **Key decision:** Raw events vs pre-aggregated summaries **Dimensional (Data Warehouse) Schema** - **Best for:** Analytics (OLAP), aggregations, historical reporting - **Pattern:** Fact tables + dimension tables (star/snowflake schema) - **Example use cases:** Business intelligence, sales analytics, customer 360 - **Key decision:** Star schema (denormalized) vs snowflake (normalized dimensions) ## Common Patterns **Pattern: Entity Lifecycle Modeling** Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include status field, timestamps for each state, and transitions table if history needed. **Pattern: Soft Deletes** Never physically delete records - add `deletedAt` timestamp. Allows data recovery, audit compliance, and referential integrity. Filter `WHERE deletedAt IS NULL` in queries. **Pattern: Polymorphic Associations** Entity relates to multiple types. Example: Comment can be on Post or Photo. Options: (1) separate foreign keys (commentableType + commentableId), (2) junction tables per type, (3) single table inheritance. **Pattern: Temporal/Historical Data** Track changes over time. Options: (1) Effective/expiry dates per record, (2) separate history table, (3) event sourcing (store all changes as events). Choose based on query patterns. **Pattern: Multi-tenancy** Isolate data per customer. Options: (1) Separate databases (strong isolation), (2) Shared schema with tenantId column (efficient), (3) Separate schemas in same DB (balance). Add tenantId to all queries if shared. **Pattern: Hierarchies** Model trees/nested structures. Options: (1) Adjacency list (parentId), (2) Nested sets (left/right values), (3) Path enumeration (materialized path), (4) Closure table (all ancestor-descendant pairs). Trade-offs between read/write performance. ## Guardrails **✓ Do:** - Start with use cases - schema serves queries/operations - Normalize first, then denormalize for specific performance needs - Document all constraints and invariants explicitly - Use meaningful, consistent naming conventions - Consider future evolution - design for extensibility - Validate model against ALL required use cases - Model the real world accurately (don't force fit to technology) **✗ Don't:** - Design schema in isolation from use cases - Premature optimization (denormalize before measuring) - Skip constraint definitions (leads to data corruption) - Use generic names (data, value, thing) - be specific - Ignore cardinality and nullability - Model implementation details in domain entities - Forget about data migration path from existing systems - Create circular dependencies between entities ## Quick Reference **Resources:** - `resources/template.md` - Structured process for entity identification, relationship mapping, and constraint definition - `resources/methodology.md` - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies - `resources/examples/` - Worked examples showing complete schema designs with validation - `resources/evaluators/rubric_data_schema_knowledge_modeling.json` - Quality assessment before delivery **When to choose which resource:** - Simple domain (< 10 entities) → Start with template - Complex domain or graph/ontology → Study methodology for advanced patterns - Need to see examples → Review examples folder - Before delivering to user → Always validate with rubric **Expected deliverable:** `data-schema-knowledge-modeling.md` file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, diagram (ERD/graph visualization), validation against use cases, and implementation notes. **Common schema notations:** - **ERD** (Entity-Relationship Diagram): Visual representation of entities and relationships - **UML Class Diagram**: Object-oriented view with inheritance and associations - **Graph Diagram**: Nodes and edges for graph databases - **JSON Schema**: API/document structure with validation rules - **SQL DDL**: Executable CREATE TABLE statements - **Ontology (OWL/RDF)**: Semantic web knowledge representation
More from lyndonkl/claude
- abstraction-concrete-examplesBuilds structured abstraction ladders that translate high-level principles into concrete, actionable examples across 3-5 levels. Bridges communication gaps, reveals hidden assumptions, and tests whether abstract ideas work in practice. Use when explaining concepts at different expertise levels, moving between abstract principles and concrete implementation, identifying edge cases by testing ideas against scenarios, designing layered documentation, decomposing complex problems into actionable steps, or bridging strategy-execution gaps.
- academic-letter-architectGuides the creation of evidence-based academic recommendation letters, reference letters, and award nominations that combine concrete examples, meaningful comparisons, and genuine enthusiasm. Use when writing recommendation letters for students, postdocs, or colleagues, or when user mentions recommendation letter, reference, nomination, letter of support, endorsement, or needs help with strong advocacy and comparative statements.
- adr-architectureDocuments significant architectural and technical decisions with full context, alternatives considered, trade-offs analyzed, and consequences understood. Creates a decision trail that helps teams understand why decisions were made. Use when choosing between technology options, making infrastructure decisions, establishing standards, migrating systems, or when user mentions ADR, architecture decision, technical decision record, or decision documentation.
- adverse-selection-priorProduces a Bayesian prior probability that an offered transaction is +EV for the recipient, given that the counterparty chose to propose it. Applies Akerlof market-for-lemons logic -- if they offered it, they believe it is +EV for them, so the prior that it is +EV for us is materially below 50%. Reusable across trade evaluation, waiver drops (another team dropping a player is also adverse selection), job-offer analysis, M&A, and any "someone offered me this" situation. Use when you receive an unsolicited trade/offer/proposal, analyzing incoming trade prior, evaluating why a counterparty proposed a deal, or when user mentions adverse selection, market for lemons, why did they offer this, incoming trade prior, they proposed it, Bayesian adjustment on received offer.
- alignment-values-north-starCreates actionable alignment frameworks that give teams a shared North Star (direction), values (guardrails), and decision tenets (behavioral standards). Enables autonomous decision-making while maintaining organizational coherence. Use when starting new teams, scaling organizations, defining culture, establishing product vision, resolving misalignment, creating strategic clarity, or when user mentions North Star, team values, mission, principles, guardrails, decision framework, or cultural alignment.
- analogy-weight-checkFor every analogy in a substacker draft, verifies it carries mechanical weight — the analogy does real work explaining the mechanism, not merely decorates it. Cross-references analogy-catalog.md for novelty (is this analogy reused from a prior post?) and domain fit (biology > organizational > sports preferred; physics/military disfavored). Use whenever an analogy appears in the draft. Trigger keywords: analogy weight, decorative, mechanical weight, reused analogy, catalog check, metaphor check.
- answer-uncomfortable-questionTakes one strategic question about substacker ("should we launch paid?", "is this section dead?", "are we writing for the wrong audience?") and produces the mandatory evidence + reasoning + downside triad plus a recommendation. Used 3 times per Growth Strategist review. Trigger keywords: uncomfortable question, strategic question, evidence reasoning downside, triad.
- attribute-performanceFor each substacker post that materially over- or under-performs the rolling baseline (|z| ≥ 1.0), produces a plain-English attribution paragraph with calibrated confidence (high / medium / low / unexplained). Considers subject-line effect, topic zeitgeist, external share, day-of-week, length effect, and audience-notes signals. Labels unexplained outliers explicitly rather than fabricating a story. Use after compute-baseline when outlier posts exist. Trigger keywords: attribution, why did this post work, outlier explanation, performance analysis.
- auction-first-price-shadingComputes the optimal shaded bid for a first-price sealed-bid auction given a true private value, an estimate of the number of competing bidders N, and a value-distribution assumption. Implements the `(N-1)/N` equilibrium shading rule for uniform private values, adjusts for log-normal or empirical value distributions, layers a risk-aversion adjustment, and caps output against the bidder's remaining budget. Domain-neutral auction theory reusable across fantasy sports (baseball FAAB, NBA/NHL waiver auctions), prediction-market limit sizing, sealed procurement bids, and any blind-bid context. Use when user mentions "first-price auction bid", "sealed bid shading", "(N-1)/N", "FAAB bid amount", "auction shading", "optimal bid first-price", "bid for sealed-bid", "blind bid sizing", or when downstream logic needs a principled shade factor rather than an ad-hoc heuristic.
- auction-winners-curse-haircutApplies a Bayesian haircut to a bid valuation for common-value auctions where winning is itself evidence the bidder over-estimated. Takes a raw valuation, a value-type classification (common_value / private_value / mixed), the number of informed bidders N, and a signal-dispersion estimate, and returns an adjusted valuation. Domain-neutral and reusable across fantasy FAAB, prediction markets, M&A bids, ad-auction budgets, and any generic bidding context. Use when user mentions "winner's curse", "common value auction", "valuation haircut", "adverse valuation", "Bayesian bid adjustment", or "over-paying in auction".