arckit-data-mesh-contract

$npx mdskill add tractorjuice/arc-kit/arckit-data-mesh-contract

Generate ODCS-compliant data mesh contracts with SLAs and governance.

  • Creates formal agreements between domain teams and data consumers.
  • Depends on existing architecture principles and project artifacts.
  • Validates prerequisites before drafting contract specifications.
  • Outputs structured contract documents following Open Data Contract Standard.
SKILL.md
.github/skills/arckit-data-mesh-contractView on GitHub ↗
---
name: arckit-data-mesh-contract
description: "Create federated data product contracts for mesh architectures with SLAs, governance, and interoperability guarantees (project)"
---

You are helping an enterprise architect **create a data mesh contract** for a data product in a federated mesh architecture.

This command generates a **data-mesh-contract** document that defines the formal agreement between a data product provider (domain team) and consumers, following the **Open Data Contract Standard (ODCS) v3.0.2**.

## User Input

```text
$ARGUMENTS
```

## Instructions

> **Note**: Before generating, scan `projects/` for existing project directories. For each project, list all `ARC-*.md` artifacts, check `external/` for reference documents, and check `000-global/` for cross-project policies. If no external docs exist but they would improve output, ask the user.

### Step 0: Check Prerequisites

**IMPORTANT**: Before generating a data mesh contract, verify that foundational artifacts exist:

1. **Architecture Principles** (REQUIRED):
   - Check if `projects/000-global/ARC-000-PRIN-*.md` exists
   - If it does NOT exist:

     ```text
     ❌ Architecture principles not found.

     Data mesh contracts require architecture principles to be established first.
     Principles should include mesh governance standards (federated ownership, data as a product, self-serve infrastructure).

     Please run: $arckit-principles Create enterprise architecture principles

     Then return here to generate the data mesh contract.
     ```

   - If it exists, proceed to Step 1

2. **Data Model** (HIGHLY RECOMMENDED):
   - Check if the project has a `ARC-*-DATA-*.md` file
   - If it does NOT exist:

     ```text
     ⚠️  Warning: No data model found for this project.

     Data mesh contracts are typically derived from existing data models (entities become data products).

     Consider running: $arckit-data-model Create data model for [project name]

     You can proceed without a data model, but you'll need to define schema from scratch.

     Continue anyway? (yes/no)
     ```

   - If user says "no", stop here and tell them to run `$arckit-data-model` first
   - If user says "yes" or if ARC-*-DATA-*.md exists, proceed to Step 1

3. **Stakeholder Analysis** (RECOMMENDED):
   - Check if the project has `ARC-*-STKE-*.md`
   - If it does NOT exist:

     ```text
     ⚠️  Warning: No stakeholder analysis found.

     Stakeholder analysis helps identify:
     - Domain owners (who owns this data product)
     - Consumers (who will use this data product)
     - Data stewards and governance stakeholders

     Consider running: $arckit-stakeholders Analyze stakeholders for [project name]

     You can proceed without stakeholder analysis, but ownership roles will be generic placeholders.

     Continue anyway? (yes/no)
     ```

   - If user says "no", stop here
   - If user says "yes" or if ARC-*-STKE-*.md exists, proceed to Step 1

**Gathering rules** (apply to all user questions in this command):

- Ask the most important question first; fill in secondary details from context or reasonable defaults.
- **Maximum 2 rounds of questions total.** After that, infer the best answer from available context.
- If still ambiguous after 2 rounds, make a reasonable choice and note: *"I went with [X] — easy to adjust if you prefer [Y]."*

### Step 1: Parse User Input

Extract the **data product name** from the user's message. Examples:

- "Create contract for customer payments"
- "Generate mesh contract for seller analytics data product"
- "customer-orders contract"

The data product name should be:

- Kebab-case: `customer-payments`, `seller-analytics`
- Descriptive of the business domain

If the user didn't provide a clear data product name, ask:

```text
What is the name of the data product for this contract?

Examples:
- customer-payments
- seller-analytics
- order-events
- fraud-detection-features

Data product name (kebab-case):
```

### Step 2: Identify the target project

- Use the **ArcKit Project Context** (above) to find the project matching the user's input (by name or number)
- If no match, create a new project:
  1. Use Glob to list `projects/*/` directories and find the highest `NNN-*` number (or start at `001` if none exist)
  2. Calculate the next number (zero-padded to 3 digits, e.g., `002`)
  3. Slugify the project name (lowercase, replace non-alphanumeric with hyphens, trim)
  4. Use the Write tool to create `projects/{NNN}-{slug}/README.md` with the project name, ID, and date — the Write tool will create all parent directories automatically
  5. Also create `projects/{NNN}-{slug}/external/README.md` with a note to place external reference documents here
  6. Set `PROJECT_ID` = the 3-digit number, `PROJECT_PATH` = the new directory path

**Important**: If the script creates a NEW project, inform the user:

```text
Created new project: Project {project_id} - {project_name}
   Location: {project_path}

Note: This is a new project. You may want to run these commands first:
- $arckit-stakeholders - Identify domain owners and consumers
- $arckit-data-model - Define entities that become data products
- $arckit-requirements - Capture DR-xxx data requirements
```

If the project ALREADY EXISTS, just acknowledge it:

```text
Using existing project: Project {project_id} - {project_name}
   Location: {project_path}
```

### Step 3: Create Contract Directory

Data mesh contracts should be organized in a subdirectory. The directory will be created automatically when saving the file with the Write tool.

The contract file will use the multi-instance naming pattern:

```text
{project_path}/data-contracts/ARC-{PROJECT_ID}-DMC-{NNN}-v1.0.md
```

Where `{NNN}` is the next sequential number for contracts in this project. Check existing files in `data-contracts/` and use the next number (001, 002, ...).

### Step 4: Read the Template

Read the data mesh contract template:

**Read the template** (with user override support):

- **First**, check if `.arckit/templates/data-mesh-contract-template.md` exists in the project root
- **If found**: Read the user's customized template (user override takes precedence)
- **If not found**: Read `.arckit/templates/data-mesh-contract-template.md` (default)

> **Tip**: Users can customize templates with `$arckit-customize data-mesh-contract`

### Step 4b: Read external documents and policies

- Read any **external documents** listed in the project context (`external/` files) — extract existing data product definitions, SLA terms, schema specifications, data quality rules
- Read any **enterprise standards** in `projects/000-global/external/` — extract enterprise data governance standards, data sharing agreements, cross-project data catalogue conventions
- If no external docs exist but they would improve the output, ask: "Do you have any existing data contracts, data product SLAs, or schema specifications? I can read PDFs, YAML, and JSON files directly. Place them in `projects/{project-dir}/external/` and re-run, or skip."
- **Citation traceability**: When referencing content from external documents, follow the citation instructions in `.arckit/references/citation-instructions.md`. Place inline citation markers (e.g., `[PP-C1]`) next to findings informed by source documents and populate the "External References" section in the template.

### Step 5: Gather Context from Existing Artifacts

**IF `ARC-*-DATA-*.md` exists in the project**:

- Read `{project_path}/ARC-*-DATA-*.md`
- Extract:
  - Entity definitions (these become Objects in the contract)
  - Attributes (these become Properties)
  - PII markings
  - Data classifications
  - Relationships

**IF `ARC-*-REQ-*.md` exists**:

- Read `{project_path}/ARC-*-REQ-*.md`
- Extract:
  - DR-xxx data requirements (these inform schema and quality rules)
  - NFR-P-xxx performance requirements (these become SLA targets)
  - NFR-A-xxx availability requirements (these become SLA uptime targets)
  - INT-xxx integration requirements (these become access methods)

**IF `ARC-*-STKE-*.md` exists**:

- Read `{project_path}/ARC-*-STKE-*.md`
- Extract:
  - Stakeholder names and roles (these become ownership roles: Product Owner, Data Steward)
  - Consumer stakeholders (these inform consumer obligations)

**IF `projects/000-global/ARC-000-PRIN-*.md` exists**:

- Read it to understand mesh governance standards
- Look for principles about:
  - Federated ownership
  - Data as a product
  - Self-serve infrastructure
  - Computational governance
  - UK Government compliance (TCoP, GDPR)

### Step 6: Generate the Data Mesh Contract

Using the template and context gathered, generate a comprehensive data mesh contract.

**Key Sections to Populate**:

1. **Document Information**:
   - Document ID: `ARC-{project_id}-DMC-{NNN}-v1.0` (multi-instance type, uses sequential numbering)
   - Project: `{project_name}` (Project {project_id})
   - Classification: Determine based on PII content (OFFICIAL-SENSITIVE if PII, OFFICIAL otherwise)
   - Version: Start at `1.0`
   - Status: DRAFT (for new contracts)
   - Date: Today's date (YYYY-MM-DD)
   - Owner: If stakeholder analysis exists, use Product Owner; otherwise use placeholder

2. **Fundamentals (Section 1)**:
   - Contract ID: Generate a UUID
   - Contract Name: `{data-product-name}`
   - Semantic Version: Start at `1.0.0`
   - Status: ACTIVE (for published) or DRAFT (for new)
   - Domain: Extract from project name or ask user (e.g., "customer", "seller", "finance")
   - Data Product: The data product name
   - Tenant: Organization name (if known from stakeholders, otherwise placeholder)
   - Purpose: Describe what this data product provides
   - Ownership: Map from ARC-*-STKE-*.md if available

3. **Schema (Section 2)**:
   - **If ARC-*-DATA-*.md exists**: Map entities → objects, attributes → properties
   - **If ARC-*-DATA-*.md does NOT exist**: Create schema from scratch based on user description
   - For each object:
     - Object name (e.g., CUSTOMER, TRANSACTION)
     - Source entity reference (if from ARC-*-DATA-*.md)
     - Object type (TABLE, DOCUMENT, FILE, STREAM)
     - Properties table with: name, type, required, PII, description, business rules
     - Primary keys, foreign keys, indexes
   - Schema versioning strategy: Semantic versioning
   - Breaking change policy: 90-day deprecation notice

4. **Data Quality (Section 3)**:
   - Quality dimensions: Accuracy, Validity, Completeness, Consistency, Timeliness, Uniqueness
   - **If ARC-*-REQ-*.md exists**: Map DR-xxx requirements to quality rules
   - Automated quality rules in ODCS format:
     - null_check for required fields
     - uniqueness for primary keys
     - referential_integrity for foreign keys
     - regex for format validation (email, phone)
     - range for numeric constraints
   - Monitoring dashboard (placeholder URL)
   - Alert thresholds

5. **Service-Level Agreement (Section 4)**:
   - **If ARC-*-REQ-*.md has NFR-A-xxx**: Use those as uptime targets (default: 99.9%)
   - **If ARC-*-REQ-*.md has NFR-P-xxx**: Use those as response time targets (default: <200ms p95)
   - Freshness targets: <5 minutes for real-time, <1 hour for near-real-time, daily for batch
   - Data retention: Map from ARC-*-DATA-*.md if available, otherwise use defaults (7 years for transactional, 90 days for logs)
   - Support SLA: Critical <30min, High <4 hours, Normal <1 day

6. **Access Methods (Section 5)**:
   - **If ARC-*-REQ-*.md has INT-xxx**: Use those to define access patterns
   - Default access methods:
     - REST API (for queries)
     - SQL Query (for analytics)
     - Data Lake (for batch processing)
   - API specifications: endpoints, authentication (OAuth 2.0 / API Key), rate limits
   - Consumer onboarding process

7. **Security and Privacy (Section 6)**:
   - Data classification: Based on PII content
   - Encryption: AES-256 at rest, TLS 1.3 in transit
   - Access controls: RBAC with roles (Consumer-Read, Analyst-FullRead, Admin)
   - **GDPR/UK GDPR Compliance**:
     - PII inventory from ARC-*-DATA-*.md or schema
     - Legal basis: CONTRACT / LEGITIMATE_INTEREST / CONSENT
     - Data subject rights: API endpoint for access/rectification/erasure
     - Cross-border transfers: Default to UK (London region)
     - DPIA status: REQUIRED if PII exists, NOT_REQUIRED otherwise
   - Audit logging: All API access, schema changes, PII access

8. **Governance and Change Management (Section 7)**:
   - Change request process: Minor (7 days notice), Major (90 days notice)
   - Contract review cycle: Quarterly
   - Deprecation policy: 90-day notice + migration support

9. **Consumer Obligations (Section 8)**:
   - Attribution requirements
   - Usage constraints (no redistribution, no reverse engineering)
   - Data quality feedback
   - Compliance with own GDPR obligations
   - Security (protect credentials, rotate keys)

10. **Pricing (Section 9)**:
    - Default: FREE tier for internal consumers
    - Optional: Cost recovery model (per request, per GB)
    - If external consumers: Consider commercial pricing

11. **Infrastructure (Section 10)**:
    - Cloud provider: AWS (default for UK Gov) / Azure / GCP
    - Region: UK (London) - eu-west-2
    - High availability: Multi-AZ
    - DR: RTO 4 hours, RPO 15 minutes
    - Infrastructure components: API Gateway, Compute (Lambda/ECS), Database (RDS), Cache (Redis), Storage (S3)

12. **Observability (Section 11)**:
    - Key metrics: request rate, error rate, latency, freshness, quality, cost
    - Alerts: Error rate >1%, Latency >500ms, Freshness >5min
    - Logging: 30 days hot, 1 year cold

13. **Testing (Section 12)**:
    - Test environments: Dev, Staging, Production
    - Sample test cases for consumers
    - CI/CD integration

14. **Limitations (Section 13)**:
    - Document any known limitations
    - Known issues table

15. **Roadmap (Section 14)**:
    - Upcoming features (next 6 months)
    - Long-term vision

16. **Related Contracts (Section 15)**:
    - **If INT-xxx requirements exist**: List upstream dependencies
    - List known downstream consumers (from stakeholders if available)

17. **Appendices (Section 16)**:
    - ODCS YAML export
    - Changelog (version history)
    - Glossary
    - Contact information

### Step 7: Construct Document ID

- **Document ID**: `ARC-{PROJECT_ID}-DMC-{NNN}-v{VERSION}` (e.g., `ARC-001-DMC-001-v1.0`)
- Sequence number `{NNN}`: Check existing files in `data-contracts/` and use the next number (001, 002, ...)

---

**CRITICAL - Auto-Populate Document Control Fields**:

Before completing the document, populate ALL document control fields in the header:

**Construct Document ID**:

- **Document ID**: `ARC-{PROJECT_ID}-DMC-{NNN}-v{VERSION}` (e.g., `ARC-001-DMC-001-v1.0`)

**Populate Required Fields**:

*Auto-populated fields* (populate these automatically):

- `[PROJECT_ID]` → Extract from project path (e.g., "001" from "projects/001-project-name")
- `[VERSION]` → "1.0" (or increment if previous version exists)
- `[DATE]` / `[YYYY-MM-DD]` → Current date in YYYY-MM-DD format
- `[DOCUMENT_TYPE_NAME]` → "Data Mesh Contract"
- `ARC-[PROJECT_ID]-DMC-v[VERSION]` → Construct using format above
- `[COMMAND]` → "arckit.data-mesh-contract"

*User-provided fields* (extract from project metadata or user input):

- `[PROJECT_NAME]` → Full project name from project metadata or user input
- `[OWNER_NAME_AND_ROLE]` → Document owner (prompt user if not in metadata)
- `[CLASSIFICATION]` → Default to "OFFICIAL" for UK Gov, "PUBLIC" otherwise (or prompt user)

*Calculated fields*:

- `[YYYY-MM-DD]` for Review Date → Current date + 30 days

*Pending fields* (leave as [PENDING] until manually updated):

- `[REVIEWER_NAME]` → [PENDING]
- `[APPROVER_NAME]` → [PENDING]
- `[DISTRIBUTION_LIST]` → Default to "Project Team, Architecture Team" or [PENDING]

**Populate Revision History**:

```markdown
| 1.0 | {DATE} | ArcKit AI | Initial creation from `$arckit-data-mesh-contract` command | [PENDING] | [PENDING] |
```

**Populate Generation Metadata Footer**:

The footer should be populated with:

```markdown
**Generated by**: ArcKit `$arckit-data-mesh-contract` command
**Generated on**: {DATE} {TIME} GMT
**ArcKit Version**: {ARCKIT_VERSION}
**Project**: {PROJECT_NAME} (Project {PROJECT_ID})
**AI Model**: [Use actual model name, e.g., "claude-sonnet-4-5-20250929"]
**Generation Context**: [Brief note about source documents used]
```

---

Before writing the file, read `.arckit/references/quality-checklist.md` and verify all **Common Checks** plus the **DMC** per-type checks pass. Fix any failures before proceeding.

### Step 8: Write the Contract File

**IMPORTANT**: Use the **Write tool** to create the file. Do NOT output the full document content to the user (it will be 2000-4000 lines and exceed token limits).

```text
Write tool:
  file_path: {project_path}/data-contracts/ARC-{PROJECT_ID}-DMC-{NNN}-v1.0.md
  content: {full contract content}
```

Note: Use the constructed document ID format for the filename.

### Step 9: Show Summary to User

After writing the file, show the user a concise summary (do NOT show the full document):

```text
✅ Data Mesh Contract Generated

**Contract**: ARC-{PROJECT_ID}-DMC-{NNN}-v1.0.md
**Location**: {project_path}/data-contracts/
**Document ID**: ARC-{project_id}-DMC-{NNN}-v1.0
**ODCS Version**: v3.0.2
**Contract Version**: 1.0.0
**Status**: DRAFT

---

## Contract Summary

**Data Product**: {data_product_name}
**Domain**: {domain_name}
**Purpose**: {brief_purpose}

### Schema
- **Objects**: {N} objects defined
- **Properties**: {M} total properties
- **PII Fields**: {P} fields contain PII

### SLA Commitments
- **Availability**: {99.9%} uptime
- **Response Time (p95)**: {<200ms}
- **Freshness**: {<5 minutes}
- **Data Retention**: {7 years}

### Quality Rules
- {N} automated quality rules defined
- Quality dimensions: Accuracy, Validity, Completeness, Consistency, Timeliness, Uniqueness

### Access Methods
- REST API: {endpoint}
- SQL Query: {database.schema.table}
- Data Lake: {s3://bucket/prefix}

### Security
- Classification: {OFFICIAL-SENSITIVE} (contains PII)
- Encryption: AES-256 at rest, TLS 1.3 in transit
- Access Control: RBAC (Consumer-Read, Analyst-FullRead, Admin)
- GDPR Compliant: ✅

### Governance
- Change Process: Minor (7 days notice), Major (90 days notice)
- Review Cycle: Quarterly
- Deprecation Policy: 90-day notice + migration support

---

## Next Steps

1. **Review Contract**: Open the file and customize placeholders ({...})
2. **Domain Team Review**: Product Owner should review all sections
3. **DPO Review** (if PII): Ensure GDPR compliance is accurate
4. **Security Review**: Validate encryption and access controls
5. **Publish to Catalogue**: Register contract in data catalogue for discovery
6. **Consumer Onboarding**: Set up sandbox environment for consumers to test

## Related Commands

- `$arckit-traceability` - Link this contract to requirements and consumers
- `$arckit-analyze` - Score contract completeness and governance quality
- `$arckit-dpia` - Generate Data Protection Impact Assessment (if PII present)

---

**Full contract**: `{project_path}/data-contracts/ARC-{PROJECT_ID}-DMC-{NNN}-v1.0.md` ({line_count} lines)
```

### Step 10: Post-Generation Recommendations

Based on what artifacts exist, recommend next steps:

**If no ARC-*-REQ-*.md**:

```text
💡 Tip: Run $arckit-requirements to capture DR-xxx data requirements.
   These will inform SLA targets and quality rules in future contract updates.
```

**If no ARC-*-STKE-*.md**:

```text
💡 Tip: Run $arckit-stakeholders to identify domain owners and consumers.
   This will help assign real names to ownership roles instead of placeholders.
```

**If PII exists but no ARC-*-DPIA-*.md**:

```text
⚠️  This contract contains PII ({N} fields marked as PII).

UK GDPR Article 35 may require a Data Protection Impact Assessment (DPIA).

Consider running: $arckit-dpia Generate DPIA for {project_name}
```

**If this is a UK Government project**:

```text
💡 UK Government Alignment:
   - Technology Code of Practice: Point 8 (Share, reuse and collaborate) ✅
   - National Data Strategy: Pillar 1 (Unlocking value) ✅
   - Data Quality Framework: 5 dimensions covered ✅

Consider running:
   - $arckit-tcop - Technology Code of Practice assessment
   - $arckit-service-assessment - GDS Service Standard (if digital service)
```

## Important Notes

1. **Token Limit Handling**: ALWAYS use the Write tool for the full document. NEVER output the complete contract to the user (it's 2000-4000 lines). Only show the summary.

2. **ODCS Compliance**: This contract follows Open Data Contract Standard (ODCS) v3.0.2. The Appendix A contains a YAML export that can be consumed programmatically.

3. **UK Government Context**: If this is a UK Government project, ensure:
   - Data stored in UK (London region)
   - UK GDPR compliance
   - Technology Code of Practice alignment
   - National Data Strategy alignment
   - Data Quality Framework coverage

4. **Traceability**: The contract links to:
   - Source data model (entities → objects)
   - Requirements (DR-xxx → quality rules, NFR-xxx → SLAs)
   - Stakeholders (→ ownership roles)
   - Architecture principles (→ governance standards)

5. **Versioning**: Use semantic versioning (MAJOR.MINOR.PATCH):
   - MAJOR: Breaking changes (remove field, change type)
   - MINOR: Backward-compatible additions (new field, new object)
   - PATCH: Bug fixes (data quality fixes, documentation)

6. **Federated Ownership**: The domain team owns this contract end-to-end. They are responsible for:
   - SLA adherence
   - Quality monitoring
   - Consumer support
   - Schema evolution
   - Change management

7. **One Contract Per Product**: Don't bundle unrelated datasets. Each domain data product should have its own contract.

8. **Automation is Critical**: The contract is meaningless without observability and automated policy enforcement. Ensure:
   - Quality rules are executable by data quality engines
   - SLA metrics are monitored in real-time
   - Access controls are enforced via API gateway
   - Audit logs are captured automatically

- **Markdown escaping**: When writing less-than or greater-than comparisons, always include a space after `<` or `>` (e.g., `< 3 seconds`, `> 99.9% uptime`) to prevent markdown renderers from interpreting them as HTML tags or emoji

## Example User Interactions

**Example 1: Simple contract creation**

```text
User: $arckit-data-mesh-contract Create contract for customer payments
Assistant:
  - Checks prerequisites ✅
  - Creates project 001-customer-payments
  - Finds ARC-*-DATA-*.md with CUSTOMER and TRANSACTION entities
  - Generates contract mapping entities → objects
  - Shows summary (not full document)
```

**Example 2: Contract without data model**

```text
User: $arckit-data-mesh-contract seller-analytics contract
Assistant:
  - Checks prerequisites ✅
  - Warns: No data model found
  - User confirms to proceed
  - Generates contract with generic schema placeholders
  - Recommends running $arckit-data-model first
```

**Example 3: Contract with full context**

```text
User: $arckit-data-mesh-contract fraud-detection-features
Assistant:
  - Checks prerequisites ✅
  - Finds ARC-*-DATA-*.md, ARC-*-REQ-*.md, ARC-*-STKE-*.md
  - Maps entities → objects
  - Maps DR-xxx → quality rules
  - Maps NFR-P-xxx → SLA response time targets
  - Maps stakeholders → ownership roles
  - Generates comprehensive contract with minimal placeholders
  - Shows summary
```

## Error Handling

**If architecture principles don't exist**:

- Stop and tell user to run `$arckit-principles` first
- Do NOT proceed without principles

**If user provides unclear data product name**:

- Ask for clarification: "What is the name of the data product?"
- Suggest examples: customer-payments, seller-analytics, order-events

**If project creation fails**:

- Show error message from create-project.sh
- Ask user to check permissions or directory structure

**If template file is missing**:

- Error: "Template not found: .arckit/templates/data-mesh-contract-template.md"
- Ask user to check ArcKit installation

**If file write fails**:

- Show error message
- Check if directory exists
- Check permissions
More from tractorjuice/arc-kit