Data Product Lifecycle

Architecture Strategy

Data Product Design and Lifecycle

Beyond Datasets to Fully Managed Data Products.

Executive Summary

Organizations that "publish datasets" often operate under implicit contracts: a producer renames columns or changes semantics, consumers discover the breakage downstream, and no one clearly owns the fix. Data Mesh reframes this by treating analytical data as a product and data consumers as customers, requiring product-level capabilities such as discoverability, trustworthiness, and security even as data ownership decentralizes across domains.

A fully managed data product is best understood as an independently operable unit (an architectural "quantum") that bundles the data itself, its transformation code, infrastructure declarations, metadata, governance policies, and operational guarantees.

A practical, interoperable way to "package" these expectations is via a data product contract: Bitol's Open Data Contract Standard (ODCS) explicitly models fundamentals, schema, data quality, support/communication, pricing, roles/team, infrastructure/servers, and SLAs as first-class contract sections.

Six Capabilities Separating Datasets from Data Products

Contracts & Schemas

Explicit, machine-validatable definitions of interface, semantics, and quality rules.

Versioning & CI/CD

A release discipline that encodes compatibility promises and automated validation gates.

SLAs & SLOs

Reliability engineering applied to data: SLIs, target SLOs, consumer-facing SLAs.

Discoverability

Standardized metadata and lineage so products are findable and impact analysis is possible.

Deprecation & Lifecycle

Explicit states, timelines, communications, and compliance-grade retention controls.

Product Boundaries

Boundaries aligned to business domains (DDD) with clear accountability and cost visibility.

1. Data Product Contracts and Schemas

Recommended Practices

A robust data product contract should specify more than "columns and types." ODCS provides a useful canonical checklist. Structure contracts as layered commitments:

  • Interface contract: Names, types, nullability, keys, partitions, and serialization format.
  • Semantic contract: Grain, meaning, units, and canonical definitions, aligned with the domain's ubiquitous language (Domain-Driven Design, DDD).
  • Quality contract: Rules and thresholds (uniqueness, completeness). Tools like Deequ formalize "unit tests for data".
  • Operational contract: Freshness, availability, delivery schedule, and support channels.
  • Security & governance contract: Classification, access method, authorization (authZ) model, and terms of use.
  • Economics contract: Pricing/showback inputs and cost allocation rules.
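The quality layer is usually the first one to become executable. Below is a minimal sketch in plain Python, assuming rows arrive as dictionaries and that each rule carries a hypothetical `min_ratio` threshold. It mirrors the "unit tests for data" idea behind tools like Deequ but does not use Deequ's API.

```python
from typing import Any

Row = dict[str, Any]


def completeness(rows: list[Row], column: str) -> float:
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0  # vacuously complete
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows: list[Row], column: str) -> float:
    """Distinct non-null values divided by non-null values (1.0 = no duplicates)."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


METRICS = {"completeness": completeness, "uniqueness": uniqueness}


def evaluate(rows: list[Row], rules: list[dict[str, Any]]) -> list[tuple[str, str, float, bool]]:
    """Evaluate each rule; returns (rule type, column, observed ratio, passed)."""
    results = []
    for rule in rules:
        observed = METRICS[rule["type"]](rows, rule["column"])
        results.append((rule["type"], rule["column"], observed, observed >= rule["min_ratio"]))
    return results


rows = [
    {"order_id": "o1", "total_amount": 10},
    {"order_id": "o2", "total_amount": None},
    {"order_id": "o2", "total_amount": 5},
]
rules = [
    {"type": "uniqueness", "column": "order_id", "min_ratio": 1.0},
    {"type": "completeness", "column": "total_amount", "min_ratio": 0.99},
]
# Both rules fail here: "o2" is duplicated and one amount is null.
for rule_type, column, observed, passed in evaluate(rows, rules):
    print(f"{rule_type}({column}) = {observed:.3f} -> {'PASS' if passed else 'FAIL'}")
```

Returning the observed ratio, not just pass/fail, lets the same numbers feed the SLI rollups that the reliability contract reports on.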

Contract Types and Trade-offs

| Contract approach | What it optimizes | Strengths | Trade-offs / risks |
|---|---|---|---|
| Documentation-only "data dictionary" | Understanding | Low effort; easy to start | Breakage still discovered late; drift between docs and reality |
| Machine-validatable schema contract | Structural correctness | Prevents schema drift; automatable | Doesn't guarantee semantics; may overfit to types |
| Full data product contract (ODCS-style) | Product reliability & governance | Aligns schema, quality, SLA, ownership, support, pricing | Higher up-front investment; requires operating-model maturity |

Concrete Example (ODCS-style snippet)

orders_contract.yaml

```yaml
apiVersion: v3.1.0
kind: DataContract
name: orders
version: 1.0.0  # contract (interface) version, SemVer
status: active
domain: commerce.orders
dataProduct: Orders
schema:
  - name: orders
    type: table
    description: "One row per customer order (grain = order_id)."
    columns:
      - name: order_id
        dataType: string
        required: true
      - name: total_amount
        dataType: decimal(12,2)
        required: true
dataQuality:
  rules:
    - type: uniqueness
      column: order_id
sla:
  freshness:
    maxLagMinutes: 15
support:
  channel: "#orders-data-product"
```
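To make the interface section executable, a validator can walk each column declaration and check incoming records against it. This sketch hand-copies the `schema` columns from the orders contract into a Python list (loading the YAML file, for example with PyYAML's `yaml.safe_load`, is left out so the example needs only the standard library) and supports only the two `dataType` forms the snippet uses: `string` and `decimal(p,s)`.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Any

# Hand-copied from the `schema` section of orders_contract.yaml.
ORDERS_COLUMNS: list[dict[str, Any]] = [
    {"name": "order_id", "dataType": "string", "required": True},
    {"name": "total_amount", "dataType": "decimal(12,2)", "required": True},
]

DECIMAL_TYPE = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def fits_decimal(value: Any, precision: int, scale: int) -> bool:
    """True if `value` is a finite number with <= scale fractional digits
    and <= precision - scale integer digits."""
    try:
        number = Decimal(str(value)).normalize()  # drop trailing zeros: 12.30 -> 12.3
    except InvalidOperation:
        return False
    if not number.is_finite():
        return False
    _, digits, exponent = number.as_tuple()
    fractional = max(0, -exponent)
    integral = max(0, len(digits) + exponent)
    return fractional <= scale and integral <= precision - scale


def check_type(value: Any, data_type: str) -> bool:
    if data_type == "string":
        return isinstance(value, str)
    match = DECIMAL_TYPE.fullmatch(data_type)
    if match:
        return fits_decimal(value, int(match.group(1)), int(match.group(2)))
    raise ValueError(f"unsupported dataType: {data_type}")


def validate_row(row: dict[str, Any], columns: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations of the interface contract for one row."""
    errors = []
    for column in columns:
        value = row.get(column["name"])
        if value is None:
            if column.get("required", False):
                errors.append(f"{column['name']}: required but missing")
            continue
        if not check_type(value, column["dataType"]):
            errors.append(f"{column['name']}: {value!r} is not {column['dataType']}")
    return errors


print(validate_row({"order_id": 42, "total_amount": "7.5"}, ORDERS_COLUMNS))
# -> ['order_id: 42 is not string']
```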

Case Study: PayPal's open-source data contract template evolved into ODCS (Open Data Contract Standard), showing how solving cross-team contract friction leads to community standardization and better validation tooling.

2. Versioning and CI/CD for Data Products

Data products need versioning at three conceptual layers: the contract/interface (SemVer), the implementation (Git commit hash), and the data content (CalVer or time-based snapshots).
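For the contract layer, the SemVer bump can be derived mechanically from a schema diff. The policy below is a hypothetical, deliberately conservative one: removing or altering an existing column is major, adding a nullable column is minor (the additive-evolution pattern in this section), and adding a required column is major because historical partitions cannot satisfy it. Type widening is not detected, so it is conservatively classified as major.

```python
from typing import NamedTuple


class Column(NamedTuple):
    data_type: str
    required: bool


Schema = dict[str, Column]


def required_bump(old: Schema, new: Schema) -> str:
    """Classify a schema change under a conservative, hypothetical policy."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    altered = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    if removed or altered:
        return "major"  # consumers may depend on the old column or its type
    if any(new[name].required for name in added):
        return "major"  # historical data cannot satisfy a new required column
    if added:
        return "minor"  # additive, nullable evolution
    return "patch"  # e.g. description-only or implementation changes


def bump_version(version: str, bump: str) -> str:
    """Apply a SemVer bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {"order_id": Column("string", True), "total_amount": Column("decimal(12,2)", True)}
v2 = {**v1, "coupon_code": Column("string", False)}
print(bump_version("1.0.0", required_bump(v1, v2)))  # -> 1.1.0
```

The Git hash and data-content versions stay independent of this: a patch-level contract release can still ship many implementation commits and daily data snapshots.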

Migration Patterns

  • Additive evolution: Add nullable fields; widen types.
  • Compatibility views: Preserve old interfaces (e.g., as views over the new tables) while the new ones stabilize.
  • Dual-publish: Publish v1 and v2 concurrently for a migration window.
  • Shadow pipelines: Compute v2 in parallel, compare, then flip.
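The compare step of a shadow pipeline can be as simple as a keyed diff of the v1 and v2 outputs. A sketch, assuming both versions fit in memory, share a unique key such as `order_id`, and that the flip is allowed only when the share of differing keys stays under a tolerance you choose; `shadow_diff` and `ready_to_flip` are hypothetical helper names.

```python
from typing import Any

Row = dict[str, Any]


def index_by_key(rows: list[Row], key: str) -> dict[Any, Row]:
    # Assumes the key is unique (e.g. enforced by the contract's uniqueness rule).
    return {row[key]: row for row in rows}


def shadow_diff(v1_rows: list[Row], v2_rows: list[Row], key: str, columns: list[str]) -> dict[str, Any]:
    """Keyed comparison of the live (v1) and shadow (v2) outputs."""
    v1, v2 = index_by_key(v1_rows, key), index_by_key(v2_rows, key)
    shared = v1.keys() & v2.keys()
    return {
        "v1_count": len(v1),
        "missing_in_v2": sorted(v1.keys() - v2.keys()),
        "extra_in_v2": sorted(v2.keys() - v1.keys()),
        "mismatched": sorted(k for k in shared if any(v1[k].get(c) != v2[k].get(c) for c in columns)),
    }


def ready_to_flip(diff: dict[str, Any], max_diff_ratio: float = 0.0) -> bool:
    """Allow the flip only if differing keys stay within the chosen tolerance."""
    differing = len(diff["missing_in_v2"]) + len(diff["extra_in_v2"]) + len(diff["mismatched"])
    return differing / max(diff["v1_count"], 1) <= max_diff_ratio


live = [{"order_id": "a", "total_amount": 10}, {"order_id": "b", "total_amount": 5}]
shadow = [
    {"order_id": "a", "total_amount": 10},
    {"order_id": "b", "total_amount": 6},
    {"order_id": "c", "total_amount": 1},
]
diff = shadow_diff(live, shadow, "order_id", ["total_amount"])
print(diff["mismatched"], diff["extra_in_v2"], ready_to_flip(diff))  # -> ['b'] ['c'] False
```

A zero tolerance suits intentional refactors; a v2 that deliberately changes semantics needs an explicit allow-list of expected differences instead.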

Data CI/CD Gates

  • Contract linting / validation.
  • Schema compatibility checks (registry rules).
  • Data unit tests & invariants running in staging.
  • Metadata + lineage emission verification.
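The gates can be wired as independent checks that each return a list of failures, with the release blocked if any list is non-empty. The context keys (`contract`, `previous_columns`, `contract_columns`, `staging_rows`, `lineage_events`) and the individual checks are hypothetical stand-ins for whatever your CI system, schema registry, and lineage backend actually provide.

```python
from typing import Any, Callable

Context = dict[str, Any]
Gate = Callable[[Context], list[str]]

REQUIRED_CONTRACT_FIELDS = ("apiVersion", "kind", "name", "version", "status")


def contract_lint(ctx: Context) -> list[str]:
    return [f"contract missing field: {f}" for f in REQUIRED_CONTRACT_FIELDS if f not in ctx["contract"]]


def schema_compat(ctx: Context) -> list[str]:
    # Stand-in for a registry compatibility rule: columns may be added, not removed.
    removed = set(ctx["previous_columns"]) - set(ctx["contract_columns"])
    return [f"removed column: {c}" for c in sorted(removed)]


def staging_tests(ctx: Context) -> list[str]:
    ids = [row["order_id"] for row in ctx["staging_rows"]]
    return ["duplicate order_id in staging"] if len(ids) != len(set(ids)) else []


def lineage_emitted(ctx: Context) -> list[str]:
    return [] if ctx.get("lineage_events") else ["no lineage event emitted for this run"]


GATES: list[tuple[str, Gate]] = [
    ("contract-lint", contract_lint),
    ("schema-compat", schema_compat),
    ("data-tests", staging_tests),
    ("lineage", lineage_emitted),
]


def run_gates(ctx: Context, gates: list[tuple[str, Gate]] = GATES) -> dict[str, list[str]]:
    """Run every gate and collect failures; an empty result means the release may proceed."""
    failures = {}
    for name, gate in gates:
        errors = gate(ctx)
        if errors:
            failures[name] = errors
    return failures


ctx = {
    "contract": {"apiVersion": "v3.1.0", "kind": "DataContract", "name": "orders",
                 "version": "1.1.0", "status": "active"},
    "previous_columns": ["order_id", "total_amount"],
    "contract_columns": ["order_id", "total_amount", "coupon_code"],
    "staging_rows": [{"order_id": "a"}, {"order_id": "b"}],
    "lineage_events": [{"eventType": "COMPLETE"}],
}
# In CI, a non-empty result would map to a non-zero exit code.
print(run_gates(ctx) or "all gates passed")
```

Running every gate rather than stopping at the first failure gives producers the full list of problems in one CI run.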

3. Data Product SLAs and SLOs

Adopt SRE terminology for data reliability: an SLI is a measured indicator (for example, data lag); an SLO is a target value for an SLI; an SLA is the consumer-facing agreement that attaches consequences to meeting or missing SLOs.

| Metric category | Example SLI | Target SLO |
|---|---|---|
| Freshness / Timeliness | lag(now, max(event_time)) | p95 ≤ 15 min |
| Availability | % successful reads | 99.9% over 30 days |
| Validity | % rows passing constraints | ≥ 99.99% |
| Completeness | % expected entities present | ≥ 99.5% |
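As a worked example of the freshness row, the sketch below treats each periodic lag measurement as one SLI sample and compares the nearest-rank 95th percentile with the 15-minute target. The sampling cadence and the nearest-rank percentile method are assumptions; monitoring systems may interpolate percentiles differently.

```python
import math
from datetime import datetime, timedelta


def freshness_lag(now: datetime, event_times: list[datetime]) -> timedelta:
    """SLI sample: time between evaluation and the newest event in the product."""
    return now - max(event_times)


def p95(samples: list[timedelta]) -> timedelta:
    """Nearest-rank 95th percentile; requires at least one sample."""
    ordered = sorted(samples)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def freshness_slo_met(lag_samples: list[timedelta], target: timedelta = timedelta(minutes=15)) -> bool:
    return p95(lag_samples) <= target


# 19 healthy samples and one 60-minute outlier: p95 is still 5 minutes.
samples = [timedelta(minutes=5)] * 19 + [timedelta(minutes=60)]
print(p95(samples), freshness_slo_met(samples))  # -> 0:05:00 True
```

Using a percentile rather than the maximum is what lets a single late batch pass without breaching the SLO, while a persistent slowdown still fails it.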

Alerting and Enforcement

Page on sustained SLO burn (rapid error-budget consumption) rather than on one-off anomalies, and back alerts with automated remediation runbooks (rollback, replay, or graceful degradation).
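One common way to implement "sustained burn" is the multiwindow burn-rate alert described in Google's SRE Workbook: the burn rate is the observed error ratio divided by the error budget (1 − SLO), and a page fires only when both a long window (e.g. 1 hour) and a short window (e.g. 5 minutes) exceed a threshold. The 14.4 default follows that guidance for a 30-day SLO window (14.4 × 1 h / 720 h = 2% of the budget spent in one hour); treat it as a starting point rather than a requirement.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Error-budget consumption speed; 1.0 spends exactly the budget over the SLO window."""
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_page(long_window_errors: float, short_window_errors: float,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Multiwindow rule: both the long (e.g. 1h) and short (e.g. 5m) windows must burn fast.

    The long window filters out one-off blips; the short window stops paging
    once the problem has already recovered.
    """
    return (burn_rate(long_window_errors, slo_target) >= threshold
            and burn_rate(short_window_errors, slo_target) >= threshold)


# 2% failed reads over the last hour, but the last 5 minutes are clean: no page.
print(should_page(0.02, 0.0))  # -> False
```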

4. Discoverability and Metadata Standards

Discoverability is foundational to "data as a product" at scale. Metadata needs both a standard model for tool interoperability (DCAT, PROV-DM, OpenLineage) and an operational system that stores and serves it (DataHub, OpenMetadata).
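Lineage is what turns a catalog into an impact-analysis tool. Assuming lineage has already been collected as (upstream, downstream) edges, for example from OpenLineage run events or a catalog export, finding everything a change could break is a graph traversal. The product names are hypothetical, and this is not the DataHub or OpenMetadata API.

```python
from collections import deque


def downstream_impact(edges: list[tuple[str, str]], changed: str) -> set[str]:
    """Every product transitively downstream of `changed` (breadth-first search)."""
    graph: dict[str, list[str]] = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in impacted:  # also guards against lineage cycles
                impacted.add(dependent)
                queue.append(dependent)
    impacted.discard(changed)  # report only other products
    return impacted


lineage = [
    ("commerce.orders", "finance.revenue_daily"),
    ("commerce.payments", "finance.revenue_daily"),
    ("finance.revenue_daily", "exec.kpi_dashboard"),
]
print(sorted(downstream_impact(lineage, "commerce.orders")))
# -> ['exec.kpi_dashboard', 'finance.revenue_daily']
```

The resulting set is exactly the list of owners to notify before a breaking change or deprecation.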

Minimum Viable Metadata Record:

  • Identity: name, domain, unique ID, stable address.
  • Contract: format, version, compatibility mode.
  • Semantics: grain, key fields, units, glossary mappings.
  • Ownership: owner, steward, on-call support channel.
  • Access: auth method, classification, allowed uses.
  • Reliability: SLO definitions + current SLI rollups.
  • Lineage: upstream sources, downstream dependents.
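The record can be enforced as a completeness check before a product is admitted to the catalog. The category and field names below are hypothetical renderings of the list above; a field counts as missing when it is absent, None, or an empty string, while an explicitly empty list (for example, no downstream dependents yet) is accepted.

```python
from typing import Any

# One group per bullet of the minimum viable record; field names are illustrative.
REQUIRED_FIELDS: dict[str, list[str]] = {
    "identity": ["name", "domain", "id", "address"],
    "contract": ["format", "version", "compatibility_mode"],
    "semantics": ["grain", "key_fields", "units", "glossary_terms"],
    "ownership": ["owner", "steward", "support_channel"],
    "access": ["auth_method", "classification", "allowed_uses"],
    "reliability": ["slo_definitions", "sli_rollups"],
    "lineage": ["upstream", "downstream"],
}


def missing_metadata(record: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each category to required fields that are absent, None, or empty strings."""
    gaps: dict[str, list[str]] = {}
    for category, names in REQUIRED_FIELDS.items():
        section = record.get(category, {})
        # An explicit [] is accepted: a new product may have no dependents yet.
        missing = [n for n in names if section.get(n) in (None, "")]
        if missing:
            gaps[category] = missing
    return gaps


draft = {"identity": {"name": "orders", "domain": "commerce.orders", "id": "orders-001", "address": ""}}
gaps = missing_metadata(draft)
print(gaps["identity"])  # -> ['address']
```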

5. Deprecation and Lifecycle Management

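Without explicit lifecycle policies, deprecated products linger, consuming cost and creating compliance risk. Lifecycle transitions are most effective when the platform enforces them in code (for example, CI checks that block disallowed state changes) rather than relying on convention.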

Data Product Lifecycle Map

```mermaid
stateDiagram-v2
    [*] --> Proposed
    Proposed --> Draft: contract + initial schema
    Draft --> Active: passes CI + SLOs + metadata
    Active --> Active: backward-compatible evolution
    Active --> Deprecated: breaking change planned
    Deprecated --> Retired: deprecation window elapsed
    Retired --> Archived: final snapshot stored
    Archived --> [*]
    Draft --> Proposed: re-scope / rejected
    Deprecated --> Active: deprecation reverted
```
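Enforcing the lifecycle map computationally can start with an explicit transition table plus guards for the conditions written on its edges. In this sketch the 90-day deprecation window and the `gates_passed` flag are hypothetical policy inputs; in practice they would come from the contract and the CI system.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    PROPOSED = "Proposed"
    DRAFT = "Draft"
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    RETIRED = "Retired"
    ARCHIVED = "Archived"


# Edges of the lifecycle map; anything else is rejected.
ALLOWED = {
    (State.PROPOSED, State.DRAFT),
    (State.DRAFT, State.ACTIVE),
    (State.DRAFT, State.PROPOSED),
    (State.ACTIVE, State.ACTIVE),
    (State.ACTIVE, State.DEPRECATED),
    (State.DEPRECATED, State.ACTIVE),
    (State.DEPRECATED, State.RETIRED),
    (State.RETIRED, State.ARCHIVED),
}

DEPRECATION_WINDOW = timedelta(days=90)  # hypothetical policy value


class TransitionError(Exception):
    pass


def transition(current: State, target: State, *, gates_passed: bool = False,
               deprecated_at: Optional[datetime] = None, now: Optional[datetime] = None) -> State:
    """Return the new state, or raise if the map or a guard forbids the move."""
    if (current, target) not in ALLOWED:
        raise TransitionError(f"cannot move from {current.value} to {target.value}")
    if (current, target) == (State.DRAFT, State.ACTIVE) and not gates_passed:
        raise TransitionError("activation requires passing CI, SLO, and metadata gates")
    if (current, target) == (State.DEPRECATED, State.RETIRED):
        if deprecated_at is None or now is None:
            raise TransitionError("retirement requires deprecated_at and now")
        if now - deprecated_at < DEPRECATION_WINDOW:
            raise TransitionError("deprecation window has not elapsed")
    return target


try:
    transition(State.ACTIVE, State.RETIRED)
except TransitionError as err:
    print(err)  # -> cannot move from Active to Retired
```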

6. Product Boundaries and Ownership

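Defining "what is a data product" requires balancing cohesion (related data and logic stay together) against coupling (dependencies across products). Good boundaries tend to follow Domain-Driven Design (DDD) bounded contexts and, per Conway's Law, the structure of the teams that own them.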
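One rough way to make the cohesion/coupling trade-off visible is to count, for each candidate product, how many entity relationships stay inside its boundary versus cross it. This is a hypothetical heuristic, not a standard metric, and the entities, relationships, and assignments below are illustrative.

```python
from collections import defaultdict


def boundary_report(assignment: dict[str, str], edges: list[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    """Per candidate product: (relationships inside the boundary, relationships crossing it)."""
    internal: dict[str, int] = defaultdict(int)
    crossing: dict[str, int] = defaultdict(int)
    for a, b in edges:
        product_a, product_b = assignment[a], assignment[b]
        if product_a == product_b:
            internal[product_a] += 1
        else:
            crossing[product_a] += 1
            crossing[product_b] += 1
    return {p: (internal[p], crossing[p]) for p in sorted(set(assignment.values()))}


def cohesion(internal: int, crossing: int) -> float:
    """Share of a product's relationships that stay inside it (1.0 = fully cohesive)."""
    total = internal + crossing
    return internal / total if total else 1.0


assignment = {
    "order": "orders", "order_line": "orders", "shipment": "orders",
    "payment": "payments", "refund": "payments",
}
edges = [
    ("order", "order_line"), ("order", "shipment"), ("payment", "refund"),
    ("order", "payment"), ("refund", "order_line"),
]
for product, (inside, across) in boundary_report(assignment, edges).items():
    print(product, inside, across, round(cohesion(inside, across), 2))
```

A low score is a prompt to revisit the boundary with the domain teams, not an automatic verdict; some cross-domain joins (orders to payments) are inherent to the business.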

Ownership Boundary Flowchart

```mermaid
flowchart TB
    subgraph DomainA["Domain Team: Orders"]
        A1["Owns Orders Data Product<br/>- Contract (ODCS)<br/>- Schema & Semantics<br/>- SLOs & On-call"]
    end
    subgraph DomainB["Domain Team: Payments"]
        B1["Owns Payments Data Product<br/>- Contract<br/>- Quality & SLAs<br/>- Support"]
    end
    subgraph Platform["Self-Serve Data Platform Team"]
        P1["Golden Paths / CI templates"]
        P2["Storage / Compute Provisioning"]
        P3["Lineage & Metadata Infra"]
        P4["Policy Enforcement Hooks"]
    end
    subgraph Gov["Federated Governance Council"]
        G1["Global Standards<br/>- Metadata fields<br/>- Naming conventions<br/>- Security rules"]
    end
    A1 -->|Publishes metadata| P3
    B1 -->|Publishes metadata| P3
    Platform -->|Enables autonomy| DomainA
    Platform -->|Enables autonomy| DomainB
    Gov -->|Defines standards| Platform
    Gov -->|Audits compliance| DomainA
    Gov -->|Audits compliance| DomainB
```
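The governance council's global standards are most useful when the platform's policy hooks can check them automatically. Below is a minimal policy-as-code sketch; the `<domain>.<product>` naming rule, the required fields, and the classification levels are hypothetical examples of such standards.

```python
import re
from typing import Any

# Hypothetical global standards published by the governance council.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")  # <domain>.<product>
REQUIRED_FIELDS = ("owner", "on_call")
CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


def audit(product: dict[str, Any]) -> list[str]:
    """Return policy findings for one product descriptor (empty list = compliant)."""
    findings = []
    if not NAME_PATTERN.fullmatch(product.get("name") or ""):
        findings.append("naming: expected <domain>.<product> in lower snake case")
    for field_name in REQUIRED_FIELDS:
        if not product.get(field_name):
            findings.append(f"metadata: missing {field_name}")
    if product.get("classification") not in CLASSIFICATIONS:
        findings.append("security: unknown or missing classification")
    return findings


def audit_all(products: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Findings per product name, listing only non-compliant products."""
    report = {}
    for product in products:
        findings = audit(product)
        if findings:
            report[product.get("name") or "<unnamed>"] = findings
    return report


orders = {"name": "commerce.orders", "owner": "orders-team",
          "on_call": "#orders-data-product", "classification": "internal"}
payments = {"name": "Payments", "owner": "payments-team", "classification": "secret"}
print(audit_all([orders, payments]))
```

Keeping the standards as data (pattern, field list, classification set) lets the council change policy centrally while domain teams see the same findings in their own CI.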

References & Core Standards
Data Mesh & Boundaries:
Data Mesh Principles and Logical Architecture (Zhamak Dehghani, martinfowler.com); Domain-Driven Design Reference (Eric Evans)
Data Contracts & Schemas:
Open Data Contract Standard (ODCS, Bitol); AWS Deequ
Metadata, Discovery, Lineage:
W3C DCAT v3; OpenLineage