Data Products 101

Building Data Assets for Business Impact

From Data as Resource to Data as Product

Data Product Agenda

Data as Product: combining data, logic, and delivery into a single, scalable artifact lets businesses move from passively observing data to actively using intelligence. This guide covers why data products matter, what they are, how they are built and deployed, and how to apply them in practice.

The Four Core Topics

01. Why Data As Product

Recognizing the importance of using data products to stay competitive in today's business environment.

02. What is Data As Product

Defining data products, including their components, characteristics, and what sets them apart from conventional data projects.

03. Drivetrain Approach

An organized approach to creating scalable data products by utilizing the drivetrain framework and adhering to lifecycle discipline.

04. Dataknobs Approach

The detailed Dataknobs approach transforms data into valuable business assets that enhance user outcomes.

01. Why Data As Product Matters Now

The business landscape has shifted in three ways that data products address: AI has lowered the cost of building intelligence, decisions must be made faster, and data volumes have grown sharply. To create business value, organizations need a new approach to using data.

The Business Case: Three Drivers

1. AI Has Lowered Cost

The cost of developing intelligence systems has significantly decreased due to advancements in machine learning and AI. What used to be a substantial investment is now within reach for organizations of any scale. However, the key to seizing this opportunity lies in efficiently managing data.

2. Decisions Must Be Faster

The speed of decision-making cycles has increased due to market dynamics, competitive pressures, and customer expectations. Organizations that can quickly transform data into insights in hours or minutes, rather than weeks or months, gain a competitive edge. The consequences of delayed decisions are now too costly.

3. Data Volume Has Exploded

The volume, variety, and velocity of data have surged. Older data management methods struggle at this scale, prompting organizations to adopt scalable, distributed systems designed for large-scale data handling.

Why This Matters: Three Strategic Shifts

Shift 1: From Reports to Outcomes

Stop relying on dashboards and reports for information; instead, focus on creating systems that generate tangible business results. Data products should have a direct impact on improving key business metrics.

Shift 2: From Dashboards to Workflows

Eliminate the need for humans to consult dashboards, make decisions, and take action. Integrate intelligence seamlessly into workflow systems for automatic or low-friction decision-making.

Shift 3: From Analytics Projects to Product Lifecycle

Shift away from project-based thinking that treats analytics as a one-time endeavor. Embrace a product mindset focused on ongoing enhancement, version updates, and disciplined lifecycle management.

The Core Insight

Data products integrate several signals, utilize intelligence, and produce coherent, actionable results. They are more than just data; they are the outcome of integrating, processing, and presenting data to directly meet user requirements.

02. What is Data As Product

A data product is more than a database table. It is a consumer-focused package that contains a dataset along with the metadata, semantics, and code required to discover, understand, access, and trust it.

Key Characteristics of Data Products

🎯 Consumer Oriented

Designed around specific consumer needs and the user's experience, rather than the data engineer's perspective.

📦 Self-Contained

Bundles code, tests, infrastructure-as-code, and access policies, so everything needed to use and maintain the product ships together.

🔒 Governed By Design

Quality and security are inherent, not added as an afterthought. Governance is integrated into the product through code and automation.

The DATSIS Framework: Six Essential Attributes

  • D (Discoverable): Listed in a directory with detailed information on ownership, lineage, and samples for easy consumer access.
  • A (Addressable): Accessible via a unique, stable programmatic address (URI) for automation.
  • T (Trustworthy): Quality is measured (SLIs/SLOs) and truthful, adhering to ISO standards.
  • S (Self-Describing): Comprises schemas, documentation, and semantics that can be comprehended without consulting the author.
  • I (Interoperable): Follows global standards (like ODCS/ODPS) to work across different systems.
  • S (Secure): Access control (RBAC/ABAC) and privacy policies are enforced by code.
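
As a rough illustration of the six attributes above, a data product can carry a small descriptor that is checked automatically before it is published. The sketch below is an assumption for illustration only: the `DataProductDescriptor` class, its field names, and the `dp://` address scheme are hypothetical and do not come from ODCS, ODPS, or any other standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical descriptor; field names are illustrative, not a standard."""
    name: str
    owner: str = ""                                    # Discoverable: who to contact
    uri: str = ""                                      # Addressable: stable address
    slos: dict = field(default_factory=dict)           # Trustworthy: quality targets
    schema: dict = field(default_factory=dict)         # Self-Describing: column -> type
    description: str = ""                              # Self-Describing: semantics
    contract_standard: str = ""                        # Interoperable: e.g. "ODCS"
    allowed_roles: list = field(default_factory=list)  # Secure: role-based access


def datsis_gaps(p: DataProductDescriptor) -> list:
    """Return the DATSIS attributes the descriptor does not yet satisfy."""
    checks = {
        "Discoverable": bool(p.owner),
        "Addressable": bool(p.uri),
        "Trustworthy": bool(p.slos),
        "Self-Describing": bool(p.schema) and bool(p.description),
        "Interoperable": bool(p.contract_standard),
        "Secure": bool(p.allowed_roles),
    }
    return [attr for attr, ok in checks.items() if not ok]


orders = DataProductDescriptor(
    name="orders_daily",
    owner="sales-data-team",
    uri="dp://sales/orders_daily/v1",  # made-up address scheme
    schema={"order_id": "string", "amount": "decimal"},
    description="One row per completed order, refreshed daily.",
    allowed_roles=["analyst"],
)
print(datsis_gaps(orders))  # ['Trustworthy', 'Interoperable']
```

A check like this only confirms that the metadata is present; whether the SLOs are actually met or the access policy is actually enforced still has to be verified by the platform.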

Data Products vs Software Products

Dimension      | Software Product                                         | Data Product
---------------|----------------------------------------------------------|--------------------------------------------------
Delivers       | Features                                                 | Insight
Focus          | Software lifecycle                                       | Data lifecycle
Perspective    | How software capabilities are used by multiple customers | How data is used in multiple use cases
Innovation     | Team delivers new capabilities by writing code           | Team delivers new capabilities by enriching data
Success Metric | Feature adoption and engagement                          | Data usage and business outcome improvement

Data As Product Mindset

Transitioning to data products necessitates a shift in mindset from project-based thinking to product-based thinking. This means organizations must go from focusing on delivering a single initiative to serving multiple consumers over time with ongoing enhancements.

Project Mindset vs Product Mindset

❌ Project Mindset

  • Goal: Deliver specific data signal/scope one time
  • Access: Siloed, ticket-based
  • Success: On-time/on-budget, requirements "done"
  • Metrics: Throughput, milestones, tickets closed
  • Change: Seen as scope creep
  • Risk: Brittle pipelines, unclear ownership, untrusted data

✓ Product Mindset

  • Goal: Serve multiple consumers over time
  • Access: Self-serve via API catalog
  • Success: Adoption, satisfaction, outcomes
  • Metrics: Usage, retention, data quality SLOs, time-to-insight
  • Change: Expected; managed via versioning and contracts
  • Benefit: Clear ownership, reliability, trust, compounding improvements

✓ Making the Mindset Shift
  • Stop thinking about single-use reports and dashboards
  • Focus on reusability across multiple use cases
  • Invest in infrastructure for self-service access
  • Define clear ownership and accountability
  • Establish SLOs for data quality and availability
  • Build feedback loops to understand user needs
  • Plan for versioning and backwards compatibility

Data Product Experimentation Framework

Data products must undergo validation on two fronts: ensuring the algorithm functions correctly and determining if users find value in it. Successful data product teams are able to separate these validation processes while running them simultaneously.

Two Parallel Learning Loops

🔧 Intelligence Engine

Focus: Technical Validity

Question: Does the model produce accurate, dependable, and scalable data signals?

Responsibility: Verify that the system meets its technical requirements for accuracy, dependability, and scale.

  • Model accuracy and performance
  • Data quality and reliability
  • System scalability and latency
  • Infrastructure stability

💼 Value Engine

Focus: Market Validity

Question: Does anyone care? Will the output meaningfully improve users' workflows?

Responsibility: Evaluate if the outputs significantly enhance users' lives or workflows and address real business challenges.

  • User adoption and engagement
  • Business impact and ROI
  • Customer satisfaction
  • Problem-solution fit

Key Insight: Parallel Progress

Run the two validation loops in parallel instead of waiting for perfect technical accuracy before testing market validity. Create an MVP that is technically sound enough to gather user feedback early, to avoid building solutions that are perfectly accurate but unwanted.

03. The Drivetrain Approach

The Drivetrain Approach offers a systematic method for creating scalable data products by integrating objectives, levers, data, and models into a cohesive framework aimed at tangible business results.

Four Components of the Drivetrain

1. Define Objective

What specific business objectives, user needs, and success criteria is this product meant to achieve? These drive all other decisions within the framework.

2. Identify Levers (Knobs)

Which inputs are within your control? What variables can the system adjust to influence outcomes? These levers are the key drivers of results.

3. Gather Data

What data can you gather? Identify the sources needed to understand the relationships between levers and results.

4. Build Models

How do the levers affect the results? Build models that capture and predict the relationship between actions and outcomes.
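
The four components above can be sketched end to end on a toy problem. Everything in this example is an assumption made for illustration: the scenario (choosing a discount level), the prices, and the historical numbers are made up, and the straight-line fit stands in for whatever model a real product would use.

```python
# Hypothetical example: choose a discount level (the lever) to maximize
# expected profit per visitor (the objective).

# 3. Gather data: past (discount, conversion_rate) observations (made-up numbers).
history = [(0.00, 0.020), (0.10, 0.030), (0.20, 0.040), (0.30, 0.050)]

# 4. Build model: least-squares line, conversion = a + b * discount.
n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in history) / sum(
    (x - mean_x) ** 2 for x, _ in history
)
a = mean_y - b * mean_x


def predicted_conversion(discount: float) -> float:
    """Model output: predicted conversion rate at a given discount."""
    return a + b * discount


# 1. Define objective: expected profit per visitor.
PRICE, UNIT_COST = 100.0, 60.0


def expected_profit(discount: float) -> float:
    margin = PRICE * (1 - discount) - UNIT_COST
    return predicted_conversion(discount) * margin


# 2. Identify levers: the discount levels the business can actually set.
levers = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

# Pick the lever setting with the highest predicted objective value.
best = max(levers, key=expected_profit)
print(f"best discount: {best:.2f}")  # best discount: 0.10
```

The point of the sketch is the ordering of concerns: the model exists to connect levers to the objective, so its output is a recommended action rather than a prediction shown on a dashboard.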

Operating Model & Governance

The operating model determines how data products are owned, governed, and operated. Organizations can choose among three approaches, ranging from centralized to fully distributed.

A. Centralized

A centralized team creates and manages curated datasets, with consumers submitting change requests through tickets. Ideal for smaller organizations or early-stage projects, offering straightforward governance but with restricted scalability.

B. Hub & Spoke

Key products are owned by domain teams, while a small central team establishes standards (known as the 'metamodel') and offers platform infrastructure, striking a balance between autonomy and consistency. Recommended for most organizations.

C. Full Data Mesh

Domain ownership is scaled across the organization, with each product treated as an architectural quantum. Governance is federated and enforced computationally, through code and automation. This requires a high level of organizational maturity.

Data Product Lifecycle

Data products move through a structured process from conception to operation, with distinct milestones, stakeholders, and success criteria at each phase. This discipline helps ensure that the product delivers tangible benefits to the business.

Five Phases of the Data Product Lifecycle

1. Discovery

Identify user pain points before building; do not start from the data alone. Research which problems your data can address and confirm that users want them solved.

2. Design

Define APIs, schemas, and SLAs before writing code. Agree on contracts between producers and consumers. Document usage guidelines for the product.

3. Build

Build pipelines, CI/CD processes, and unit tests. Implement the data product to high engineering standards, and prioritize robust infrastructure for maintaining quality.

4. Launch

Market the product, train users, and publish documentation so that users can discover and adopt it, offering support and education as needed.

5. Iterate

Track usage data and adjust based on feedback. Enhance the product through ongoing analysis of usage patterns and user requirements.

✓ Lifecycle Best Practices
  • Spend adequate time in the discovery phase; don't rush to build
  • Use contracts and schemas to prevent surprises
  • Treat build like production-grade software engineering
  • Invest in launch: a product nobody knows about has no impact
  • Establish feedback loops during iteration phase
  • Plan for versioning and backwards compatibility
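
The practices on contracts, schemas, and versioning can be made concrete with a small compatibility check run before a new schema is released. This is a minimal sketch under stated assumptions: the rule that removing a column or changing its type is breaking while adding a column is not is a common convention rather than something defined in this guide, and `breaking_changes` and `next_version` are hypothetical helper names.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes that would break consumers of the old schema.

    Assumed rule: removing a column or changing its type is breaking;
    adding a column is not.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


def next_version(current: str, old_schema: dict, new_schema: dict) -> str:
    """Bump major on breaking changes, minor otherwise (major.minor scheme)."""
    major, minor = (int(part) for part in current.split("."))
    if breaking_changes(old_schema, new_schema):
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


v1 = {"order_id": "string", "amount": "float"}
v_additive = {"order_id": "string", "amount": "float", "currency": "string"}
v_breaking = {"order_id": "int", "currency": "string"}

print(next_version("1.0", v1, v_additive))  # 1.1
print(next_version("1.0", v1, v_breaking))  # 2.0
print(breaking_changes(v1, v_breaking))
```

Running a check like this in CI turns the contract into something enforced rather than documented, so consumers learn about a breaking change from a version number instead of a failed pipeline.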

04. The Dataknobs Approach

The Dataknobs Approach outlines six key principles to create data products that serve as long-lasting business assets, guiding decision-making at every stage of the product lifecycle.

Six Principles for Building Data Products

1. Start with Business Value, Not Data

Start by identifying a business problem or user need, instead of starting with 'we have data.' Many organizations make the mistake of creating data products based on the data they have, rather than focusing on the needs of their users. Shift your perspective.

2. Focus on User and Task

Gain a thorough understanding of who performs each task and why. Develop the product around the user's environment, processes, and constraints. User needs come first in design.

3. Design as Product, Not Pipeline

Consider the product experience, not only the data engineering. How will users find it? Access it? Understand it? Trust it? Design each element with the user in mind.

4. Engineer for Trust

Build interoperability, reusability, and quality into the product by adhering to standards, documenting processes, and monitoring the product in operation. Consistency and reliability earn trust.

5. Operate with Lifecycle Discipline

Follow the lifecycle process without skipping phases. Incorporate versioning, SLOs, and change management. Treat the product as enduring, not temporary.

6. Enable the Right Operating Model

Select an operating model that fits your organization's size and maturity. Provide platform support for domain teams. Strong ownership and governance are key to achieving goals.

User Data and Task Framework

Understanding how user, data, and task intersect is crucial in defining the data product and ensuring it effectively addresses real problems for people.

User-Centric Data Products Framework

Data products form a seamless value chain connecting user needs to business outcomes. Users engage with these products to extract insights, drive decisions, execute actions, and achieve results.

Dataknobs Core Principle

Know More, Risk Less, Do Better

This is the core of what data products do for users:

  • Reveal Reality: Provide intelligent signals that show what's actually happening
  • Reduce Uncertainty: Offer probabilities and predictions to reduce decision risk
  • Enable Comparison: Provide context and benchmarks to compare options
  • Predict Outcomes: Model future scenarios to anticipate consequences
  • Recommend Action: Suggest specific actions based on analysis

Building Gold Datasets & Data Assets

Successful data products rely on careful data engineering. Data must be gathered, refined, enriched, and governed to produce assets that machine learning models can use and that users can trust.

Data Pipeline Best Practices

1. Create Data

Gather data from corporate sources, utilize web scraping, or partner with third-party providers. Validate data sources for reliability and adhere to quality standards.

2. Augment & Transform

Use generative AI and other methods to augment and enrich the data. Apply transformations that prepare the data for modeling and analysis.

3. Apply Privacy Controls

Anonymize sensitive data using privacy-preserving techniques to meet regulatory requirements and preserve data utility.

4. Compress Data

Efficiently compress data while maintaining the necessary signals for learning models. Find a balance between data size and retaining important information.
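
To make step 3 (Apply Privacy Controls) more concrete, the sketch below shows two common techniques: keyed hashing of a direct identifier and generalization of an exact value into a band. It is an illustration under assumptions: the record fields and helper names are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, so it may not satisfy every regulatory requirement on its own.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager,
# not from source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256): stable for joins, not reversible without the key."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a coarse band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def apply_privacy_controls(record: dict) -> dict:
    """Drop the raw identifier and exact age; keep the fields needed for analysis."""
    return {
        "customer_key": pseudonymize(record["email"]),
        "age_band": age_band(record["age"]),
        "purchase_total": record["purchase_total"],  # utility field kept as-is
    }


raw = {"email": "ana@example.com", "age": 37, "purchase_total": 129.5}
safe = apply_privacy_controls(raw)
print(safe["age_band"])           # 30-39
print(len(safe["customer_key"]))  # 64
```

The same input always maps to the same `customer_key`, which preserves the ability to join and count by customer while keeping the email address out of downstream datasets.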

Data Concepts Hierarchy

Building from Raw Data to Features to Concepts

Building high-quality data products involves transforming raw data into progressively higher-level concepts. This layered structure supports reuse and simplification.

  • Raw Data (Petabytes): Unprocessed data from sources
  • Features (Terabytes): Engineered attributes computed from raw data
  • Feature Sets (Gigabytes): Curated collections of related features
  • Concepts (Megabytes): High-level, business-meaningful abstractions

Every level allows for sharing among various models, all while upholding semantic relevance and business context.
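
The hierarchy above can be illustrated on a tiny in-memory example, where each level is built only from the level below it. The event data, the feature names, and the thresholds that define the "lapsing customer" concept are all assumptions made up for this sketch.

```python
from statistics import mean

# Raw data: one row per order event (made-up values).
raw_events = [
    {"customer": "c1", "day": 1, "amount": 40.0},
    {"customer": "c1", "day": 20, "amount": 60.0},
    {"customer": "c2", "day": 2, "amount": 15.0},
]
TODAY = 30  # current day number, fixed for the example


def features(customer: str) -> dict:
    """Features: engineered attributes computed from raw events."""
    rows = [e for e in raw_events if e["customer"] == customer]
    return {
        "order_count": len(rows),
        "avg_order_value": mean(e["amount"] for e in rows),
        "days_since_last_order": TODAY - max(e["day"] for e in rows),
    }


def engagement_feature_set(customer: str) -> dict:
    """Feature set: a curated collection of related features."""
    f = features(customer)
    return {k: f[k] for k in ("order_count", "days_since_last_order")}


def is_lapsing(customer: str) -> bool:
    """Concept: a business-meaningful label built on a feature set.

    The thresholds are illustrative assumptions.
    """
    fs = engagement_feature_set(customer)
    return fs["order_count"] <= 1 and fs["days_since_last_order"] > 21


print(features("c1"))
print(is_lapsing("c1"), is_lapsing("c2"))  # False True
```

Because the concept depends only on the feature set, several models or reports can share `is_lapsing` without each one re-deriving it from raw events.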

✓ Data Quality Best Practices
  • Define clear data quality metrics and SLOs
  • Implement automated data validation and testing
  • Monitor data quality continuously in production
  • Document data lineage and transformations
  • Version datasets for reproducibility
  • Build privacy and security into pipelines
  • Invest in data governance infrastructure
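
The first three practices in the list above (metrics and SLOs, automated validation, continuous monitoring) can be sketched as a check that measures service level indicators and compares them with targets. The SLO values, field names, and function names below are hypothetical; real targets would be agreed with the product's consumers.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLO targets for illustration.
SLOS = {"completeness_min": 0.99, "freshness_max": timedelta(hours=24)}


def completeness(rows: list, required: list) -> float:
    """SLI: share of rows where every required field is present and non-null."""
    if not rows:
        return 0.0
    ok = sum(all(r.get(col) is not None for col in required) for r in rows)
    return ok / len(rows)


def check_slos(rows: list, required: list, last_loaded: datetime, now: datetime) -> dict:
    """Compare measured SLIs against the SLO targets; True means the SLO is met."""
    return {
        "completeness": completeness(rows, required) >= SLOS["completeness_min"],
        "freshness": (now - last_loaded) <= SLOS["freshness_max"],
    }


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "a1", "amount": 10.0},
    {"order_id": "a2", "amount": None},  # violates completeness
]
result = check_slos(rows, ["order_id", "amount"], now - timedelta(hours=3), now)
print(result)  # {'completeness': False, 'freshness': True}
```

Run on every load, a check like this gives consumers a measured answer to whether the data can be trusted today, rather than a one-time assurance.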

The Future of Data: Products Not Projects

Data products represent a fundamental shift. Organizations that move from isolated analytics projects to lasting data products gain compounding value through improved quality, reuse, and ongoing enhancement.

Success requires more than technology. It takes a shift in mindset: treating data as a business asset that deserves the same attention as any other product. It also takes an organizational commitment to user-centric design, quality standards, and lifecycle management, along with investment in platforms that let domain teams develop products autonomously.

Organizations that master data products will outcompete those that don't. They make quicker decisions, cut costs, improve customer satisfaction, and build competitive advantage through better insight, which is why the move from data as a resource to data as a product matters in today's data-driven market.