Data Products

This guide provides definitions and practical advice to help you understand and develop data products to unlock value from your data.

[Figure: Data flows from Data Producers to Data Consumers through a Data Product Catalog, which is built on three principles: domain-centric, modular and reusable, and product management.]

What are data products?

Data products are highly trusted, reusable, and consumable data assets. More specifically, they are curated collections of productized datasets, business-approved metadata, and domain logic designed to deliver domain-specific business outcomes. Data products also bring a product management approach to managing data. They are the backbone of data apps such as recommendation engines, predictive models, dashboards, and APIs, and are used across industries to extract value from data.

Ultimately, they help you bridge the growing gap between data producers and data consumers to shorten the time-to-value for your use cases.

Characteristics

At a high level, a data product is a domain-specific, consumable entity that turns data into actionable insights for stakeholders and AI systems. The goal is to make data accessible, consumable, insightful, and actionable for the growing number of people and generative AI systems that rely on data to make decisions. Each data product is created for a specific purpose and has a defined, agreed-upon shape (schema), consumption interfaces, and maintenance and refresh cycles.

[Figure: Components and flows of a data product, connecting data sources to metadata, privacy policies, data science, AI/ML, analysis, usage metrics, and legacy modernization.]

Essentially, data products are self-contained packages that either address business problems inside an organization or are monetized externally. They use data analysis, processing, and visualization techniques to generate meaningful insights or to support AI and machine learning, and they present the results in a way that is easy to use.

Here are the six characteristics of a well-designed data product:

  1. Prepared: Cleaned, transformed, high-quality data ready for analysis.

  2. Findable and Understandable: Metadata-driven, domain-centric assets built for effective use.

  3. Interoperable: Composed of one or more datasets that work with each other to deliver holistic, unbiased data insights.

  4. Shareable: Several datasets and data elements packaged into a single, trusted, cohesive unit that is easy to distribute.

  5. Accessible: Available to data consumers when needed, through standardized interfaces.

  6. Reusable: Built of composable elements that can be used to build several data products, as well as derivative data products.
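To make these characteristics concrete, here is a minimal Python sketch of a data product descriptor. Everything in it is hypothetical: the `Dataset` and `DataProduct` classes, their fields, and the `readiness_gaps` and `derive` helpers are illustrative assumptions, not part of any product or standard. The readiness check maps only three of the characteristics (Prepared, Findable, Accessible) to simple rules, and `derive` illustrates reuse by building a derivative product that records its parent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    source: str            # originating system, e.g. "erp" (illustrative)
    quality_checked: bool  # True once cleaning and validation have passed


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str
    description: str
    datasets: List[Dataset]
    interfaces: List[str]                             # e.g. ["sql", "api"]
    tags: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)  # products this one reuses

    def readiness_gaps(self) -> List[str]:
        """Return the characteristics this descriptor does not yet meet."""
        gaps = []
        # Prepared: at least one dataset, and every dataset quality-checked.
        if not self.datasets or not all(d.quality_checked for d in self.datasets):
            gaps.append("prepared")
        # Findable and understandable: needs a description and tags.
        if not self.description or not self.tags:
            gaps.append("findable")
        # Accessible: exposes at least one consumption interface.
        if not self.interfaces:
            gaps.append("accessible")
        return gaps


def derive(base: DataProduct, name: str, extra: List[Dataset]) -> DataProduct:
    """Reusable: build a derivative product on top of an existing one."""
    return DataProduct(
        name=name,
        domain=base.domain,
        owner=base.owner,
        description=f"Derived from {base.name}",
        datasets=base.datasets + extra,  # new list; the base is left unchanged
        interfaces=list(base.interfaces),
        tags=list(base.tags),
        parents=[base.name],
    )


orders = Dataset("orders", "erp", quality_checked=True)
customer_360 = DataProduct("customer_360", "sales", "sales-data-team",
                           "Unified customer view", [orders], ["sql"], ["customer"])
print(customer_360.readiness_gaps())  # []
```

Recording `parents` on each derivative is one simple way to keep reuse traceable; a real catalog would likely track richer lineage.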

Components

Data products act as a bridge between data producers and data consumers, which shortens the path from data to use case. According to the Harvard Business Review, organizations that adopt the data product approach have the potential to accelerate the implementation of new use cases by up to 90%.

Data producers work with datasets (which may take the form of tables, views, ML models, or streams) along with the published data model, schema, and business-approved metadata. Your data will likely come from multiple sources and can be raw or transformed.

Data consumers access data products through self-service yet controlled channels such as visualizations, APIs, SQL, and other secure interfaces.

[Figure: Hexagonal diagram with “Data Products” at the center. Data flows from Data Producers to Data Consumers, with emphasis on domain-centric, modular and reusable design, and product management.]

Data products are domain-centric, modular, and reusable, and are built by a designated owner using product management principles. A data product catalog organizes all data products within your organization. Here we describe each aspect in more detail:

Domain-Centric. The domain model enhances data understanding and adds business context by housing the business logic of transformations, analytics calculations, metrics, and machine learning. It acts as a semantic layer that exposes business-friendly domain data and insights while abstracting away the technical details.

Modular and Reusable. Data products are built once and reused across many use cases, and new data products can be built from existing ones.

Product Management Principles. Each data product is owned and managed by a domain-aligned data team that is responsible for its success: delivering value, satisfying and growing its user base, and maintaining its lifecycle. Every data product also evolves through versions and enhancements based on customer feedback.

Data Product Catalog. As we discuss in more detail below, this type of catalog is a foundational capability that lets data producers and data consumers speak the same language. It serves as a centralized marketplace that provides detailed information about each data product, including its purpose, data sources, processing methods, and intended audience.
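The domain-centric aspect describes the domain model as a semantic layer that hides technical details behind business-friendly names. A small sketch using Python's built-in `sqlite3` module shows the idea: the producer keeps a technically named raw table, and a SQL view encodes the domain logic. The table, its columns, and the status code `'C'` (meaning a completed order) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Producer side: a raw table with technical column names (hypothetical schema).
conn.execute("CREATE TABLE ord_raw (o_id INTEGER, c_id INTEGER, amt_cents INTEGER, st TEXT)")
conn.executemany(
    "INSERT INTO ord_raw VALUES (?, ?, ?, ?)",
    [(1, 10, 2500, "C"), (2, 10, 1500, "C"), (3, 11, 4000, "X"), (4, 11, 1000, "C")],
)

# Domain logic lives in the view: only completed orders ('C') count as revenue,
# and amounts are converted from cents to currency units.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT c_id AS customer_id,
           COUNT(*) AS completed_orders,
           SUM(amt_cents) / 100.0 AS revenue
    FROM ord_raw
    WHERE st = 'C'
    GROUP BY c_id
""")

# Consumer side: plain SQL against business-friendly names only.
rows = conn.execute(
    "SELECT customer_id, completed_orders, revenue FROM customer_revenue ORDER BY customer_id"
).fetchall()
print(rows)  # [(10, 2, 40.0), (11, 1, 10.0)]
```

Because consumers query only `customer_revenue`, the producer can restructure the raw table or refine the revenue rule without breaking consumer queries, as long as the view keeps the same columns.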

Examples of Data Products

Here are three common types of data products, with an example of each:

Decision Support. GPS navigation applications serve as decision support data products. They provide real-time guidance to users, helping them make informed decisions about routes and directions.

Algorithms. One of the best-known examples is the recommendation engine, used by companies like Netflix and Amazon to suggest products or content based on user behavior and preferences.

Automated Decision-Making. A self-driving car exemplifies automated decision-making. It relies on complex algorithms to make real-time decisions without external user intervention.

How to Build

Before you begin, establish a product management team, led by a data product manager, that steers the lifecycle of your data products to meet business objectives. The team should also include analysts, data engineers, user experience designers, and data architects.

This team collaborates closely with data consumers to understand their requirements and translates those needs into actionable products. Like real-world products, data products also require ongoing management and continuous monitoring.

[Figure: Circular diagram with “Data Product” at the center, connected to six outer nodes: “ID Users,” “Detect Trends,” “Collect Data,” “Develop Model,” “Validate Results,” and “Deploy Solution.”]

These are the key stages of the data product lifecycle:

  1. Conceptualization: This stage involves identifying domain-specific business needs by understanding the users, their use cases, and how they typically consume data.

  2. Design: This stage involves shaping the solution: identifying the datasets needed to address the need and the sources of that data, and defining the characteristics of the data product, such as functionality, interactivity, and presentation approach.

  3. Engineering: This stage involves building the technical infrastructure and functionality of the data product. This includes gathering and integrating the necessary raw datasets from various sources, ensuring data quality, and implementing the transformation processes.

  4. Deployment: This is the “making it accessible” stage. It involves completing the data product documentation, assigning ownership, integrating with existing business systems or consumption workflows for seamless access, and activating the data product so that it is ready for consumption.

  5. Marketing: Activating the data product in the marketplace is just the first step. This stage focuses on creating awareness and promoting usage. This involves consumer training and setting up communication channels to capture consumer feedback.

  6. Usage: With a successful launch, consumers interact with the data product in a self-serve fashion to search, find, understand and gain access to the data product. This stage focuses on monitoring usage and gathering feedback. Typically, data is collected on how users interact with the data product, how the data product is performing, and whether a data product is meeting the domain-centric needs of the business.

  7. Maintenance: Data products are not static; they must keep improving to deliver continued value. Based on the user feedback collected and the evolving needs of the business, new data dimensions, features, and functionality are proposed for the next version of the data product.
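The seven stages above can be sketched as a simple state machine in Python. This is an illustration under stated assumptions, not a prescribed workflow: the `Stage` enum and `ProductLifecycle` class are hypothetical, each stage is assumed to advance strictly to the next, and Maintenance is assumed to loop back to Design while bumping the version number, reflecting the feedback-driven next version described in stage 7.

```python
from enum import Enum


class Stage(Enum):
    CONCEPTUALIZATION = 1
    DESIGN = 2
    ENGINEERING = 3
    DEPLOYMENT = 4
    MARKETING = 5
    USAGE = 6
    MAINTENANCE = 7


# Each stage advances to the next one; maintenance loops back to design
# for the next version (an illustrative assumption, not a fixed rule).
NEXT = {s: Stage(s.value + 1) for s in Stage if s is not Stage.MAINTENANCE}
NEXT[Stage.MAINTENANCE] = Stage.DESIGN


class ProductLifecycle:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.CONCEPTUALIZATION
        self.version = 1

    def advance(self) -> Stage:
        """Move to the next stage, starting a new version after maintenance."""
        if self.stage is Stage.MAINTENANCE:
            self.version += 1
        self.stage = NEXT[self.stage]
        return self.stage


product = ProductLifecycle("customer_360")
for _ in range(6):
    product.advance()
print(product.stage.name, product.version)  # MAINTENANCE 1
```

Looping back to Design rather than Conceptualization assumes the business need is already understood for later versions; a team could just as well route major changes through Conceptualization again.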

Data Product Catalog

The gap between data producers and data consumers keeps widening, which increases the time-to-value for your data. A data product catalog provides a comprehensive repository that organizes and documents the data products within your organization. It serves as a federated resource, providing detailed information about each data product, including its purpose, data sources, processing methods, and intended audience.

It facilitates efficient discovery, understanding, and utilization of your data assets, enabling you and other stakeholders to make informed decisions and leverage data effectively for business processes and analytics.
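As a rough illustration of discovery, here is a minimal in-memory catalog in Python. It is a sketch under assumptions, not a description of any real catalog product or API: `CatalogEntry`, `DataProductCatalog`, `register`, and `search` are hypothetical names. Entries carry the fields mentioned above (purpose, data sources, intended audience) plus an owner and tags, each domain team registers its own entries to mimic federated ownership, and search is a simple case-insensitive keyword match.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    name: str
    domain: str
    owner: str
    purpose: str
    sources: List[str]
    audience: List[str]
    tags: List[str] = field(default_factory=list)


class DataProductCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Domain teams publish their own entries (federated ownership)."""
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, keyword: str, domain: Optional[str] = None) -> List[str]:
        """Case-insensitive keyword match on name, purpose, and tags."""
        kw = keyword.lower()
        hits = []
        for e in self._entries.values():
            if domain is not None and e.domain != domain:
                continue
            text = " ".join([e.name, e.purpose, *e.tags]).lower()
            if kw in text:
                hits.append(e.name)
        return sorted(hits)


catalog = DataProductCatalog()
catalog.register(CatalogEntry("customer_360", "sales", "sales-data-team",
                              "Unified view of each customer", ["crm", "erp"],
                              ["analysts"], ["customer"]))
catalog.register(CatalogEntry("churn_risk", "marketing", "marketing-data-team",
                              "Customer churn scores", ["crm"], ["marketers"], ["ml"]))
print(catalog.search("customer"))                  # ['churn_risk', 'customer_360']
print(catalog.search("customer", domain="sales"))  # ['customer_360']
```

Keyword matching keeps the sketch short; a production catalog would more likely index richer metadata and enforce access policies on top of discovery.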

Your enterprise data stack needs both a traditional data catalog and a data product catalog. Let’s compare them side by side:

Aspect | Data Catalog | Data Product Catalog
Mission | Strengthen data governance | Increase data value
Audience | Data producers | Data producers and consumers
Scope | Enterprise-focused | Domain-focused
Management | Centralized | Federated
Purpose | Discoverability, transparency | Usability, actionability
Content | Metadata of diverse kinds, including technical, operational, and business | Trusted data products, lifecycle metadata, documentation, and ownership
Number of entities | Thousands to hundreds of thousands | Hundreds

Data Product vs Data as a Product

People sometimes confuse these two concepts. While they share some aspects, they are fundamentally different. In summary, “data as a product” emphasizes the strategic importance of data within a data mesh, while “data products” are the tangible outputs created from that data. Both concepts are essential for effective data management and utilization.

Let’s compare them in terms of monetization:

Data Products: Internal marketplace. The focus is on delivering immediate value and streamlining data consumption for inter- and intra-company data and analytics stakeholders.

Data as a Product: External transactional marketplace. Monetization happens by selling access to data products, licensing them externally for commercial use, or offering data-driven services to clients.

Data Products for Data Mesh and Data Fabric

Data mesh and data fabric, though distinct approaches to data management, can work together to unlock the full potential of data products. Let’s discuss them one at a time:

Data Mesh refers to a data architecture in which data is owned and managed by the domain teams closest to it. A data mesh decentralizes data ownership to business domains, such as finance, marketing, and sales, and provides them with a self-serve data platform and federated computational governance. This lets each domain develop, deploy, and operate data services more autonomously and model its data to fit its specific needs, while still ensuring a consistent and unified data experience across your organization.

[Figure: A data mesh takes data from multiple domains and provides actionable insights to app and analytics tools.]

As you can see above, data products are the most tangible and impactful aspect of the data mesh paradigm.

Data Fabric refers to a machine-enabled data integration architecture that uses metadata assets to unify, integrate, and govern disparate data environments. By standardizing, connecting, and automating data management practices and processes, data fabrics improve data security and accessibility and provide end-to-end integration of data pipelines across on-premises, cloud, hybrid multicloud, and edge device platforms.

[Figure: A data fabric architecture in which data from operational sources is leveraged for BI, analytics, and data science.]

Data fabric is the foundational data management architecture that enables optimal delivery of data products to domain teams, facilitating improved DataOps.
