What it is, why it matters, and best practices. This guide provides definitions, frameworks, and practical advice to help you understand and perform modern data governance.
Data governance refers to the set of roles, processes, policies, and tools that ensure proper data quality throughout the data lifecycle and proper data usage across an organization. Data governance allows users to more easily find, prepare, use, and share trusted datasets on their own, without relying on IT.
Why is it Important?
The primary benefit of data governance is providing the high-quality data necessary for data analytics and BI tools. The insights gained from these tools result in better business decisions and improved performance. Additional benefits include:
Improved data accuracy, completeness, and consistency
Prevention of data misuse
Agreement on common data definitions
Removal of data silos between departments and systems
Increased trust in data for analytics and decision making
Easier data discovery, making data more widely available
Better compliance with data privacy laws and other government regulations such as the EU General Data Protection Regulation (GDPR) and the US Health Insurance Portability and Accountability Act (HIPAA)
In addition, one of the top 10 BI and data trends this year is that regulations are now combining data management, security, privacy, and identity and access management. So, security and governance have become a top priority, especially as you share APIs and data with partners.
Data Governance Framework
The three main components of a data governance framework are people, process, and technology.
PEOPLE: For your governance program, you should consider including the following roles:
Steering Committee: Made up of the Chief Data Officer (and/or the head of IT) and executives from each business unit, this group sets the usage policies and data standards. The committee also defines the mission statement and goals for the program, as well as how its success will be measured.
Governance Team: Led by a data governance manager, this team implements and maintains the systems and tools. It’s typically composed of data architects and other governance specialists from the IT department.
Data Stewards: This team manages the datasets and is responsible for enforcing the rules and meeting the day-to-day data needs of the business.
PROCESS: You’ll also need formal processes (or activities) to ensure consistent execution and enforcement of the usage policies and data standards set by the steering committee. These processes can be described in flowcharts that make the inputs and tasks clear for each use case.
TECHNOLOGY: As the name suggests, this component refers to the tools and techniques used to efficiently maintain and manage the security, integrity, lineage, usability, and availability of data. Modern tools can automate most aspects of managing a governance program. For example, a governed data catalog profiles and documents every data source and defines who in an organization can take which actions on which data.
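To make the last point concrete, here is a minimal sketch of how "who can take which actions on which data" might be modeled. It is not taken from any particular catalog product: the role names, the dataset name, and the `is_allowed` helper are all hypothetical.

```python
# Hypothetical policy table: (role, dataset) -> actions that role may take.
# Real catalogs store this in their own format; this is only an illustration.
POLICIES = {
    ("data_steward", "customer_orders"): {"read", "edit", "publish"},
    ("data_engineer", "customer_orders"): {"read", "edit"},
    ("analyst", "customer_orders"): {"read"},
}


def is_allowed(role: str, dataset: str, action: str) -> bool:
    """Return True only if the policy table explicitly grants the action."""
    # Unknown roles or datasets fall back to an empty set, so access is denied.
    return action in POLICIES.get((role, dataset), set())


print(is_allowed("analyst", "customer_orders", "read"))
print(is_allowed("analyst", "customer_orders", "publish"))
```

The sketch denies anything that is not explicitly granted, which is one common design choice for governed data; your own tooling may take a different approach.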
While you set up the framework described above, keep in mind these three best practices to ensure you’re successful right out of the gate.
Write a glossary
Developing a data glossary (or dictionary) which defines the business terms and concepts you use in your organization will give you consistent business context across multiple tools. For example, everyone should be clear on what qualifies as a “Marketing Qualified Lead” or an “Inactive Customer”.
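A glossary can start as something very simple. The sketch below assumes one agreed definition and one accountable owner per term; the `GlossaryTerm` and `Glossary` classes, the owners, and the placeholder definitions are hypothetical and would be replaced by whatever your organization agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    owner: str  # business unit accountable for the definition (assumed field)


class Glossary:
    def __init__(self):
        self._terms = {}

    @staticmethod
    def _key(name):
        # Normalize case and whitespace so lookups do not depend on formatting.
        return " ".join(name.lower().split())

    def add(self, term):
        key = self._key(term.name)
        if key in self._terms:
            # One agreed definition per term: reject conflicting duplicates.
            raise ValueError(f"'{term.name}' is already defined")
        self._terms[key] = term

    def lookup(self, name):
        # Raises KeyError if the term has not been defined yet.
        return self._terms[self._key(name)]


glossary = Glossary()
glossary.add(GlossaryTerm(
    "Marketing Qualified Lead",
    "Placeholder: a lead that meets the criteria agreed by marketing and sales.",
    "Marketing",
))
glossary.add(GlossaryTerm(
    "Inactive Customer",
    "Placeholder: a customer with no activity in the agreed time window.",
    "Customer Success",
))

print(glossary.lookup("marketing qualified lead").definition)
```

Rejecting duplicate entries is a deliberate choice here: it forces teams to reconcile competing definitions rather than quietly keeping two.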
Map and classify your data
Mapping where your data resides will help you know which system it’s in and how it flows through your organization. Classifying your datasets based on considerations like privacy or sensitivity determines how your policies are applied to each dataset.
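As a rough sketch of how a classification can drive policy, the example below maps each dataset to the system it lives in and a sensitivity level, then looks up handling rules for that level. The levels, dataset names, system names, and rules are hypothetical examples, not a recommended scheme.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical handling rules attached to each classification level.
HANDLING_RULES = {
    Sensitivity.PUBLIC: {"mask_columns": False, "approval_required": False},
    Sensitivity.INTERNAL: {"mask_columns": False, "approval_required": True},
    Sensitivity.CONFIDENTIAL: {"mask_columns": True, "approval_required": True},
}

# Hypothetical data map: dataset name -> (system it resides in, classification).
DATA_MAP = {
    "web_traffic": ("analytics_warehouse", Sensitivity.PUBLIC),
    "sales_pipeline": ("crm", Sensitivity.INTERNAL),
    "patient_records": ("ehr", Sensitivity.CONFIDENTIAL),
}


def rules_for(dataset):
    """Combine a dataset's location and classification with its handling rules."""
    system, level = DATA_MAP[dataset]
    return {"system": system, "classification": level.name, **HANDLING_RULES[level]}


print(rules_for("patient_records"))
```

Because the rules hang off the classification rather than the individual dataset, reclassifying a dataset changes how it is handled without editing any policy by hand.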
Establish a data catalog
Building a clear, use case-based data catalog lets you make different kinds of data available to different kinds of users quickly, without increasing risk. Data catalogs provide an indexed inventory of available data assets, along with data lineage information, search functions, and collaboration tools.
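The "indexed inventory" idea can be illustrated with a small keyword index. This is only a sketch under simple assumptions (whitespace tokenization, exact word matches); the entries and the `build_index` and `search` helpers are hypothetical and far simpler than what a real catalog does.

```python
from collections import defaultdict

# Hypothetical catalog entries.
ENTRIES = [
    {"name": "customer_orders",
     "description": "Daily order totals per customer",
     "tags": ["sales", "finance"]},
    {"name": "customer_profiles",
     "description": "Contact details for each customer",
     "tags": ["crm", "sensitive"]},
    {"name": "web_sessions",
     "description": "Page views per visit",
     "tags": ["marketing"]},
]


def build_index(entries):
    """Map each lowercase word (from name, description, tags) to entry names."""
    index = defaultdict(set)
    for entry in entries:
        words = [entry["name"]] + entry["description"].split() + entry["tags"]
        for word in words:
            index[word.lower()].add(entry["name"])
    return index


def search(index, query):
    """Return the sorted names of entries that match every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # .get() avoids adding empty keys to the defaultdict during lookups.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(ENTRIES)
print(search(index, "customer"))
print(search(index, "customer sales"))
```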
The Role of Data Lineage
Data lineage refers to the process of tracking all changes made to data on its journey from source to current location. Data lineage tools help you understand and visualize these changes and data flows so you can know where any specific piece of data came from, how it split and merged with other data, and what transformations have been applied.
So, in a data governance framework, a data steward or data engineer would use a lineage visualization to confirm they can trust the data or to trace any errors back to the root cause.
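Separately from any visual tool, the tracing that lineage makes possible can be sketched as a walk over a graph. The example below assumes lineage is recorded as a mapping from each dataset to the datasets it was derived from; the dataset names, transformations, and helper functions are hypothetical.

```python
# Hypothetical lineage graph: dataset -> list of (parent dataset, transformation).
LINEAGE = {
    "revenue_dashboard": [("monthly_revenue", "aggregate by month")],
    "monthly_revenue": [
        ("orders_clean", "sum order totals"),
        ("fx_rates", "convert currency"),
    ],
    "orders_clean": [("orders_raw", "drop duplicates")],
    "orders_raw": [],
    "fx_rates": [],
}


def trace_upstream(dataset, lineage):
    """Return every (child, parent, transformation) edge upstream of dataset."""
    edges, seen, stack = [], set(), [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        for parent, transformation in lineage.get(current, []):
            edges.append((current, parent, transformation))
            stack.append(parent)
    return edges


def root_sources(dataset, lineage):
    """Return the upstream datasets that have no parents of their own."""
    parents = {parent for _, parent, _ in trace_upstream(dataset, lineage)}
    return sorted(p for p in parents if not lineage.get(p))


for child, parent, step in trace_upstream("revenue_dashboard", LINEAGE):
    print(f"{child} <- {parent} ({step})")
print(root_sources("revenue_dashboard", LINEAGE))
```

If a number on the dashboard looks wrong, walking these edges shows which transformations and source datasets to inspect first.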
Governance has traditionally focused on the management of finished data such as financial close metrics, regulatory submissions, and key performance indicators. This type of data requires formal definitions and high data quality.
But today’s advanced data science and data analytics often use raw and semi-finished data. And this creates a tension between data providers and data consumers. Providers work hard to provision data responsibly, to everyone, without putting the business at risk. Consumers want data for their projects immediately.
A tiered system offers a solution to this challenge. This funnel addresses different user needs with different types of data, applying increasing scrutiny and quality standards as the data works its way through the system.
This system helps the enterprise governance function maintain a breadth of understanding across the enterprise, including restricting access to sensitive data, as well as a depth of understanding for a smaller number of critical data assets.
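One way to picture the increasing scrutiny is as a set of cumulative checks per tier. The sketch below borrows the raw, semi-finished, and finished labels used above, but the specific checks and the `highest_tier` helper are hypothetical; your own tiers and standards will differ.

```python
# Hypothetical tiers, ordered from least to most scrutiny. Each tier requires
# its own checks plus every check from the tiers before it.
TIERS = [
    ("raw", {"owner_assigned"}),
    ("semi_finished", {"schema_validated", "sensitivity_classified"}),
    ("finished", {"definitions_approved", "quality_thresholds_met"}),
]


def highest_tier(passed_checks):
    """Return the highest tier whose cumulative checks all passed, or None."""
    passed = set(passed_checks)
    required = set()
    reached = None
    for name, checks in TIERS:
        required |= checks
        if not required <= passed:
            break  # a missing check blocks this tier and every tier after it
        reached = name
    return reached


print(highest_tier({"owner_assigned"}))
print(highest_tier({"owner_assigned", "schema_validated", "sensitivity_classified"}))
```

Data scientists who need data immediately can work with the lower tiers, while the datasets behind regulatory submissions would have to clear every check.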