How to Create a Data Governance Team? 3 Essential Steps
Data governance is more than just having a strategy – it is about establishing a culture where quality data is achieved, maintained, valued, and used to drive the business.
Modern-day businesses are supported by data and information in many ways and forms. In recent years, data has become the foundation for competition, productivity, growth, and innovation. We are seeing successful organizations shift their focus from producing data to consuming it, and data governance strategies becoming increasingly important to support their crucial business initiatives. Executives and shareholders are starting to realize that data is a strategic asset and data governance is a must if they want to get value from data.
All in all, data governance has one clear goal: to ensure that accurate data and information are available for business objectives. A data governance program delivers plenty of benefits, from legal and regulatory compliance to better risk management and the ability to create new products and services.
A common question every new organization faces is how to create data governance. To begin a successful data governance program in your organization, you need to align the program with the business strategy and assemble a dedicated data governance team. Such a team and program require significant investment from your organization in terms of time and resources. However, the costs are typically offset by the business value the program delivers in the long run.
It is also worth remembering that data governance is a continuous, ongoing process. If you want to be successful, your journey must be an evolutionary one. Start now, and build a governance team and culture in your organization to ensure the long-term viability and success of your business.
In the Data Management Body of Knowledge (DMBOK), data governance is defined as the exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets.
While you may have heard about physical assets like cash, buildings, and machines, the 21st century’s digital economy has given us a new type of asset – data. Businesses that like to call themselves “data-driven” need to manage data as an “asset”. Why? Because data is vital for the success of almost all organizations – large and small.
The value of a data asset shows up when it is used correctly in making a decision and generates the desired business outcomes. A poor data management strategy can result in ineffectiveness, inconsistencies in data availability, and declining trust in data assets, which can negatively impact business objectives and lead to compromised decision accuracy, confused employees, missed opportunities, and reduced customer satisfaction.
Data governance plays a key role in defining and treating these data assets. A correct data governance strategy ensures that:
In addition, data governance defines clear roles and responsibilities regarding how people can manage and make decisions about data. A successful data governance strategy is about ensuring people are properly organized, and they are doing the right things to make their data understandable, trustworthy, of high quality, and, ultimately, suitable and usable for their own purposes.
Recent developments in digital technology and software applications have led to an astronomical amount of digital information, a phenomenon also referred to as the data explosion. Prior to the data explosion, most businesses didn’t have that much data, so it was easy to detect anomalies and adjust data accordingly. In effect, data governance was easy.
But in today’s scenario, with increasing amounts of data flooding organizations along with pressures to outperform the competition, and the need to comply with a multitude of regulations, managers are impelled to ask four key questions:
To give you an understanding of the extent of this situation:
Data governance is required due to the growing complexity of data types and steep increases in data-consuming applications. Businesses need to have an effective data governance framework in place to embody data protection principles and meet regulatory and compliance requirements.
According to Saul Judah, VP analyst at Gartner, data and analytics leaders are finding it difficult to identify which aspects of data governance need to improve because they don't have a clear benchmark for best practices. In many companies, data cleanup and governance consume a large amount of data users’ time, which impacts their productivity. A McKinsey survey found that such users spend an average of 30 percent of their total time on non-value-added tasks.
A well-established data governance strategy leads to measurable and tangible business benefits for a “data-driven” organization. Your data governance efforts should be directly connected to your business strategy and priorities, with clear metrics for success. This helps in building more confidence in data assets and better decision-making across your organization.
The ultimate objective of data governance is to enhance trust in data and generate the greatest possible return on data assets.
A holistic approach to data governance involves the creation of a repeatable and scalable data management framework with policies and standards for the best use of data assets. While data governance is mostly focused on making data accessible, clean, reachable, and secure, data stewardship covers the operational aspect of data governance – the people and processes to manage your information resources.
By creating a governance and stewardship plan, you ensure that your data is well cared for, rather than being turned into a liability. To maintain trusted and secure data, data governance sets business priorities and objectives with clear roles and responsibilities. Top-down business leadership is usually responsible for this.
The data stewards then put data governance into operation, ensuring that compliance and business objectives are being met and adhered to. In parallel, IT teams and interested stakeholders can work together to ensure that IT has the necessary tools and investments to operationalize and support holistic data stewardship.
There are a number of requirements that need to be met for organizations to be able to successfully implement data governance. These include:
A unified business vocabulary is the first step to ensuring that everyone in your company is speaking the same language. Having a common repository for business terms and their definitions helps eliminate ambiguity.
For example, consider a situation where one business analyst produces a report containing the metric “Total Sales”. In another report from another business analyst, you find a different metric, “Gross Revenue”. When you read them, you wonder – Are these the same? Do Total Sales include tax?
These are some of the questions that may arise when the definition of a business term is ambiguous. Establishing clear, shared definitions for data entities can solve this problem.
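To make the “Total Sales” versus “Gross Revenue” ambiguity above concrete, here is a minimal sketch of a shared glossary in Python. The class names, the owner label, the definition text, and the decision to treat the two metrics as synonyms are all assumptions made for illustration; in a real organization, the data stewards would decide whether the terms are actually equivalent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """One entry in a shared business glossary (hypothetical structure)."""
    name: str
    definition: str
    owner: str
    synonyms: set[str] = field(default_factory=set)


class BusinessGlossary:
    """Maps every label (canonical name or synonym) to a single definition."""

    def __init__(self) -> None:
        self._by_label: dict[str, GlossaryTerm] = {}

    def add(self, term: GlossaryTerm) -> None:
        # Normalize labels so lookups are case-insensitive.
        labels = {label.strip().lower() for label in (term.name, *term.synonyms)}
        # Refuse to register a label twice: one label, one meaning.
        clashes = labels.intersection(self._by_label)
        if clashes:
            raise ValueError(f"already defined: {sorted(clashes)}")
        for label in labels:
            self._by_label[label] = term

    def lookup(self, label: str) -> GlossaryTerm | None:
        return self._by_label.get(label.strip().lower())


glossary = BusinessGlossary()
glossary.add(GlossaryTerm(
    name="Total Sales",
    # Illustrative definition only; the source article does not define it.
    definition="Sum of invoiced order amounts, excluding tax.",
    owner="finance-data-steward",
    synonyms={"Gross Revenue"},
))

# Both report labels now resolve to the same documented definition.
same_term = glossary.lookup("gross revenue") is glossary.lookup("Total Sales")
```

With a structure like this, an analyst who meets an unfamiliar metric name can check whether it maps to an existing term before publishing a second, conflicting definition.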
Implementing data governance requires profiling and classification of data. Data classification schemes evaluate a data asset, including the content of its different attributes. The resulting profile helps identify potential security breaches, their cost, and their likelihood. Classifying data assets is usually the responsibility of knowledgeable individuals such as business data stewards, since they understand what the data means.
Sometimes, data confidentiality and data retention classification processes can be automated by triggering a classification job on the addition of new data sources. Based on the definition of the data class, "purpose," or context of data access or manipulation, you can automatically (or manually) apply policies that control access to and retention of the data.
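The automated flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the three data classes, the column-name hints, the role names, and the retention periods are all hypothetical, and real classification jobs usually inspect data content as well as column names.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class Policy:
    allowed_roles: frozenset
    retention_days: int


# Hypothetical policy table: each data class maps to access and retention rules.
POLICIES = {
    DataClass.PUBLIC: Policy(frozenset({"analyst", "engineer", "steward"}), 3650),
    DataClass.INTERNAL: Policy(frozenset({"engineer", "steward"}), 1825),
    DataClass.CONFIDENTIAL: Policy(frozenset({"steward"}), 365),
}

# Hypothetical column-name hints used by the automatic classifier.
SENSITIVE_HINTS = ("ssn", "email", "phone", "salary")
INTERNAL_HINTS = ("customer", "order", "invoice")


def classify(column_names):
    """Pick the most restrictive class suggested by any column name."""
    names = [name.lower() for name in column_names]
    if any(hint in name for name in names for hint in SENSITIVE_HINTS):
        return DataClass.CONFIDENTIAL
    if any(hint in name for name in names for hint in INTERNAL_HINTS):
        return DataClass.INTERNAL
    return DataClass.PUBLIC


def on_new_source(source_name, column_names):
    """Classification job triggered when a new data source is registered."""
    data_class = classify(column_names)
    return data_class, POLICIES[data_class]


def can_access(role, data_class):
    """Apply the access policy attached to a data class."""
    return role in POLICIES[data_class].allowed_roles


data_class, policy = on_new_source("crm.contacts", ["id", "Email", "city"])
```

A steward can still override the automatic result manually; the point of the sketch is only that the policy follows from the class rather than being set per dataset.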
An essential component that follows data classification and data classes is metadata, which is information about data. Data governance is incomplete without metadata management.
Metadata acts as a gateway to data: it summarizes basic information about data and enables data users to easily find their items of interest. Think of metadata as a table of contents in a book. With a high-level overview of data assets, data discovery and exchange become far easier.
Crucial to metadata management is a data catalog, a tool that helps manage metadata. A data catalog is useful for enterprises that have multiple storage systems and a lot of data, as it connects an inventory of distinct datasets with rich information about who in the organization owns the data.
As your data governance strategy grows, you will want to attach more details to your data catalog, including data class, data quality, sensitivity, etc. When you have the dimensions of information schematized, you can run an advanced search like “Show me all data of type: table and class: X in the production environment” and obtain results fast.
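As a rough illustration of such a faceted search, the sketch below filters an in-memory catalog by exact attribute values. The entry fields, dataset names, and owners are hypothetical, and a real data catalog would expose this through its own query interface rather than a Python list.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CatalogEntry:
    """Schematized metadata for one data asset (hypothetical fields)."""
    name: str
    asset_type: str
    data_class: str
    environment: str
    owner: str


# Hypothetical catalog contents.
CATALOG = [
    CatalogEntry("sales.orders", "table", "X", "production", "sales-steward"),
    CatalogEntry("sales.orders_v", "view", "X", "production", "sales-steward"),
    CatalogEntry("hr.salaries", "table", "Y", "production", "hr-steward"),
    CatalogEntry("sales.orders", "table", "X", "staging", "sales-steward"),
]

VALID_FACETS = {f.name for f in fields(CatalogEntry)}


def search(catalog, **filters):
    """Return entries whose attributes equal every given filter value."""
    unknown = set(filters) - VALID_FACETS
    if unknown:
        raise KeyError(f"unknown facets: {sorted(unknown)}")
    return [
        entry for entry in catalog
        if all(getattr(entry, key) == value for key, value in filters.items())
    ]


# "Show me all data of type: table and class: X in the production environment"
results = search(CATALOG, asset_type="table", data_class="X", environment="production")
```

Because every facet is a named field, adding a new dimension such as sensitivity or data quality only requires extending the entry schema, not rewriting the search.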
Another key element to a successful implementation of a data governance strategy is the people and the processes by which data governance is implemented.
While most organizations tend to focus more on data governance tools than on people and processes, understanding the people who use those tools and the processes set up for their proper use is just as critical to governance success.
Understands and communicates legal requirements for compliance
Approver (can also be governor)
Implements the company’s governance strategy physically (e.g., data architecture, tooling, data pipelining, etc.)
Is responsible for performing categorization and classification of data
Data analyst/Data scientist
Runs complex data analytics/queries
Runs simple data analyses
Funds company’s governance strategy
Audits a company’s compliance with legal regulations
Defining clear roles and responsibilities avoids confusion and sets a solid foundation upon which a thriving data culture can materialize in your organization.
In addition to defining distinct roles and responsibilities, you also need to define processes that govern how data is defined, discovered, and classified.
Your data governance processes should focus on addressing three key areas: discoverability, security, and accountability.
Discoverability means using data governance to make technical metadata, lineage information, and a business glossary readily available for your business teams. This requires business-critical data to be accurate and complete. Master data management (MDM) is needed to ensure that the data is properly classified and protected from accidental or malicious changes.
In terms of security, regulatory compliance, management of sensitive data, data security, and prevention of exfiltration are all important for your business. Once discoverability and security are in place, you can start treating the data itself as a product. This is when accountability becomes important, in the form of an operating model for data ownership.
Having set the requirements for successful data governance, it’s time to think about how to create a data governance team. Structuring and assembling a successful data governance team comprises 3 steps:
The first step in creating a data governance team is the official designation of accountability and responsibility. These are key to the survival of a data governance strategy, and they matter even more for organizations that are figuring out how to create a data governance framework for the first time. Assigning accountability is an essential task, and in many organizations the responsibility falls to the steering committee, data governance council, and designated “stewards” or “data custodians”.
Many organizations regard data governance as an entry point for the introduction of new roles. The new positions frequently center on the idea of putting someone in control of data assets, typically above and outside of information technology. A typical data governance org structure in such scenarios would comprise the following roles:
An important consideration when establishing a successful data governance team is technology. Currently, the market for data governance technology is evolving at a rapid pace. New tools for managing data glossaries, data governance workflows, data discovery, and data quality are continuously being rolled out.
Your data governance team has to select tools that deliver a broad range of integrated capabilities, including data governance, data quality, and analytics, and that serve your business interests in the best ways possible. The tool should empower every data user to easily define, track, and manage all aspects of their data assets. To be useful, it must come with an intuitive dashboard showing KPIs and metrics for gauging progress towards business goals.
That being said, your data governance team should never feel compelled to buy a data governance tool just because it “solves” the data accessibility and compliance problem. By definition, a tool exists to improve something you are already doing. If your team hasn’t yet stepped into the realm of data governance, or is doing it poorly, then casting about for a tool to help you deploy data governance is a waste of time.
To summarize, here are some key considerations to be aware of while choosing the right technology and tools:
The question “how to create data governance” will have different answers for every organization. Your data governance goals will depend on numerous factors, including your industry, business model, corporate hierarchy, and current data management practices. The type of data your business holds will also be important.
In order to establish a successful and long-running data governance framework, you need to identify and set pragmatic goals and measure progress toward them. This information can help you and your teams refine and adapt your data governance policies, processes, technology, and team structure needed to achieve your goals. The final framework should be documented and made available to all parties who may interact with the data.
Some of the aspects that your data governance goals might include are:
As the next step, you need metrics to monitor and assess how well your data governance program is moving towards your goals. This will also allow your governance team to identify quick wins and long-term improvements. Some useful metrics to check your data governance success can include:
Lastly, you need to figure out how your data governance program will work. Once you have defined roles, responsibilities, and policies for data quality, security, and compliance, you need to monitor the effectiveness of those policies; prioritize your efforts; and make required decisions.
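Measuring progress toward governance goals could look something like the sketch below. The two metrics used here (the share of assets with an assigned owner and the share with a data class) and their target values are assumptions chosen for illustration; your own program would substitute the metrics and thresholds it has agreed on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """Minimal view of a governed data asset (hypothetical fields)."""
    name: str
    owner: Optional[str] = None
    data_class: Optional[str] = None


def coverage(assets, attribute):
    """Percentage of assets for which `attribute` is populated."""
    if not assets:
        return 0.0
    filled = sum(1 for asset in assets if getattr(asset, attribute))
    return 100.0 * filled / len(assets)


def progress_report(assets, targets):
    """Compare each coverage metric with its target percentage."""
    report = {}
    for attribute, target in targets.items():
        actual = coverage(assets, attribute)
        report[attribute] = {
            "actual": actual,
            "target": target,
            "met": actual >= target,
        }
    return report


# Hypothetical inventory and targets.
assets = [
    Asset("sales.orders", owner="sales-steward", data_class="internal"),
    Asset("crm.contacts", owner="crm-steward"),
    Asset("tmp.export_2021"),
    Asset("hr.salaries", owner="hr-steward", data_class="confidential"),
]
report = progress_report(assets, {"owner": 75.0, "data_class": 80.0})
```

In this example the ownership target is met while classification coverage falls short, which tells the governance team where to prioritize its next effort.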
The challenge of how to create data governance is all about determining how people will work together to make decisions about data. Your data governance strategy will not succeed if it is not measured and acted upon. It must be seen as people-to-people interaction, storytelling, knowledge sharing, and innovation rather than a routine bureaucratic activity.
Your data governance program is not a project; it is an ongoing, dynamic, and collaborative effort that requires well-defined cross-functional processes and automated systems. To answer how to create data governance, you must approach data as an asset, but not metaphorically.
When we call data a real business asset, we truly mean it. You may not see “information value” on a balance sheet, but if you and your teams view data asset management in the true business sense, deploying data governance is a lot easier.
The key to a successful data governance program is to set up the right governance teams and tools. In addition, it is imperative to understand and socialize the roles and responsibilities of the data governance participants.
The ability to become a governed organization is built on the successes of proactive organizations, and every company can achieve it. All it requires is a change in mindset from thinking and feeling overwhelmed to taking progressive steps in day-to-day data governance.