Tag Archives: data governance

Types of Data Quality Rules

Data quality rules are for protecting data from errors. In this post, we will learn about different data quality rules. In addition, we will look at tools used in connection with data quality rules.


Detective rules monitor data after it has already moved through a pipeline and is being used by the organization. Detective rules are generally used when the issues being detected are not causing a major problem, when the issue cannot be solved quickly, and when a limited number of records are affected.

Of course, all of the criteria listed above are relative. In other words, it is up to the organization to determine what thresholds are needed for a data quality rule to be considered a detective rule.


An example of a detective data quality rule would be one that flags a student information table that is missing a student’s uniform size. Such information is useful, but its absence is probably not serious enough to stop the data from moving on to others for use.


Preventive data quality rules stop data in the pipeline when issues are found. Preventive rules are used when the data is too important to allow errors, when the problem is easy to fix, or when the issue is affecting a large number of records. Again, all of these criteria are relative to the organization.

An example of a violation of a preventive data quality rule would be a student records table missing student ID numbers. Generally, such information is needed to identify students and make joins between tables. Therefore, such a problem would need to be fixed immediately.
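The difference between the two kinds of rules can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the field names `student_id` and `uniform_size`, and the function `apply_rules` are all hypothetical and not part of any particular tool.

```python
# Hypothetical student records; the field names are illustrative only.
records = [
    {"student_id": "1001", "uniform_size": "M"},
    {"student_id": "1002", "uniform_size": None},  # detective issue only
    {"student_id": None, "uniform_size": "L"},     # preventive violation
]


def apply_rules(rows):
    """Split rows into passed rows, detective warnings, and blocked rows."""
    passed, warnings, blocked = [], [], []
    for row in rows:
        # Preventive rule: a missing student ID stops the record in the pipeline.
        if not row.get("student_id"):
            blocked.append(row)
            continue
        # Detective rule: a missing uniform size is logged, but the record moves on.
        if not row.get("uniform_size"):
            warnings.append(f"student {row['student_id']}: uniform size missing")
        passed.append(row)
    return passed, warnings, blocked


passed, warnings, blocked = apply_rules(records)
print(len(passed), "passed,", len(warnings), "warning(s),", len(blocked), "blocked")
```

The record without an ID never reaches downstream users, while the record without a uniform size is delivered along with a note for later review.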

Thresholds & Anomaly Detection

There are several tools for implementing detection and prevention data quality rules. Among the choices are the setting of thresholds and the use of anomaly detection.

Thresholds are levels at which an action is triggered once a certain number of errors have occurred. It is up to the organization to determine how to set up its thresholds. Common levels include no action, warning, alert, and prevention. Each level must have a minimum number of errors that must occur before the information is passed on to the user or IT.

To make things more complicated, you can tie threshold levels to detective and preventive rules. For example, if a dataset has 5% missing data, it might only be flagged at the warning threshold. However, if the missing data jumps to 10%, it might now be a violation of a preventive rule, as the violation has reached the prevention level.
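A minimal sketch of the 5% and 10% example is shown below. The cut-off values come from the example above, while the names `missing_rate` and `threshold_level` are made up for illustration. An organization would choose its own levels and might add an alert level in between.

```python
# Assumed cut-offs taken from the example: 5% -> warning, 10% -> prevention.
WARNING_LEVEL = 0.05
PREVENTION_LEVEL = 0.10


def missing_rate(values):
    """Fraction of values that are missing (None or an empty string)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)


def threshold_level(rate):
    """Map a missing-data rate to a threshold level."""
    if rate >= PREVENTION_LEVEL:
        return "prevention"
    if rate >= WARNING_LEVEL:
        return "warning"
    return "no action"


# A hypothetical column of 100 values, 7 of which are missing.
column = ["x"] * 93 + [None] * 7
print(threshold_level(missing_rate(column)))
```

A pipeline could let the data through with a notice at the warning level and stop the load only at the prevention level.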

Anomaly detection can be used to find outliers. Unusual records can be flagged for review. For example, a university has an active student who was born in 1920. Such a birthdate is highly unusual, and the rule should cause the system to flag it as an outlier. After reviewing, IT can decide if it is necessary to edit the record. Again, anomaly detection can be used to detect or prevent data errors and can have thresholds applied to it as well.
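One simple way to flag a birth year like 1920 is the interquartile range (IQR) rule. The post does not name a specific method, so the IQR rule here is only one possible choice, and the birth years and the function name `iqr_outliers` are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical birth years of active students.
birth_years = [1920, 2000, 2001, 2002, 2002, 2003, 2003, 2004]
flagged = iqr_outliers(birth_years)
print(flagged)
```

Flagged values are only candidates for review. Someone still has to decide whether the record is an error or an unusual but real student.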


Data quality rules can be developed to monitor the state of data within a system. Once the rules are developed, it is important to determine if they are detective or preventive. The main reason for this is that the type of rule affects the urgency with which the problem needs to be addressed.

Data Profile

One aspect of the data governance experience is data profiling. In this post we will look at what a data profile is, an example of a simple data profile, and the development of rules that are related to the data profile.


Data profiling is the process of running descriptive statistics on a dataset to develop insights about the data and field dependencies. Some questions that are commonly asked when performing a data profile include the following.

  • How many observations are in the data set?
  •  What are the min and max values of a column(s)?
  •  How many observations have a particular column populated with a value (missing vs non-missing data)?
  •  When one column is populated what other columns are populated?

Data profiling helps you to confirm what you know and do not know about your data. This knowledge will help you to determine issues with your data quality and to develop rules to assess data quality.
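The questions above can be answered with a few small functions. The sketch below uses plain Python and a made-up set of rows. The column names and the helpers `profile_column` and `pct_all_populated` are assumptions for illustration, not part of a profiling product.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"student_id": 1001, "first_name": "Maria", "last_name": "Lopez"},
    {"student_id": 1002, "first_name": None, "last_name": "Smith"},
    {"student_id": 1003, "first_name": "Daniel", "last_name": None},
    {"student_id": 1004, "first_name": "Aisha", "last_name": "Khan"},
]


def is_populated(value):
    """Treat None and empty strings as missing."""
    return value is not None and value != ""


def profile_column(rows, column):
    """Row count, populated count and percentage, and min/max of one column."""
    values = [r.get(column) for r in rows if is_populated(r.get(column))]
    return {
        "rows": len(rows),
        "populated": len(values),
        "pct_populated": 100 * len(values) / len(rows) if rows else 0.0,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


def pct_all_populated(rows, columns):
    """Percentage of rows in which every listed column is populated."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if all(is_populated(r.get(c)) for c in columns))
    return 100 * hits / len(rows)


print(profile_column(rows, "student_id"))
print(pct_all_populated(rows, ["first_name", "last_name"]))
```

The second function addresses the last question in the list, namely which columns tend to be populated together.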

Student Records Table


The first column from the left is the student ID. Looking at this column, we can see that there are five records with data and that the column is numeric with 4 characters. The minimum value is 1001 and the max value is 1005.

The next two columns are first name and last name. Both of these columns are string text with a min character length of 5 and a max length of 7 for first name and 5 for last name. For both columns, 80% of the records are populated with a value. In addition, 60% of the records have a first name and a last name.


The fourth column is the birthdate. This column has populated records 80% of the time and all rows follow a MM/DD/YYYY format. The minimum value is 04/04/2000 and the max value is 01/01/2005. 40% of the rows have a first name, last name, and birthdate.

Lastly, 100% of the class-level column is populated with values. 20% of the values are senior, 40% are junior, 20% are sophomore, and 20% are freshman.

Developing Data Quality Rules

From the insights derived from the data profile, we can now develop some rules to ensure quality. As with any analysis or insight, the actual rules will vary from place to place based on needs and context, but below are some examples for demonstration purposes.

  • All StudentID values must be 4 numeric characters
  • All StudentID values must be populated
  • All StudentFirstName values must be 1-10 characters in length
  • All StudentLastName values must be 1-10 characters in length
  • All StudentBirthdate values must be in MM/DD/YYYY format
  • All StudentClassLevel values must be Freshman, Sophomore, Junior, or Senior
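Rules like the ones listed above can be turned into automated checks. The following is a rough sketch in Python, assuming each record is a dictionary keyed by the column names in the list. The function names and the sample records are hypothetical.

```python
import re
from datetime import datetime

CLASS_LEVELS = {"Freshman", "Sophomore", "Junior", "Senior"}


def is_valid_date(text):
    """True if text is a real calendar date written as MM/DD/YYYY."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except (TypeError, ValueError):
        return False
    # strptime also accepts "4/4/2000", so insist on the two-digit form.
    return re.fullmatch(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", text) is not None


def validate(record):
    """Return a list of rule violations for one record (empty list means clean)."""
    errors = []
    student_id = record.get("StudentID")
    if not student_id:
        errors.append("StudentID must be populated")
    elif not re.fullmatch(r"[0-9]{4}", str(student_id)):
        errors.append("StudentID must be 4 numeric characters")
    for field in ("StudentFirstName", "StudentLastName"):
        value = record.get(field) or ""
        if not 1 <= len(value) <= 10:
            errors.append(f"{field} must be 1-10 characters in length")
    if not is_valid_date(record.get("StudentBirthdate")):
        errors.append("StudentBirthdate must be in MM/DD/YYYY format")
    if record.get("StudentClassLevel") not in CLASS_LEVELS:
        errors.append("StudentClassLevel must be a recognized class level")
    return errors


# Hypothetical records: one clean, one with three problems.
good = {"StudentID": "1001", "StudentFirstName": "Maria", "StudentLastName": "Lopez",
        "StudentBirthdate": "04/04/2000", "StudentClassLevel": "Senior"}
bad = {"StudentID": "12A4", "StudentFirstName": "Daniel", "StudentLastName": "Smith",
       "StudentBirthdate": "2000-04-04", "StudentClassLevel": "9"}
print(validate(good))
print(validate(bad))
```

Returning a list of violations rather than a single pass or fail value makes it easier to report every problem in a record at once.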


A data profile can be much more in-depth than the example presented here. However, if you have hundreds of tables and dozens of databases this can be quite a labor-intensive experience. There is software available to help with this but a discussion of that will have to wait for the future.

Data Quality

Bad data leads to bad decisions. However, the question is how you can know if your data is bad. One answer to this question is the use of data quality metrics. In this post, we will look at a definition of data quality as well as metrics of data quality.


Data quality is a measure of the degree that data is appropriate for its intended purpose. In other words, it is the context in which the data is used that determines if it is of high quality. For example, knowing email addresses may be appropriate in one instance but inappropriate in another instance.


When data is determined to be of high quality, it helps to encourage trust in the data. Developing this trust is critical for decision-makers to have confidence in the actions they choose to take based on the data that they have. Therefore, data quality is of critical importance for an organization, and below are several measures of data quality.

Measuring Data Quality

Completeness is a measure of the degree to which expected columns (variables) and rows (observations) are present. There are times when data can be incomplete due to missing data or missing variables. There can also be data that is partially complete, which means that data is present in some columns but not others. There are various tools for finding this type of missing data in whatever language you are using.

Validity is a measure of how appropriate the data is in comparison to what the data is supposed to represent. For example, suppose there is a column in a dataset that measures the class level of high school students using Freshman, Sophomore, Junior, and Senior. The data would be invalid if it used numerical values for the grade levels such as 9, 10, 11, and 12. This is only invalid because of the context and the assumptions that are brought to the data quality test.

Uniqueness is a measure of duplicate values. Normally, duplicate values happen along rows in structured data which indicates that the same observation appears twice or more. However, it is possible to have duplicate columns or variables in a dataset. Having duplicate variables can cause confusion and erroneous conclusions in statistical models such as regression.

Consistency is a measure of whether data is the same across all instances. For example, there are times when a dataset is refreshed overnight or on some other schedule. The expectation is that the data should be mostly the same except for the new values. A consistency check would assess this. There are also times when thresholds are put in place such that the data can be a little different based on the parameters that are set.

Timeliness is the availability of the data. For example, if data is supposed to be ready by midnight any data that comes after this time fails the timeliness criteria. Data has to be ready when it is supposed to be. This is critical for real-time applications in which people or applications are waiting for data.

Accuracy is the correctness of the data. The main challenge is that measuring accuracy assumes the ground truth is known so that a comparison can be made. If a ground truth is available, the data is compared to the truth to determine the accuracy.
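Several of these measures reduce to simple ratios. The sketch below shows one possible way to compute completeness, validity, and uniqueness in plain Python. The rows and function names are hypothetical, and the exact formulas are assumptions, since an organization may define each measure differently.

```python
# Hypothetical rows: one exact duplicate, one missing value, one invalid value.
rows = [
    {"student_id": 1, "class_level": "Junior"},
    {"student_id": 2, "class_level": "11"},
    {"student_id": 3, "class_level": None},
    {"student_id": 1, "class_level": "Junior"},
]
ALLOWED = {"Freshman", "Sophomore", "Junior", "Senior"}


def completeness(rows, column):
    """Share of rows in which the column is populated."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def validity(rows, column, allowed):
    """Share of populated values that belong to the allowed set."""
    values = [r.get(column) for r in rows if r.get(column) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)


def duplicate_rows(rows):
    """Rows that repeat an earlier row exactly."""
    seen, dupes = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes.append(r)
        seen.add(key)
    return dupes


print(completeness(rows, "class_level"))
print(validity(rows, "class_level", ALLOWED))
print(len(duplicate_rows(rows)))
```

Consistency, timeliness, and accuracy need something to compare against (a previous load, a deadline, or a ground truth), so they are left out of this sketch.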


The metrics shared here are for helping the analyst to determine the quality of their data. For each of these metrics, there are practical ways to assess them using a variety of tools. With this knowledge, you can be sure of the quality of your data.

Data Governance Solutions

Data governance is good at indicating various problems an organization may have with data. However, finding problems doesn’t help as much as finding solutions does. This post will look at several different data governance solutions that deal with different problems.

Business Glossary

The business glossary contains standard descriptions and definitions. It also can contain business terms or discipline-specific terminology. One of the main benefits of developing a business glossary is creating a common vocabulary within the organization.

Many if not all businesses and fields of study have several different terms that mean the same thing. In addition, people can be careless with terminology, to the confusion of outsiders. Lastly, sometimes a local organization will have its own unique terminology. No matter the case, the business glossary helps everyone within an organization to communicate with one another.


An example of a term in a business glossary might be how a school defines a student ID number. The glossary explains what the student ID number is and provides uses of the ID number within the school.

Data Dictionary

The data dictionary provides technical information. Some of the information in the data dictionary can include the location of data, relationships between tables, values, and usage of data. One benefit of the data dictionary is that it promotes consistency and transparency concerning data.

Returning to our student ID number example, a data dictionary would share where the student ID number is stored and the characteristics of this column such as the ID number being 7 digits. For a categorical variable, the data dictionary may explain what values are contained within the variable such as male and female for gender.
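A data dictionary is often kept as a document or spreadsheet, but the same information can be represented in code. The sketch below is one hypothetical layout. The class `DictionaryEntry`, its fields, and the table names `students` and `enrollments` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    column: str
    table: str          # where the data is stored
    data_type: str      # technical characteristics of the column
    description: str    # how the data is used
    allowed_values: list = field(default_factory=list)
    related_tables: list = field(default_factory=list)


# Hypothetical entries based on the examples in the text.
entries = [
    DictionaryEntry(
        column="student_id",
        table="students",
        data_type="7-digit number",
        description="Identifies a student and joins student tables",
        related_tables=["enrollments"],
    ),
    DictionaryEntry(
        column="gender",
        table="students",
        data_type="categorical",
        description="Gender recorded for the student",
        allowed_values=["male", "female"],
    ),
]


def lookup(column):
    """Return the dictionary entry for a column, or None if it is undocumented."""
    for entry in entries:
        if entry.column == column:
            return entry
    return None


print(lookup("student_id"))
```

Keeping allowed values in the dictionary means the same entry can also drive a validity check on the column.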

Data Catalog

A data catalog is a tool for metadata management. It provides an organized inventory of data within the organization. Benefits of a data catalog include improving efficiency and transparency, quick locating of data, collaboration, and data sharing.

An example of a data catalog would be a document that contains the metadata about several different data warehouses or sources within an organization. If a data analyst is trying to figure out where data on student ID numbers are stored they may start with the data catalog to determine where this data is. The data dictionary will explain the characteristics of the student ID column. Sometimes the data dictionary and catalog can be one document if tracking the data in an organization is not too complicated. The point is that the distinction between these solutions is not obvious and is really up to the organization.

Automated Data Lineage

Data lineage describes how data moves within an organization from production to transformation and finally to loading. Tracking this process is really complicated and time-consuming and many organizations have turned to software to complete this.

The primary benefit of tracking data lineage is increasing the trust and accuracy of the data. If there are any problems in the pipeline, data lineage can help to determine where the errors are creeping into the pipeline.

Data Protection, Privacy, Quality

Data protection is about securing the data so that it is not tampered with in an unauthorized manner. An example of data protection would be implementing access capabilities such as user roles and passwords.

Data privacy is related to protection and involves making sure that information is restricted to authorized personnel. Thus, this also requires the use of logins and passwords. In addition, classifying the privacy level of data can also help in protecting it. For example, salaries are generally highly confidential while employee work phone numbers are probably not.

Data quality involves checking the accuracy and consistency of the data. Tools for completing this task can include creating KPIs and metrics to measure data quality, developing policies and standards that define what good data quality is as determined by the organization, and developing reports that share the current quality of data.


The purpose of data governance is to support an organization in maintaining data that is an asset to the organization. In order for data to be an asset it must be maintained so that the insights and decisions that are made from the data are as accurate and clear as possible. The tools described in this post provide some of the ways in which data can be protected within an organization.

Data Governance Strategy

A strategy is a plan of action. Within data governance, it makes sense to ultimately develop a strategy or plan to ensure data governance takes place. In this post, we will look at the components of a data governance strategy. Below are the common components of a data governance strategy.

  • Approach
  •  Vision statement
  •  Mission statement
  •  Value proposition
  •  Guiding principles
  •  Roles & Responsibilities

There is probably no particular order in which these components are completed. However, they tend to follow an inverted pyramid in terms of the scope of what they deal with. In other words, the approach is perhaps the broadest component and affects everything below it followed by the vision statement, etc. Where to begin probably depends on how your mind works. A detail-oriented person may start at the bottom while a big-picture thinker would start at the top.

Defined Approach

The approach defines how the organization will go about data governance. There are two extremes for this and they are defensive and offensive. A defensive approach is focused on risk mitigation while an offensive approach is focused more on achieving organizational goals.


Neither approach is superior to the other, and the situation an organization is in will shape which is appropriate. For example, an organization that is struggling with data breaches may choose a more defensive approach, while an organization that is thriving and not facing such problems may take a more offensive approach.

Vision Statement

A vision statement is a brief snapshot of where the organization wants to be. Another way to see this is that a vision statement is the purpose of the organization. The vision statement needs to be inspiring and easily understood. It also helps to align the policies and standards that are developed.

An example of a vision statement for data governance is found below.

Transforming how data is leveraged to make informed decisions to support youth served by this organization

The vision is to transform data for decision-making. This is an ongoing process that will continue indefinitely.

Mission Statement

The mission statement explains how an organization will strive toward its vision. Like a vision statement, the mission statement provides guidance in developing policies and standards. The mission statement should be a call to action and include some of the goals the organization has about data. Below is an example.

Enabling stakeholders to make data-driven decisions by providing accurate, timely data and insights

In the example above, it is clear that accuracy, timeliness, and insights are the goals for achieving the vision statement. In addition, the audience is identified which is the stakeholders within the organization.

Value Proposition

The value proposition provides a justification for adopting a data governance strategy and explains its significance. Another way to look at this is as an emphasis on persuasion. Some of the ideas included in the value proposition are the benefits of implementation. Often the value proposition is written in the form of one or more cause-and-effect statements. Below is an example.

By implementing this data governance program we will see the following benefits: 

Improved data quality for actionable insights, increased trust in data for making decisions, and clarity of roles and responsibilities of analysts

In the example above three clear benefits are shared. Succinctly this provides people with the potential outcomes of adopting this strategy. Naturally, it would be beneficial to develop ways to measure these ideas which means that only benefits that can be measured should be a part of the value proposition.

Guiding Principles

Guiding principles define how data should be used and managed. Common principles include transparency, accountability, integrity, and collaboration. These principles are just more concrete information for shaping policies and standards. Below is an example of a guiding principle.

All data will have people assigned to play critical roles in it

The guiding principle above is focused on accountability. It is important to define and explain who is assigned to perform the various responsibilities concerning each piece of data.

Roles & Responsibilities

Roles and responsibilities are about explaining the function of the data governance team and the role each person will play. For example, a small organization might have people who adopt more than one role such as being data stewards and custodians while larger organizations might separate these roles.

In addition, it is also important to determine the operating model and whether it will be centralized or decentralized. Determining the operating model again depends on the context and preferences of the organization.

It is also critical to determine how compliance with the policies and standards will be measured. It is not enough to simply state the policies. Eventually, there needs to be evidence of progress and of potential changes that need to be made to the strategy. For example, perhaps a data audit is done monthly or quarterly to assess data quality.


Having a data governance strategy is a crucial step in improving data governance within an organization. Once a plan is in place it is simply a matter of implementation to see if it works.

Data Governance Assessment

Before data governance can begin at an organization, it is critical to assess where the organization currently stands in terms of data governance. This calls for a data governance assessment. The assessment helps an organization to figure out where to begin by identifying challenges and prioritizing what needs to be addressed. In particular, it is common for there to be five steps in this process, as shown below.

  1. Identify data sources and stakeholders
  2.  Interview stakeholders
  3.  Determine current capabilities
  4.  Document the current state and target state
  5.  Analyze gaps and prioritize

We will look at each of these steps below.

Identify Data Sources and Stakeholders

Step one involves determining what data is used within the organization and the users or stakeholders of this data. Essentially, you are trying to determine…

  • What data is out there?
  •  Who uses it?
  •  Who produces it?
  •  Who protects it?
  •  Who is responsible for it?

Answering these questions also provides insights into what roles in relation to data governance are already being fulfilled at least implicitly and which roles need to be added to the organization. At most organizations at least some of these questions have answers and there are people responsible for many roles. The purpose here is not only to get this information but also to make people aware of the roles they are fulfilling from a data governance perspective.


Interview Stakeholders

Step two involves interviewing stakeholders. Once it is clear who is associated with data in the organization it is time to reach out to these people. You want to develop questions to ask stakeholders in order to inform you about what issues to address in relation to data governance.

An easy way to do this is to develop questions that address the pillars of data governance. The pillars are…

  • Ownership & accountability
  •  Data quality
  •  Data protection and privacy
  •  Data management
  •  Data use

Below are some sample questions based on the pillars above.

  • How do you know your data is of high quality?
  • What needs to be done to improve data quality?
  • How is data protected from misuse and loss?
  • How is metadata handled?
  • What concerns do you have related to data?
  • What policies are there now related to data?
  • What roles are there in relation to data?
  • How is data used here?

It may be necessary to address all or some of these pillars when conducting the assessment. The benefit of these pillars is that they provide a starting point from which you can shape your own interview questions. In terms of the interview, it is up to each organization to determine what is best for data collection. Maybe a survey works, or perhaps semi-structured interviews or focus groups. The actual research part of this process is beyond the scope of this post.

Determine Current Capabilities

Step three involves determining the current capabilities of the organization in terms of data governance. Often this can be done by looking at the stakeholder interviews and comparing what they said to a rating scale. For example, the DCAM rating scale has six levels of data governance competence as shown below.

  1. Non-initiated: No governance happening
  2. Conceptual: Aware of data governance and planning
  3. Developmental: Engaged in developing a plan
  4. Defined: Plan approved
  5. Achieved: Plan implemented and enforced
  6. Enhanced: Plan a part of the culture and updated regularly

Determining the current capabilities is a subjective process. However, it needs to be done in order to determine the next steps in bringing data governance along in an organization.

Document Current State and Target State

Step four involves determining the current state and determining what the target state is. Again, this will be based on what was learned in the stakeholder interviews. What you will do is report what the stakeholders said in the interviews based on the pillars of data governance. It is not necessary to use the pillars but it does provide a convenient way to organize the data without having to develop your own way of classifying the results.

Once the current state is defined it is now time to determine what the organization should be striving for in the future and this is called the target state. The target state is the direction the organization is heading within a given timeframe. It is up to the data governance team to determine this and how it is done will vary. The main point is to make sure not to try and address too many issues at once and save some for the next cycle.

Analyze and Prioritize

The final step is to analyze and prioritize. This step involves performing a gap analysis to determine solutions that will solve the issues found in the previous step. In addition, it is also important to prioritize which gaps to address first.
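A gap analysis can be as simple as comparing the current and target level for each pillar. The sketch below reuses the six level names from step three. The ratings are invented, and ranking by the size of the gap is only one assumed way to prioritize, since an organization may also weigh cost, risk, or urgency.

```python
# The six rating levels from step three, ordered from lowest to highest.
LEVELS = ["Non-initiated", "Conceptual", "Developmental",
          "Defined", "Achieved", "Enhanced"]

# Hypothetical (current state, target state) ratings per pillar.
assessment = {
    "Ownership & accountability": ("Conceptual", "Defined"),
    "Data quality": ("Non-initiated", "Defined"),
    "Data protection and privacy": ("Defined", "Achieved"),
}


def gap(current, target):
    """Number of levels between the current state and the target state."""
    return LEVELS.index(target) - LEVELS.index(current)


def prioritize(assessment):
    """List pillars from the largest gap to the smallest (ties sorted by name)."""
    gaps = [(pillar, gap(cur, tgt)) for pillar, (cur, tgt) in assessment.items()]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


for pillar, size in prioritize(assessment):
    print(f"{pillar}: {size} level(s) to close")
```

The output is a starting point for discussion with stakeholders rather than a final ranking.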

Another part of this step is sharing recommendations and soliciting feedback. Provide insights into which direction the organization can go to improve its data governance and allow stakeholders to provide feedback in terms of their agreement with the report. Once all this is done the report is completed and documented until the next time this process needs to take place.


The steps presented here are not prescriptive. They are shared as a starting point for an organization’s journey in improving data governance. With experience, each organization will find its own way to support its stakeholders in the management of data.

Data Governance Office

The data governance office or team leads the organization in dealing with data. This team is composed of several members, such as the following.

  • Chief Data Officer
  •  Data Governance Lead
  •  Data Governance Consultant
  •  Data Quality Analyst

We will look at each of these below. It also needs to be mentioned that a person might be assigned several of these roles, which is particularly true in a smaller organization. In addition, it is possible that several people might fulfill one of these roles in a much larger organization.

Chief Data Officer

The chief data officer is responsible for shaping the overall data strategy at an organization. The chief data officer also promotes a data-driven culture and pushes for change within the organization. A person in this position also needs to understand the data needs of the organization in order to further the vision of the institution or company.


The role of the chief data officer encompasses all of the other roles that will be discussed. The chief data officer is essentially the leader of the data team and provides help with governance consulting, quality, and analytics. However, the primary role of this position is to see the big picture for big data and to guide the organization in this regard, which implies that technical skills are beneficial but leadership and change promotion are more critical. In sum, this is a challenging position that requires a large amount of experience.

Data Governance Lead

The data governance lead’s primary responsibilities involve defining policies and data governance frameworks. While the chief data officer is more of an evangelist or promoter of data governance, the data governance lead is focused on the actual implementation of change and guiding the organization in this process.

Essentially, the data governance lead is in charge of the day-to-day operation of the data governance team. While the chief data officer may be the dreamer the data governance lead is a steady hand behind the push for change.

Data Governance Consultant

The data governance consultant is the subject matter expert in data governance. Their role is to know all the details of data governance in the general field, and it is even better if they know how to make data governance happen in a particular discipline. For example, a data governance consultant might know how to make data governance happen within the context of a university in particular.

The data governance consultant supports the data governance lead with implementation. In addition, the consultant is a go-between for the larger organization and IT. Serving as a go-between implies that the consultant is able to communicate effectively with both parties: on a technical level with IT and in layman’s terms with the larger organization. Collaboration between IT and the larger organization can be challenging because of communication issues that stem from vastly different backgrounds, and it is the consultant’s responsibility to bridge this gap.

Data Quality Analyst

The data quality analyst’s job is, as the name implies, to ensure quality data. One way of determining data quality is to develop rules for data entry. For example, a rule for data quality might be that marital status can only be single, married, divorced, or widowed. This rule restricts any other option that people may want. When data follows this rule, it is an example of high quality within this context.

A data quality analyst also performs troubleshooting or root cause investigations. If something funny is going on in the data, such as duplicates, it is the data quality analyst’s job to determine what is causing the problems and to find a solution. Lastly, a data quality analyst is also responsible for statistical work. This can include statistical work that is associated with the work of a data analyst or statistical work that monitors the use of data and the quality of data within the organization.


The data governance team plays a critical role in supporting the organization with reliable and clean data that can be trusted to make actionable insights. Even though this is a tremendous challenge it is an important function in an organization.

Data Governance Framework Types and Principles

When it is time to develop data governance policies the first thing to consider is how the team views data governance. In this post, we will look at various data governance frameworks and principles to keep in mind when employing a data governance framework.


The top-down framework involves a small group of data providers. These data providers serve as gatekeepers for data that is used in the institution. Whatever data is used is controlled centrally in this framework.


One obvious benefit of this approach is that with a small group of people in charge, decision-making should be fast and relatively efficient. In addition, if something does go wrong it should be easy to trace the source of the problem. However, a top-down approach only works in situations that have small amounts of data or end users. When the amount of data becomes too large the small team will struggle to support users which indicates that this approach is hard to scale. Lastly, people may resent having to abide by rules that are handed down from above.


The bottom-up approach to data governance is the mirror opposite of the top-down approach. Where top-down involves a handful of decision-makers, bottom-up focuses on a democratic style of data leadership. Bottom-up is scalable because everyone is involved in the process, while top-down does not scale well. Generally, when the bottom-up approach is used, controls and restrictions on data are put in place after the raw data is shared rather than before.

Like all approaches to data governance, there are concerns with the bottom-up approach. For example, it becomes harder to control the data when people are allowed to use raw data that has not been prepared for use. In addition, because of the democratic nature of the bottom-up approach, there is also an increased risk of security concerns because of the increased freedom people have.


The collaborative approach is a mix of top-down and bottom-up ideas on data governance. This approach is flexible and balanced while placing an emphasis on collaboration. The collaboration can be among stakeholders or between the gatekeepers and the users of data.

One main concern with this approach is that it can become messy and difficult to execute if principles and goals are not clearly defined. Therefore, it is important to spend a large amount of time in planning when choosing this approach.


Regardless of which framework you pick when beginning data governance, there are several terms you need to be familiar with to help you be successful. For example, integrity involves maintaining open lines of communication and the sharing of problems so that an atmosphere of trust is maintained or developed.

It is also important to determine ownership for the purpose of governance and decision-making. Determining ownership also helps to find gaps in accountability and responsibility for data.

Leaders in data governance must also be aware of change and risk management. Change management consists of the tools and processes for communicating new strategies and policies related to data governance. Change management helps to ensure a smooth transition from one state of equilibrium to another. Risk management consists of the tools related to auditing and developing interventions for non-compliance.

A final concept to be aware of is strategic alignment. The goals and purpose of data governance must align with the goals of the organization that data governance is supporting. For example, a school will have a strict stance on protecting student privacy. Therefore, data governance needs to reflect this and support strict privacy policies.


Frameworks provide a foundation on which your team can shape its policies for data governance. Each framework has its strengths and weaknesses, but the point is to be aware of the basic ways in which you can begin forming policies and strategies for governing data at an organization.


Data Governance Framework

In this post, we will define a data governance framework. We will also look at the key components that are part of a data governance framework.


A data governance framework is the how, or the plan, for governing the data within an organization. Data governance determines what needs to be governed or controlled, while the data governance framework is the actual plan for controlling the data.

Common Components

There are several common components of a data governance plan and they include the following.

  • Strategy
  • Policies
  • Processes
  • Coordination
  • Monitoring/communication
  • Data literacy/culture

Strategy involves determining how data can be used to solve problems. This may seem unnecessary, but specific data is suited to specific problems. For example, customers’ addresses in California might not be appropriate for determining revenue generated in Texas. When data is looked at strategically, it helps to ensure that those who use it view it as an asset.


Policies help to guide such things as decision-making and expectations concerning data. In addition, policies help with determining responsibilities and tasks related to data management. One example of policy in action is the development of standards, which are rules for best practices that must be followed to meet a policy. A policy may be something like protecting privacy. A standard to meet this policy would be to ensure that data is encrypted and password protected.

Process and technology involve steps for monitoring the quality of data. Other topics related to process can include dealing with metadata and data management. The proper process mainly helps with efficiency in the organization.

Coordination involves the processes of working together. Coordination can involve defining the roles and responsibilities for a complex process that requires collaboration with data. In other words, coordination is developed when multiple parties are involved with a complex task.

Progress monitoring involves the development of KPIs to make sure that the performance expectations are measured and adhered to. Progress monitoring can also involve issues related to privacy, quality, and compliance. An example of progress monitoring may be requiring everyone to change their password every 90 days. At the end of the 90 days, the system will automatically make the user create a new password.
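
The password example above can be turned into a simple KPI. What follows is only a minimal sketch: the 90-day window comes from the example, while the function names and the idea of reporting a compliance rate are assumptions made for illustration, not a description of any particular system.

```python
from datetime import date, timedelta

# Rotation window taken from the 90-day example in the text.
MAX_PASSWORD_AGE = timedelta(days=90)


def password_expired(last_changed: date, today: date) -> bool:
    """Return True when the password is at least 90 days old."""
    return today - last_changed >= MAX_PASSWORD_AGE


def compliance_rate(last_changed_by_user: dict[str, date], today: date) -> float:
    """Hypothetical KPI: share of users still inside the rotation window."""
    if not last_changed_by_user:
        return 1.0
    compliant = sum(
        not password_expired(changed, today)
        for changed in last_changed_by_user.values()
    )
    return compliant / len(last_changed_by_user)
```

A monitoring job could run `compliance_rate` daily and raise a warning when the value drops below a level the organization chooses.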

Lastly, data literacy and culture involve training people within the organization and developing their skill in analyzing and/or communicating data, whether they use the data directly or simply consume it. Naturally, this is an ongoing process, and how it works depends on who is involved.


A framework is a plan for achieving a particular goal or vision. As organizations work with data, they must be diligent in making sure that the data that is used is trustworthy and protected. A data governance framework is one way in which these goals can be attained.


Influences and Approaches of Data Governance

Data governance has been around for a while. As a result, various trends and challenges have influenced this field. In this post, we will look at several laws that have had an impact on data governance, along with various concepts that have been developed to address common concerns.


Several laws have played a critical role in influencing data governance both in the USA and internationally. For example, the Sarbanes-Oxley (SOX) Act was enacted in 2002. The SOX Act was created in reaction to various accounting scandals at large corporations at the time. Among the requirements of this law are setting standards for financial and corporate reporting and the need for executives to verify or attest that the financial information is correct. Naturally, this requires data governance to make sure that the data is appropriate so that these requirements can be met.


There are also several laws related to privacy in particular. Focusing again on the USA, there is the Health Insurance Portability and Accountability Act (HIPAA), which requires institutions in the medical field to protect patient data. Leaders in data must develop data governance policies that protect medical information.

In the state of California, there is the California Consumer Privacy Act (CCPA), which gives California residents more control over how their personal data is handled by companies. The CCPA is focused much more on the collection and selling of personal data, as this has become a lucrative industry in the data world.

At the international level, there is the General Data Protection Regulation (GDPR). The GDPR is a privacy law that applies to anybody who lives in the EU. What this implies is that a company in another part of the world that has customers in the EU must abide by this law as well. As such, this is one example of a local law related to data governance that can have a global impact.

Various Concepts that Support Data Governance

Data governance was around much earlier than the laws described above. However, several different concepts and strategies were developed to address transparency and privacy as explained below.

Data classification and retention deal with the level of confidentiality of the data and policies for data destruction. For example, social security numbers are a form of data that is highly confidential, while the types of shoes a store sells would probably not be considered private. In addition, some data is not meant to be kept forever. For example, consumers may request that their information, such as credit card numbers, be removed from a website. In such a situation, there must be a way for this data to be removed permanently from the system.
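
The two ideas above, a confidentiality level per field and a rule for when data must be destroyed, can be sketched as follows. This is an illustration under stated assumptions: the field names, the two-level classification, and the retention periods are all hypothetical and are not legal or regulatory guidance.

```python
from datetime import date, timedelta
from enum import Enum


class Confidentiality(Enum):
    PUBLIC = 1
    CONFIDENTIAL = 2


# Hypothetical rules: (classification, retention period or None for "keep").
FIELD_RULES = {
    "shoe_type": (Confidentiality.PUBLIC, None),
    "social_security_number": (Confidentiality.CONFIDENTIAL, timedelta(days=365 * 7)),
    "credit_card_number": (Confidentiality.CONFIDENTIAL, timedelta(days=365)),
}


def fields_to_purge(record_created: date, today: date,
                    erasure_requested: bool = False) -> list[str]:
    """List fields to delete because retention expired or removal was requested."""
    purge = []
    for name, (level, retention) in FIELD_RULES.items():
        expired = retention is not None and today - record_created > retention
        # In this sketch, a consumer request removes every confidential field.
        requested = erasure_requested and level is Confidentiality.CONFIDENTIAL
        if expired or requested:
            purge.append(name)
    return purge
```

The function only decides what should be removed. Deleting the values permanently, including from backups, would be a separate step that depends on the storage system.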

Data management is focused on consistency and transparency. There must be a master copy of data to serve as a backup and for checking the accuracy of other copies. In addition, there must be some form of data reference management to identify and map datasets through some general identification such as zip code or state.

Lastly, metadata management deals with data that describes the data. By providing this information, it is possible to search and catalog data.
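
As a rough illustration of how metadata makes data searchable, the sketch below stores a few descriptive attributes per dataset and searches them. The `DatasetMetadata` fields and the `search_catalog` helper are hypothetical names chosen for this example, and real catalog tools record far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Data that describes a dataset rather than the dataset itself."""
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)


def search_catalog(catalog: list[DatasetMetadata], term: str) -> list[str]:
    """Return names of datasets whose description or tags mention the term."""
    term = term.lower()
    return [
        entry.name
        for entry in catalog
        if term in entry.description.lower()
        or term in {tag.lower() for tag in entry.tags}
    ]


catalog = [
    DatasetMetadata("student_records", "registrar",
                    "Enrollment and grades for each student", {"education"}),
    DatasetMetadata("store_inventory", "sales",
                    "Types of shoes sold by the store", {"retail"}),
]
matches = search_catalog(catalog, "student")
```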


Data governance will continue to be influenced by the laws and context of the world. With new challenges will come new ways to satisfy the concerns of both lawmakers and the general public.


Data Governance

Data governance involves several concepts that describe the characteristics and setting in which the data is found. For people in leadership positions involving data, it is critical to have some understanding of the following concepts related to data governance. These concepts are:

  • Ownership
  • Quality
  • Protection
  • Use/Availability
  • Management

Each of these concepts plays a role in shaping the role of data within an organization.


Data ownership is not always as obvious as it seems. One company may be using the data of a different company. It is important to identify who the data belongs to so that the user of the data is aware of any rules and restrictions the owner has placed on its use.


Addressing details related to ownership helps to determine accountability as well. Identifying ownership can also identify who is responsible for the data because the owners will hopefully have an idea of who should be using the data. If not, this is something that needs to be clarified as well.


Data quality is a fairly self-explanatory term. Data quality is a way of determining how good the data is based on some criteria. One commonly used set of criteria for data quality is the data’s completeness, consistency, timeliness, accuracy, and integrity.

Completeness is determining if everything that the data is supposed to capture is represented in the data set. For example, if income is one variable that needs to be in a dataset it is important to check that it is there.
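
A completeness check like the income example can be sketched in a few lines. This is a minimal illustration that assumes records are plain dictionaries and treats `None` and the empty string as missing; both choices, and the function name, are assumptions made for the example.

```python
def completeness_report(rows: list[dict], required: list[str]) -> dict[str, float]:
    """Fraction of rows holding a non-empty value for each required variable."""
    report = {}
    for column in required:
        # A value counts as present unless it is absent, None, or "".
        filled = sum(1 for row in rows if row.get(column) not in (None, ""))
        report[column] = filled / len(rows) if rows else 0.0
    return report


rows = [
    {"id": 1, "income": 50000},
    {"id": 2, "income": None},
    {"id": 3},
]
report = completeness_report(rows, ["id", "income"])
```

Here `id` is fully populated while `income` is present in only one of the three rows, which is the kind of gap a completeness rule is meant to surface.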

Consistency means that the data you are looking at is similar to other data in the same context. For example, student record data is probably similar regardless of the institution. Therefore, someone with experience with student record data can tell you if the data you are looking at is consistent with other data in a similar context.

Timeliness has to do with the recency of the data. Some data is real-time while other data is historical. Therefore, the timeliness of the data will depend on the context of the project. A chatbot needs recent data while a study of incomes from ten years ago does not need data from yesterday.
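
Because timeliness depends on the project, one way to express it is as a maximum acceptable age that each project sets for itself. The sketch below assumes this framing. The function name and the thresholds in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta


def is_timely(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the data is no older than the project allows."""
    return now - last_updated <= max_age


now = datetime(2024, 6, 1, 12, 0)
last_updated = datetime(2024, 5, 31, 12, 0)  # data refreshed one day ago

# A chatbot might tolerate only an hour of staleness (assumed threshold).
chatbot_ok = is_timely(last_updated, now, timedelta(hours=1))
# A historical study can accept data that is years old (assumed threshold).
study_ok = is_timely(last_updated, now, timedelta(days=365 * 10))
```

The same dataset fails the first check and passes the second, which mirrors the point that timeliness is judged against the context of the project.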

Accuracy and integrity are two more measures of quality. Accuracy is how well the data represents the population. For example, a population of male college students should have data about male college students. Integrity has to do with the truthfulness of the data. For example, if the data was manipulated, this needs to be explained.


Data protection has to do with all of the basic security concerns IT departments have to deal with today. Some examples include encryption and password protection. In addition, there may be a need to be aware of privacy concerns such as financial records or data collected from children.

There should also be awareness of disaster recovery. For example, there might be a real disaster that wipes out data or it can be an accidental deletion by someone. In either case, there should be backup copies of the data. Lastly, protection also involves controlling who has access to the data.


Despite the concerns of protection, data still needs to be available to the appropriate parties and this relates to data availability. Whoever is supposed to have the data should be able to access it as needed.

The data must also be usable. The level of usability will depend on the user. For example, a data analyst should be able to handle messy data but a consumer of dashboards needs the data to be clean and ready prior to use.


Data management is the implementation of the policies developed for the concepts described above. The data leadership team needs to develop processes and policies for ownership, quality, protection, and availability of data.

Once the policies are developed, they have to actually be employed within the institution, which can be difficult because people generally want to avoid accountability and/or responsibility, especially when things go wrong. In addition, change is often disliked because people gravitate towards the current norms.


Data governance is a critical part of institutions today given the importance of data now. IT departments need to develop policies and plans on the data in order to maintain trust in whatever conclusions are made from data.