Data governance is good at indicating various problems an organization may have with data. However, finding problems doesn’t help as much as finding solutions does. This post will look at several different data governance solutions that deal with different problems.
Business Glossary
The business glossary contains standard descriptions and definitions of business terms, including discipline-specific terminology. One of the main benefits of developing a business glossary is creating a common vocabulary within the organization.
Many, if not all, businesses and fields of study have several different terms that mean the same thing. In addition, people can be careless with terminology, which confuses outsiders. Lastly, a local organization will sometimes have its own unique terminology. Whatever the case, the business glossary helps everyone within an organization communicate with one another.
An example of a term in a business glossary might be how a school defines a student ID number. The glossary entry explains what the student ID number is and how it is used within the school.
Data Dictionary
The data dictionary provides technical information about the data. This can include the location of data, relationships between tables, allowed values, and how the data is used. One benefit of the data dictionary is that it promotes consistency and transparency concerning data.
Returning to our student ID number example, a data dictionary would share where the student ID number is stored and the characteristics of this column such as the ID number being 7 digits. For a categorical variable, the data dictionary may explain what values are contained within the variable such as male and female for gender.
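To make the student ID and gender examples concrete, here is a minimal sketch of how data dictionary entries could be represented and used to check values. The class name `ColumnEntry`, the table name `students`, and the field names are assumptions made for illustration; a real data dictionary usually lives in a document or a dedicated tool.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ColumnEntry:
    """One data dictionary entry describing a single column (hypothetical structure)."""
    name: str
    table: str                          # where the column is stored
    description: str
    pattern: Optional[str] = None       # regex a value must match, if documented
    allowed_values: Set[str] = field(default_factory=set)

    def is_valid(self, value: str) -> bool:
        """Check a value against the characteristics recorded in the entry."""
        if self.pattern is not None and not re.fullmatch(self.pattern, value):
            return False
        if self.allowed_values and value not in self.allowed_values:
            return False
        return True


# Hypothetical entries based on the examples in the text.
data_dictionary = {
    "student_id": ColumnEntry(
        name="student_id",
        table="students",
        description="Unique 7-digit identifier assigned to each student",
        pattern=r"[0-9]{7}",
    ),
    "gender": ColumnEntry(
        name="gender",
        table="students",
        description="Gender recorded at enrollment",
        allowed_values={"male", "female"},
    ),
}

print(data_dictionary["student_id"].is_valid("1234567"))
print(data_dictionary["student_id"].is_valid("12345"))
```

Keeping the documented characteristics in a machine-readable form means the same entry can serve both as documentation and as a simple validation rule.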
Data Catalog
A data catalog is a tool for metadata management. It provides an organized inventory of the data within the organization. Benefits of a data catalog include greater efficiency and transparency, faster location of data, and easier collaboration and data sharing.
An example of a data catalog would be a document that contains the metadata about several different data warehouses or sources within an organization. If a data analyst is trying to figure out where data on student ID numbers is stored, they may start with the data catalog to locate it. The data dictionary will then explain the characteristics of the student ID column. Sometimes the data dictionary and the catalog can be one document if tracking the data in an organization is not too complicated. The distinction between these solutions is not always sharp, and where to draw the line is up to the organization.
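The analyst's search described above can be sketched as a lookup over a small in-memory catalog. The warehouse names, table names, owners, and the `find_column` helper are all hypothetical and only illustrate the idea of an inventory that answers "where does this data live?".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata about one table in one data source (hypothetical structure)."""
    source: str                 # warehouse or system that holds the table
    table: str
    columns: Tuple[str, ...]
    owner: str                  # team to contact about the data


# Hypothetical inventory covering several sources in a school.
CATALOG = [
    CatalogEntry("admissions_warehouse", "applicants",
                 ("applicant_id", "student_id", "term"), "Admissions Office"),
    CatalogEntry("registrar_warehouse", "students",
                 ("student_id", "name", "gender"), "Registrar"),
    CatalogEntry("finance_warehouse", "invoices",
                 ("invoice_id", "amount"), "Finance"),
]


def find_column(catalog: List[CatalogEntry], column: str) -> List[Tuple[str, str]]:
    """Return (source, table) pairs for every table that contains the column."""
    return [(entry.source, entry.table) for entry in catalog if column in entry.columns]


for source, table in find_column(CATALOG, "student_id"):
    print(f"student_id is stored in {source}.{table}")
```

Once the catalog has pointed to the right table, the data dictionary for that table supplies the column-level details.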
Automated Data Lineage
Data lineage describes how data moves within an organization, from production to transformation and finally to loading. Tracking this process by hand is complicated and time-consuming, so many organizations have turned to software to do it.
The primary benefit of tracking data lineage is increased trust in the data and in its accuracy. If there are problems in the pipeline, data lineage can help determine where the errors are creeping in.
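As a rough illustration of how lineage helps locate errors, the sketch below stores lineage as a mapping from each dataset to the datasets it is built from, then walks upstream from a dataset that looks wrong. The dataset names and the `upstream` function are assumptions for illustration; lineage software typically builds this graph automatically.

```python
from collections import deque
from typing import Dict, List

# Each dataset maps to its direct upstream inputs (hypothetical pipeline).
LINEAGE: Dict[str, List[str]] = {
    "enrollment_report": ["students_clean"],
    "students_clean": ["students_raw", "gender_codes"],
    "students_raw": [],
    "gender_codes": [],
}


def upstream(lineage: Dict[str, List[str]], dataset: str) -> List[str]:
    """Return every dataset that feeds into `dataset`, nearest first (breadth-first)."""
    seen: List[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


# If the report looks wrong, these are the places to check, closest first.
print(upstream(LINEAGE, "enrollment_report"))
```

Listing the nearest inputs first mirrors how an investigation usually proceeds: check the most recent transformation, then work back toward the raw sources.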
Data Protection, Privacy, Quality
Data protection is about securing the data so that it cannot be tampered with by unauthorized parties. An example of data protection would be implementing access controls such as user roles and passwords.
Data privacy is related to protection and involves making sure that information is restricted to authorized personnel. Thus, this also requires the use of logins and passwords. In addition, classifying the privacy level of data can also help in protecting it. For example, salaries are generally highly confidential while employee work phone numbers are probably not.
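One way to combine user roles with privacy classification is to give each field a level and each role a clearance. The sketch below is only an illustration under stated assumptions: the three levels, the role names, and the choice to treat work phone numbers as internal rather than public are all hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Privacy levels, ordered from least to most restricted (assumed scheme)."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical classification of fields, following the examples in the text.
FIELD_CLASSIFICATION = {
    "work_phone": Classification.INTERNAL,
    "salary": Classification.CONFIDENTIAL,
}

# Hypothetical roles and the highest level each one may read.
ROLE_CLEARANCE = {
    "employee": Classification.INTERNAL,
    "hr_manager": Classification.CONFIDENTIAL,
}


def can_read(role: str, field_name: str) -> bool:
    """Allow access only when the role's clearance covers the field's level.

    Unknown roles and unclassified fields are denied by default.
    """
    clearance = ROLE_CLEARANCE.get(role)
    level = FIELD_CLASSIFICATION.get(field_name)
    if clearance is None or level is None:
        return False
    return clearance >= level


print(can_read("employee", "work_phone"))
print(can_read("employee", "salary"))
print(can_read("hr_manager", "salary"))
```

Denying access when a role or field is unknown is a deliberate choice here, so that data added without a classification is not exposed by accident.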
Data quality involves checking the accuracy and consistency of the data. Tools for this task include KPIs and metrics that measure data quality, policies and standards that define what the organization considers good data quality, and reports that share the current quality of the data.
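Two simple metrics of this kind are completeness (how many values are present) and validity (how many present values follow the expected format). The sketch below computes both for a column of student ID numbers. The sample values, the function names, and the 0.95 threshold are assumptions for illustration, not a standard.

```python
import re
from typing import List, Optional


def _present(values: List[Optional[str]]) -> List[str]:
    """Keep only values that are neither None nor empty strings."""
    return [v for v in values if v is not None and v != ""]


def completeness(values: List[Optional[str]]) -> float:
    """Share of values that are present."""
    if not values:
        return 0.0
    return len(_present(values)) / len(values)


def validity(values: List[Optional[str]], pattern: str) -> float:
    """Share of present values that fully match the expected pattern."""
    present = _present(values)
    if not present:
        return 0.0
    matches = sum(1 for v in present if re.fullmatch(pattern, v))
    return matches / len(present)


# Hypothetical sample: two valid IDs, one empty, one too short, one missing.
student_ids = ["1234567", "7654321", "", "12345", None]
THRESHOLD = 0.95  # assumed organizational standard

report = {
    "completeness": completeness(student_ids),
    "validity": validity(student_ids, r"[0-9]{7}"),
}
for metric, score in report.items():
    status = "ok" if score >= THRESHOLD else "below standard"
    print(f"{metric}: {score:.2f} ({status})")
```

The threshold plays the role of the organization's standard, and the printed lines are a very small version of a data quality report.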
Conclusion
The purpose of data governance is to help an organization maintain data that is an asset to it. For data to be an asset, it must be maintained so that the insights and decisions drawn from it are as accurate and clear as possible. The tools described in this post are some of the ways in which data can be documented, tracked, and protected within an organization.