Data governance involves several concepts that describe the characteristics of data and the setting in which it is found. People in leadership positions involving data need at least a working understanding of these concepts: data ownership, data quality, data protection, data availability, and data management.
Each of these concepts helps shape how data is used within an organization.
Data ownership is not always as obvious as it seems. One company may be using data that belongs to another. It is important to identify who owns the data so that anyone using it is aware of the rules and restrictions the owner places on its use.
Settling questions of ownership also helps determine accountability. Identifying the owner often identifies who is responsible for the data, because owners will ideally know who should be using it. If they do not, that needs to be clarified as well.
Data quality is another self-explanatory term. It is a measure of how good the data is against some set of criteria. Commonly used criteria are completeness, consistency, timeliness, accuracy, and integrity.
Completeness asks whether everything the data is supposed to capture is actually represented in the dataset. For example, if income is a variable that needs to be in the dataset, it is important to check that it is there.
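A completeness check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `completeness_report`, the field names, and the sample records are all hypothetical, and a value is treated as missing when it is absent, `None`, or an empty string.

```python
def completeness_report(records, required_fields):
    """Return, for each required field, the share of records with a usable value."""
    report = {}
    total = len(records)
    for field in required_fields:
        # A value counts as present if the key exists and is neither None nor "".
        present = sum(
            1 for row in records
            if row.get(field) is not None and row.get(field) != ""
        )
        report[field] = present / total if total else 0.0
    return report


# Hypothetical sample data: record 3 has a null income, record 4 lacks the field.
sample_records = [
    {"id": 1, "income": 52000},
    {"id": 2, "income": 61000},
    {"id": 3, "income": None},
    {"id": 4},
]

report = completeness_report(sample_records, ["id", "income"])
```

A share below 1.0 for a required field signals that the dataset is incomplete for that variable. What share is acceptable is a policy decision rather than something the code can settle.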
Consistency means that the data you are looking at resembles other data from the same context. For example, student record data is probably similar from one institution to another. Therefore, someone with experience with student record data can tell you whether the data you are looking at is consistent with other data from a similar context.
Timeliness has to do with the recency of the data. Some data is real-time while other data is historical. Therefore, the timeliness of the data will depend on the context of the project. A chatbot needs recent data while a study of incomes from ten years ago does not need data from yesterday.
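The idea that timeliness depends on the project can be expressed as a freshness threshold that differs by use case. The sketch below is a minimal illustration: the function `is_timely`, the reference time, and both thresholds are assumptions chosen for the example, not recommended values.

```python
from datetime import datetime, timedelta


def is_timely(last_updated, max_age, now):
    """Return True if the data was updated within max_age of now."""
    return now - last_updated <= max_age


# Hypothetical reference time and per-project freshness thresholds.
now = datetime(2024, 1, 15, 12, 0)
chatbot_max_age = timedelta(hours=1)            # a chatbot needs recent data
historical_max_age = timedelta(days=365 * 15)   # a historical study tolerates old data

last_updated = datetime(2024, 1, 14, 12, 0)     # this dataset is one day old

chatbot_ok = is_timely(last_updated, chatbot_max_age, now)
historical_ok = is_timely(last_updated, historical_max_age, now)
```

The same one-day-old dataset fails the chatbot threshold and passes the historical one, which is the point: timeliness is judged against the needs of the project, not in the abstract.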
Accuracy and integrity are two more measures of quality. Accuracy is how well the data represents the population. For example, a dataset meant to describe male college students should contain data about male college students. Integrity has to do with the truthfulness of the data. For example, if the data was manipulated, this needs to be explained.
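One common way to notice that data has been changed is to record a checksum when the data is received and compare it later. The sketch below uses SHA-256 from Python's standard `hashlib` module. The function name `fingerprint` and the sample rows are hypothetical, and a checksum only shows that a change occurred. It cannot explain why, so any documented manipulation still has to be described separately.

```python
import hashlib


def fingerprint(rows):
    """Return a SHA-256 hex digest of the rows, taken in order."""
    digest = hashlib.sha256()
    for row in rows:
        # repr() gives a stable text form for simple tuples of strings and numbers.
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical student records: (student id, grade point average).
original = [("A001", 3.4), ("A002", 2.9)]
recorded = fingerprint(original)

# Later, someone edits one value.
altered = [("A001", 3.4), ("A002", 3.9)]

unchanged = fingerprint(list(original)) == recorded
changed = fingerprint(altered) != recorded
```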
Data protection covers the basic security concerns that IT departments deal with today. Some examples include encryption and password protection. In addition, there may be privacy concerns to be aware of, such as those around financial records or data collected from children.
There should also be awareness of disaster recovery. For example, a real disaster might wipe out data, or someone might delete it by accident. In either case, there should be backup copies of the data. Lastly, protection also involves controlling who has access to the data.
Despite these protection concerns, data still needs to be available to the appropriate parties, which is the concern of data availability. Whoever is supposed to have the data should be able to access it as needed.
The data must also be usable. The level of usability will depend on the user. For example, a data analyst should be able to handle messy data but a consumer of dashboards needs the data to be clean and ready prior to use.
Data management is the implementation of the policies developed for the concepts above. The data leadership team needs to develop processes and policies for the ownership, quality, protection, and availability of data.
Once the policies are developed, they have to actually be put into practice within the institution. This can be difficult, as people generally want to avoid accountability and responsibility, especially when things go wrong. In addition, change tends to be resisted because people gravitate towards current norms.
Data governance is a critical part of institutions today, given how much they now rely on data. IT departments need to develop policies and plans for their data in order to maintain trust in whatever conclusions are drawn from it.