Monthly Archives: August 2023


Data Governance Strategy

A strategy is a plan of action. In data governance, it makes sense to eventually develop a strategy, or plan, to ensure that data governance actually takes place. In this post, we will look at the common components of a data governance strategy, which are listed below.

  • Approach
  • Vision statement
  • Mission statement
  • Value proposition
  • Guiding principles
  • Roles & Responsibilities

There is probably no particular order in which these components are completed. However, they tend to follow an inverted pyramid in terms of scope. In other words, the approach is perhaps the broadest component and affects everything below it, followed by the vision statement, and so on. Where to begin probably depends on how your mind works: a detail-oriented person may start at the bottom, while a big-picture thinker would start at the top.

Defined Approach

The approach defines how the organization will go about data governance. There are two extremes: defensive and offensive. A defensive approach focuses on risk mitigation, while an offensive approach focuses more on achieving organizational goals.


Neither approach is superior to the other, and the situation an organization is in will shape which is appropriate. For example, an organization that is struggling with data breaches may choose a more defensive approach, while an organization that is already thriving may take a more offensive approach.

Vision Statement

A vision statement is a brief snapshot of where the organization wants to be. Another way to see this is that a vision statement is the purpose of the organization. The vision statement needs to be inspiring and easily understood. It also helps to align the policies and standards that are developed.

An example of a vision statement for data governance is found below.

Transforming how data is leveraged to make informed decisions to support youth served by this organization

The vision here is to transform how data is used for decision-making. This is an ongoing process that will continue indefinitely.

Mission Statement

The mission statement explains how an organization will strive toward its vision. Like a vision statement, the mission statement provides guidance in developing policies and standards. The mission statement should be a call to action and include some of the goals the organization has for its data. Below is an example:

Enabling stakeholders to make data-driven decisions by providing accurate, timely data and insights

In the example above, accuracy, timeliness, and insights are clearly the goals for achieving the vision. In addition, the audience is identified: the stakeholders within the organization.

Value Proposition

The value proposition provides a justification for, or explains the significance of, adopting a data governance strategy. In other words, its emphasis is on persuasion. The value proposition typically describes the benefits of implementation and is often written as one or more cause-and-effect statements. Below is an example:

By implementing this data governance program we will see the following benefits: 

Improved data quality for actionable insights, increased trust in data for making decisions, and clarity of roles and responsibilities of analysts

In the example above, three clear benefits are shared, which succinctly tells people the potential outcomes of adopting this strategy. Naturally, it is beneficial to develop ways to measure these benefits, which means that only benefits that can be measured should be part of the value proposition.

Guiding Principles

Guiding principles define how data should be used and managed. Common principles include transparency, accountability, integrity, and collaboration. These principles provide more concrete direction for shaping policies and standards. Below is an example of a guiding principle:

All data will have people assigned to play critical roles in it

The guiding principle above focuses on accountability. It requires that all data have people assigned to specific responsibilities for it, and those responsibilities need to be clearly defined and explained.

Roles & Responsibilities

Roles and responsibilities explain the function of the data governance team and the role each person will play. For example, in a small organization, one person might take on more than one role, such as serving as both data steward and data custodian, while larger organizations might separate these roles.

In addition, it is also important to determine the operating model and whether it will be centralized or decentralized. Determining the operating model again depends on the context and preferences of the organization.

It is also critical to determine how compliance with the policies and standards will be measured. Stating a commitment is not enough; eventually, there needs to be evidence of progress and of any changes that need to be made to the strategy. For example, a data audit might be done monthly or quarterly to assess data quality.
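As a rough illustration of what one such audit might check, here is a minimal Python sketch that measures completeness (the share of non-missing values) for each field and flags fields below a threshold. The sample records, field names, and the 95% threshold are hypothetical; a real audit would read from the organization's own systems and use whatever quality measures its standards define.

```python
# Hypothetical sample records; a real audit would query the organization's database.
RECORDS = [
    {"student_id": 1, "grade": 10, "email": "a@example.org"},
    {"student_id": 2, "grade": None, "email": "b@example.org"},
    {"student_id": 3, "grade": 11, "email": None},
    {"student_id": 4, "grade": 12, "email": "d@example.org"},
]
FIELDS = ["student_id", "grade", "email"]


def completeness_report(records, fields):
    """Return the share of non-missing values per field (1.0 means fully complete)."""
    total = len(records)
    report = {}
    for field in fields:
        present = sum(1 for record in records if record.get(field) is not None)
        report[field] = present / total if total else 0.0
    return report


def audit(records, fields, threshold=0.95):
    """Return only the fields whose completeness falls below the assumed threshold."""
    report = completeness_report(records, fields)
    return {field: score for field, score in report.items() if score < threshold}


if __name__ == "__main__":
    print(completeness_report(RECORDS, FIELDS))  # {'student_id': 1.0, 'grade': 0.75, 'email': 0.75}
    print(audit(RECORDS, FIELDS))  # fields to follow up on in this audit cycle
```

Running the same report each month or quarter and comparing the results over time is one way to produce the evidence of progress described above.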

Conclusion

Having a data governance strategy is a crucial step in improving data governance within an organization. Once a plan is in place, it is simply a matter of implementing it to see if it works.


Data Governance Assessment

Before data governance can begin at an organization, it is critical to assess where the organization currently stands in terms of data governance. This is the purpose of a data governance assessment. The assessment helps an organization figure out where to begin by identifying challenges and prioritizing what needs to be addressed. This process commonly involves the five steps shown below.

  1. Identify data sources and stakeholders
  2. Interview stakeholders
  3. Determine current capabilities
  4. Document the current state and target state
  5. Analyze gaps and prioritize

We will look at each of these steps below.

Identify Data Sources and Stakeholders

Step one involves determining what data is used within the organization and the users or stakeholders of this data. Essentially, you are trying to determine…

  • What data is out there?
  • Who uses it?
  • Who produces it?
  • Who protects it?
  • Who is responsible for it?

Answering these questions also shows which data governance roles are already being fulfilled, at least implicitly, and which roles need to be added to the organization. At most organizations, at least some of these questions already have answers, and there are people responsible for many roles. The purpose here is not only to gather this information but also to make people aware of the roles they are fulfilling from a data governance perspective.
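One way to record the answers is a simple inventory keyed by data source, which makes unfilled roles easy to spot. The minimal Python sketch below assumes a hypothetical structure: the source names, the people listed, and the role keys are illustrations only, and the keys of the inventory answer "What data is out there?"

```python
# Hypothetical inventory: each data source records who fills each role today.
ROLES = ("uses", "produces", "protects", "responsible")

INVENTORY = {
    "student_records": {
        "uses": ["advisors", "registrar"],
        "produces": ["admissions office"],
        "protects": ["IT security"],
        "responsible": [],
    },
    "survey_results": {
        "uses": ["research office"],
        "produces": ["research office"],
        "protects": [],
        "responsible": ["research director"],
    },
}


def unfilled_roles(inventory):
    """Return (data source, role) pairs that nobody has been identified for yet."""
    gaps = []
    for source, roles in inventory.items():
        for role in ROLES:
            if not roles.get(role):
                gaps.append((source, role))
    return gaps


if __name__ == "__main__":
    for source, role in unfilled_roles(INVENTORY):
        print(f"{source}: no one identified for '{role}'")
```

The empty entries point to the roles that need to be added to the organization.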


Interview Stakeholders

Step two involves interviewing stakeholders. Once it is clear who is associated with data in the organization it is time to reach out to these people. You want to develop questions to ask stakeholders in order to inform you about what issues to address in relation to data governance.

An easy way to do this is to develop questions that address the pillars of data governance. The pillars are…

  • Ownership & accountability
  • Data quality
  • Data protection and privacy
  • Data management
  • Data use

Below are some sample questions based on the pillars above.

  • How do you know your data is of high quality?
  • What needs to be done to improve data quality?
  • How is data protected from misuse and loss?
  • How is metadata handled?
  • What concerns do you have related to data?
  • What policies are there now related to data?
  • What roles are there in relation to data?
  • How is data used here?

It may be necessary to address all or only some of these pillars when conducting the assessment. The benefit of the pillars is that they provide a starting point from which you can shape your own interview questions. As for the interviews themselves, it is up to each organization to determine the best method of data collection: perhaps a survey, semi-structured interviews, or focus groups. The actual research part of this process is beyond the scope of this post.

Determine Current Capabilities

Step three involves determining the current capabilities of the organization in terms of data governance. Often this can be done by looking at the stakeholder interviews and comparing what they said to a rating scale. For example, the DCAM rating scale has six levels of data governance competence as shown below.

  1. Non-initiated: no governance happening
  2. Conceptual: aware of data governance and planning
  3. Developmental: engaged in developing a plan
  4. Defined: plan approved
  5. Achieved: plan implemented and enforced
  6. Enhanced: plan is part of the culture and updated regularly

Determining the current capabilities is a subjective process. However, it needs to be done in order to determine the next steps in bringing data governance along in an organization.
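The six levels above can be encoded as an ordered scale, as in the minimal Python sketch below. The class name DCAMLevel is hypothetical, and combining several reviewers' ratings with a low median is an assumption: it is one possible way to temper the subjectivity of the rating, not part of the DCAM scale itself.

```python
from enum import IntEnum
from statistics import median_low


class DCAMLevel(IntEnum):
    """The six levels listed above, ordered from least to most mature."""
    NON_INITIATED = 1  # no governance happening
    CONCEPTUAL = 2     # aware of data governance and planning
    DEVELOPMENTAL = 3  # engaged in developing a plan
    DEFINED = 4        # plan approved
    ACHIEVED = 5       # plan implemented and enforced
    ENHANCED = 6       # plan part of the culture and updated regularly


def consensus_level(ratings):
    """Combine several reviewers' ratings into one level.

    median_low always returns one of the submitted ratings, so the result
    is a real level rather than an in-between value such as 2.5.
    """
    return DCAMLevel(median_low([int(rating) for rating in ratings]))


if __name__ == "__main__":
    reviewer_ratings = [DCAMLevel.CONCEPTUAL, DCAMLevel.DEVELOPMENTAL, DCAMLevel.CONCEPTUAL]
    print(consensus_level(reviewer_ratings).name)  # CONCEPTUAL
```

Because IntEnum members compare as integers, levels can also be ordered and subtracted when comparing the current and target states.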

Document Current State and Target State

Step four involves documenting the current state and determining the target state. Again, this will be based on what was learned in the stakeholder interviews. What you will do is report what the stakeholders said in the interviews, organized by the pillars of data governance. It is not necessary to use the pillars, but they do provide a convenient way to organize the data without having to develop your own way of classifying the results.

Once the current state is defined, it is time to determine what the organization should be striving for in the future; this is called the target state. The target state is the direction the organization is heading within a given timeframe. It is up to the data governance team to determine this, and how it is done will vary. The main point is not to try to address too many issues at once, and to save some for the next cycle.

Analyze and Prioritize

The final step is to analyze and prioritize. This step involves performing a gap analysis to determine solutions that will solve the issues found in the previous step. In addition, it is also important to prioritize which gaps to address first.

Another part of this step is sharing recommendations and soliciting feedback. Provide insights into which direction the organization can go to improve its data governance, and allow stakeholders to provide feedback on whether they agree with the report. Once all this is done, the report is complete and documented until the next time this process needs to take place.
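A gap analysis can be as simple as subtracting the current rating from the target rating for each pillar and ranking the results. The minimal Python sketch below assumes hypothetical ratings on a 1 to 6 maturity scale. Ranking purely by gap size is also an assumption, since a team might weigh cost, risk, or stakeholder feedback as well. The limit parameter reflects the advice not to tackle too many issues in one cycle.

```python
# Hypothetical current and target ratings on a 1-6 maturity scale, one per pillar.
CURRENT = {
    "ownership & accountability": 2,
    "data quality": 3,
    "data protection and privacy": 4,
    "data management": 2,
    "data use": 3,
}
TARGET = {
    "ownership & accountability": 4,
    "data quality": 4,
    "data protection and privacy": 5,
    "data management": 5,
    "data use": 4,
}


def prioritize_gaps(current, target, limit=2):
    """Rank pillars by the size of the gap and keep only the top `limit` for this cycle."""
    gaps = {pillar: target[pillar] - current[pillar] for pillar in target}
    ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
    return [(pillar, gap) for pillar, gap in ranked if gap > 0][:limit]


if __name__ == "__main__":
    for pillar, gap in prioritize_gaps(CURRENT, TARGET):
        print(f"{pillar}: {gap} level(s) to close")
```

The remaining gaps are not discarded; they simply wait for the next assessment cycle.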

Conclusion

The steps presented here are not prescriptive. They are shared as a starting point for an organization’s journey in improving data governance. With experience, each organization will find its own way to support its stakeholders in the management of data.


Total Data Quality

Total data quality, as its name implies, is a framework for improving the state of data that is used for research and reporting purposes. The dimensions used to assess the quality of data are measurement and representation.

Measurement

Measurement focuses on the values gathered on the variable(s) of interest. When assessing measurement, researchers are concerned with the following.

  • Construct: The construct is the definition of the variable of interest. For example, income can be defined as a person's gross yearly salary in dollars. However, income can also be defined per month, or as net pay after taxes, which shows how differently the same construct can be defined. The construct's validity must also be determined to ensure that it measures what it claims to measure.
  • Field: This is where the data is collected and how it is collected. For example, our income variable can be collected from students or from working adults. Where the data comes from affects its quality relative to the research problem and questions: if the research questions focus on student income, then collecting income data from students helps ensure quality. In addition, how the data is encoded matters. All student incomes need to be in the same currency in order to be compared meaningfully.
  • Data values: This refers to the tools and procedures for preparing the data for analysis to ensure high-quality values. Challenges addressed here include missing data, data entry errors, duplicates, the assumptions of various analytical approaches, and issues between variables such as high correlations.

Representation

Representation looks at determining if the data collected comes from the population of interest. Several concerns need to be addressed when dealing with representation.

  • Target population: The target population is the group of potential participants in the study. The limitation here is gaining access to the target population. For example, studies involving children can be difficult because of ethical concerns over collecting data from children, and these concerns sometimes limit access.
  • Data sources: Data sources are avenues for obtaining data. A data source can be a location, such as a school, or a group of people, such as students, among other definitions. Once access is established, it is necessary to determine specifically where the data will come from.
  • Missing data: Missing data is not just about which values are incomplete in a dataset. It is also about who was left out of the data collection process. For example, if the target population is women, then women should be represented in the data. In addition, missing data can also look at who is represented in the data but should not be. For example, if women are the target population, then there should not be any men in the dataset.
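Both sides of missing data in the representation sense (who is in the data but should not be, and how much of the target population is actually present) can be checked directly. The minimal Python sketch below follows the women-as-target-population example. The field names are hypothetical, and population_size is assumed to be known from some outside source such as a roster.

```python
# Hypothetical participant records; the target population here is women,
# matching the example in the text. Field names are illustrative only.
PARTICIPANTS = [
    {"id": 1, "gender": "female"},
    {"id": 2, "gender": "female"},
    {"id": 3, "gender": "male"},
]


def out_of_scope(rows, target="female"):
    """IDs of people who are in the data but are not part of the target population."""
    return [row["id"] for row in rows if row["gender"] != target]


def coverage(rows, population_size, target="female"):
    """Share of the target population that actually appears in the data."""
    in_scope = sum(1 for row in rows if row["gender"] == target)
    return in_scope / population_size


if __name__ == "__main__":
    print(out_of_scope(PARTICIPANTS))                  # [3]
    print(coverage(PARTICIPANTS, population_size=10))  # 0.2
```

Low coverage suggests that part of the target population was left out of data collection.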

Measurement and representation meet at the data analysis stage of a research project. If measurement and representation are poor, the data analysis will clearly not yield useful insights. However, even if measurement and representation are perfect, a poor analysis still leaves you without useful insights.

Conclusion

Measurement and representation are key components of data quality. Researchers need to be aware of these ideas to ensure that they are providing useful results to whatever stakeholders are involved in a study.


Data Types

There are many different ways that data can be organized and classified. In this post, we will look at data as it is classified by purpose. Essentially, data can be gathered for non-research or research purposes. Data collected for non-research purposes is called gathered data and data collected for research purposes is called designed data.

Gathered Data

Gathered data is data obtained from sources that were not developed specifically for conducting research. Examples of gathered data include data found on social media, such as Twitter or YouTube, and data that is scraped from a website. In each of these examples, data was collected, but not necessarily for the purpose of testing empirical theory.

Gathered data is also collected in many ways beyond websites. Other modes of data collection could be sensors such as traffic light cameras, transactions such as those at a store, and wearables such as those used during exercise.


Just because the data was not collected for research purposes does not mean that it cannot be used for research. Gathered data is frequently used to support research, as it can be analyzed and insights can be developed from it. The challenge is that gathered data may not directly address a researcher's questions, which necessitates either using the data as a proxy for a construct or rephrasing the research questions to align with what the gathered data can answer. Gathered data is also referred to as big data or organic data.

Designed data

Designed data is data that was developed and collected for a specific research purpose. Often this data is collected from people or establishments to answer scientifically designed research questions. A common way of collecting this form of data is through surveys, which can be conducted in person, online, and/or over the phone. These forms of data collection contrast with gathered data, which is collected passively and without human interaction. This leads to an important distinction: gathered data is probably strictly quantitative because of its impersonal nature, while designed data can be quantitative and/or qualitative because the collection process can include a human element.

When researchers want designed data, they go through the process of conducting research, which often includes developing a problem, purpose, research questions, and methodology. All of these steps are common to conducting research in general. The designed data that is collected is then used to address the research questions of the study.

The purpose of this process is to ensure that the data collected will answer the specific questions the researcher has in mind. In other words, designed data is built to answer specific research questions, while gathered data can only hopefully answer some questions.

Conclusion

Understanding what data was collected for is beneficial for researchers because it helps them be aware of the strengths and weaknesses the data may have based on its purpose. Neither gathered nor designed data is superior to the other. Rather, the difference lies in why the data was collected.


Data Governance Office

The data governance office, or team, leads the work with data within an organization. This team is composed of several members, such as the following:

  • Chief Data Officer
  • Data Governance Lead
  • Data Governance Consultant
  • Data Quality Analyst

We will look at each of these below. It should also be mentioned that one person might be assigned several of these roles, which is particularly true in smaller organizations. Likewise, several people might fulfill one of these roles in a much larger organization.

Chief Data Officer

The chief data officer is responsible for shaping the overall data strategy at an organization. The chief data officer also promotes a data-driven culture and pushes for change within the organization. A person in this position also needs to understand the data needs of the organization in order to further the vision of the institution or company.


The role of the chief data officer encompasses all of the other roles that will be discussed. The chief data officer is essentially the leader of the data team and provides help with governance consulting, quality, and analytics. However, the primary role of this position is to see the big picture for big data and to guide the organization in this regard, which implies that technical skills are beneficial but leadership and the promotion of change are more critical. In sum, this is a challenging position that requires a large amount of experience.

Data Governance Lead

The data governance lead's primary responsibilities involve defining policies and data governance frameworks. While the chief data officer is more of an evangelist or promoter of data governance, the data governance lead focuses on actually implementing change and guiding the organization through this process.

Essentially, the data governance lead is in charge of the day-to-day operation of the data governance team. While the chief data officer may be the dreamer, the data governance lead is the steady hand behind the push for change.

Data Governance Consultant

The data governance consultant is the subject matter expert in data governance. Their role is to know all the details of data governance in general, and ideally also to know how to make data governance happen in a particular discipline. For example, a consultant might specialize in making data governance happen within the context of a university.

The data governance consultant supports the data governance lead with implementation. In addition, the consultant is a go-between for the larger organization and IT. Serving as a go-between implies that the consultant can communicate effectively with both parties: on a technical level with IT and in layman's terms with the larger organization. Cooperation between IT and the larger organization can be challenging because of communication issues arising from vastly different backgrounds, and it is the consultant's responsibility to bridge this gap.

Data Quality Analyst

The data quality analyst's job, as the name implies, is to ensure quality data. One way of ensuring data quality is to develop rules for data entry. For example, a data quality rule might state that marital status can only be single, married, divorced, or widowed. This rule excludes any other option people may want to enter. When entries follow this rule, the data is of high quality within this context.
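A data entry rule like the marital status example can be expressed as a small validation check. In the minimal Python sketch below, the allowed values come from the text, while the normalization step (trimming spaces and lowercasing so that "Single " counts as "single") is an assumption about how strictly the rule should be applied.

```python
ALLOWED_MARITAL_STATUS = {"single", "married", "divorced", "widowed"}


def normalize(value):
    """Trim whitespace and lowercase so that 'Single ' and 'single' are treated alike."""
    return value.strip().lower()


def invalid_entries(values):
    """Return the entries that break the marital status rule."""
    return [value for value in values if normalize(value) not in ALLOWED_MARITAL_STATUS]


if __name__ == "__main__":
    entries = ["married", "Single ", "engaged", "widowed"]
    print(invalid_entries(entries))  # ['engaged']
```

In practice, a check like this could run at the point of entry to reject invalid values, or afterwards to report them for correction.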

A data quality analyst also performs troubleshooting and root cause investigations. If something unusual is going on in the data, such as duplicates, it is the data quality analyst's job to determine what is causing the problem and to find a solution. Lastly, a data quality analyst is also responsible for statistical work. This can include statistical work associated with the work of a data analyst and/or statistical work that monitors the use and quality of data within the organization.
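As a first step in a duplicate investigation, it can help to see which source contributes most often to duplicated records. The minimal Python sketch below assumes each record carries a hypothetical source field naming the form or system that created it, and it treats email as the key that identifies a duplicate. It only points to where to look; it does not establish the root cause by itself.

```python
from collections import Counter, defaultdict

# Hypothetical records; "source" notes which system or form created each row.
RECORDS = [
    {"email": "ana@example.org", "source": "web_form"},
    {"email": "ana@example.org", "source": "web_form"},
    {"email": "ben@example.org", "source": "import_job"},
    {"email": "cy@example.org", "source": "web_form"},
    {"email": "cy@example.org", "source": "import_job"},
]


def duplicate_sources(records, key="email"):
    """Count how often each source appears among rows that share the same key."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row["source"])
    counts = Counter()
    for sources in groups.values():
        if len(sources) > 1:  # only groups that actually contain duplicates
            counts.update(sources)
    return counts


if __name__ == "__main__":
    print(duplicate_sources(RECORDS).most_common(1))  # [('web_form', 3)]
```

Here the web form appears most often among duplicated records, so it would be the first place to investigate.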

Conclusion

The data governance team plays a critical role in supporting the organization with reliable and clean data that can be trusted to make actionable insights. Even though this is a tremendous challenge it is an important function in an organization.


Roles in Data Governance

Working with data is a team event. Different people are involved in different stages of the data process. The roles described below are roles commonly involved in data governance. The general order below is the common order in which these individuals will work with data. However, life is not always linear and different people may jump in at different times. In addition, one person might have more than one role when working with data in the governance process.

Data Owners

Data owners are responsible for the infrastructure such as the database in which data is stored for consumption and use. Data owners are also in charge of the allocation of resources related to the data. Data owners also play a critical role in developing standard operating procedures and compliance with these standards.

Data Producers

Once the database, or whatever tool is used for the data, is in place, the next role involved is the data producer. Data producers are responsible for creating data, which can happen through processes such as data entry or data collection. Data producers may also support quality control and general problem-solving of data-related issues. To put it simply, the producer uses the system that the owner developed for the data.


Data Engineers

Data engineers are responsible for pipeline development, which means moving data from one place to another for various purposes. Data engineers deal with storage optimization and distribution, and they also support the automation of various tasks. Essentially, engineers move around the data that producers create.
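A pipeline is often organized as extract, transform, and load steps. The minimal Python sketch below uses that common pattern as an illustration; it is not a claim about how any particular organization builds its pipelines. The raw CSV text stands in for data created by producers, and a plain list stands in for real storage. Dropping rows with no amount is an example cleaning rule, not a recommendation.

```python
import csv
import io

# Hypothetical raw export created by data producers (for example, through data entry).
RAW_EXPORT = "id,amount\n1, 10.5\n2,\n3,7\n"


def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, skip rows with no amount, and convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append({"id": int(row["id"]), "amount": float(amount)})
    return cleaned


def load(rows, destination):
    """Append the cleaned rows to the destination (a list standing in for real storage)."""
    destination.extend(rows)
    return destination


if __name__ == "__main__":
    warehouse = []
    load(transform(extract(RAW_EXPORT)), warehouse)
    print(warehouse)  # [{'id': 1, 'amount': 10.5}, {'id': 3, 'amount': 7.0}]
```

Automating a pipeline usually means running steps like these on a schedule rather than by hand.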

Data Custodians

Data custodians are the keepers and protectors of data. They work with the storage created by the data owner and, like data engineers, are involved in delivering data. The difference is that data custodians send data to the people who come after them in this process, such as stewards and analysts.

Data custodians also make sure to secure and back up the data. Lastly, data custodians are often responsible for network management.

Data Stewards

Data stewards work on defining and organizing data, which may involve working with metadata in particular. Data stewards also serve as gatekeepers to the data, which involves keeping track of who is using and accessing it. Lastly, data stewards help consumers (analysts and scientists) find the data they may need to complete a project.
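The gatekeeping part of a steward's work, deciding who may access data and keeping track of who did, can be pictured as a permission list plus an access log. The minimal Python sketch below is hypothetical throughout: the dataset name, user names, and log format are assumptions for illustration, and a real organization would rely on its database or platform's own access controls.

```python
from datetime import datetime, timezone

# Hypothetical permission list kept by a data steward: dataset -> approved users.
PERMISSIONS = {"enrollment": {"analyst_a", "scientist_b"}}
ACCESS_LOG = []


def request_access(user, dataset):
    """Grant or deny access, and log every request so usage can be reviewed later."""
    granted = user in PERMISSIONS.get(dataset, set())
    ACCESS_LOG.append({
        "user": user,
        "dataset": dataset,
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return granted


def users_of(dataset):
    """List who has actually been granted access to a dataset, according to the log."""
    return sorted({entry["user"] for entry in ACCESS_LOG
                   if entry["dataset"] == dataset and entry["granted"]})
```

For example, request_access("analyst_a", "enrollment") returns True, a request from an unlisted user returns False, and both requests remain in ACCESS_LOG for later review.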

Data Analysts

Data analysts, as the name implies, analyze the data. Their job can involve statistical modeling to make a historical analysis of what happened in the past. Data analysts are also responsible for cleaning data for analysis. In addition, data analysts are primarily responsible for data visualization and for telling stories with data. Dashboards and reports are also frequently developed by the data analyst.

Data Scientists

The role of a data scientist is highly similar to that of a data analyst. The main difference is that data scientists use data to predict the future, while data analysts use data to explain the past. In addition, data scientists serve as research designers who acquire additional data for the goals of a project. Lastly, data scientists do advanced statistical work, at times involving machine learning, artificial intelligence, and data mining.

Conclusion

The roles mentioned above are all critical in supporting data within an organization. When everybody plays their part well, organizations can have much more confidence in the decisions they make based on their data.

Tips for Lecturing VIDEO

Lecturing is a commonly used vehicle for instruction. The video below will provide tips for how to improve this method of instruction for students.


Data Governance Framework Types and Principles

When it is time to develop data governance policies the first thing to consider is how the team views data governance. In this post, we will look at various data governance frameworks and principles to keep in mind when employing a data governance framework.

Top-Down

The top-down framework involves a small group of data providers. These data providers serve as gatekeepers for data that is used in the institution. Whatever data is used is controlled centrally in this framework.


One obvious benefit of this approach is that with a small group of people in charge, decision-making should be fast and relatively efficient. In addition, if something does go wrong it should be easy to trace the source of the problem. However, a top-down approach only works in situations that have small amounts of data or end users. When the amount of data becomes too large the small team will struggle to support users which indicates that this approach is hard to scale. Lastly, people may resent having to abide by rules that are handed down from above.

Bottom-Up

The bottom-up approach to data governance is the mirror opposite of the top-down approach. Where top-down involves a handful of decision-makers, bottom-up focuses on a democratic style of data leadership. Bottom-up is scalable because everyone is involved in the process, while top-down does not scale well. Generally, with the bottom-up approach, controls and restrictions on data are put in place after the raw data is shared rather than before.

Like all approaches to data governance, the bottom-up approach has its concerns. For example, it becomes harder to control the data when people are allowed to use raw data that has not been prepared for use. In addition, because of the democratic nature of the bottom-up approach, there is an increased security risk due to the greater freedom people have.

Collaborative

The collaborative approach is a mix of top-down and bottom-up ideas on data governance. This approach is flexible and balanced while placing an emphasis on collaboration. The collaboration can be among stakeholders or between the gatekeepers and the users of data.

One main concern with this approach is that it can become messy and difficult to execute if principles and goals are not clearly defined. Therefore, it is important to spend a large amount of time planning when choosing this approach.

Principles

Regardless of which framework you pick when beginning data governance, there are several principles you need to be familiar with to be successful. For example, integrity involves maintaining open lines of communication and sharing problems so that an atmosphere of trust is maintained or developed.

It is also important to determine ownership for the purpose of governance and decision-making. Determining ownership also helps to find gaps in accountability and responsibility for data.

Leaders in data governance must also be aware of change and risk management. Change management refers to the tools and processes for communicating new strategies and policies related to data governance, and it helps ensure a smooth transition from one state of equilibrium to another. Risk management refers to the tools for auditing and for developing interventions for non-compliance.

A final concept to be aware of is strategic alignment. The goals and purpose of data governance must align with the goals of the organization that data governance is supporting. For example, a school will have a strict stance on protecting student privacy. Therefore, data governance needs to reflect this and support strict privacy policies.

Conclusion

Frameworks provide a foundation on which your team can shape their policies for data governance. Each framework has its strengths and weaknesses but the point is to be aware of the basic ways that you can at least begin the process of forming policies and strategies for governing data at an organization.


Data Governance Framework

In this post, we will define a data governance framework. We will also look at the key components that are part of a data governance framework.

Defined

A data governance framework is the how, or the plan, for governing the data within an organization. Data governance determines what needs to be governed or controlled, while the data governance framework is the actual plan for controlling the data.

Common Components

There are several common components of a data governance plan and they include the following.

  • Strategy
  • Policies
  • Processes
  • Coordination
  • Monitoring/communication
  • Data literacy/culture

Strategy involves determining how data can be used to solve problems. This step may seem pointless, but specific data is suited to solving specific problems. For example, customers' addresses in California might not be appropriate for determining revenue generated in Texas. Looking at data strategically helps ensure that, in many cases, those who use it view it as an asset.


Policies help guide such things as decision-making and expectations concerning data. In addition, policies help determine responsibilities and tasks related to data management. One example of policy in action is the development of standards, which are rules for best practices that help meet a policy. A policy may be something like protecting privacy, and a standard to meet this policy would be to ensure that data is encrypted and password protected.

Process and technology involve steps for monitoring the quality of data. Other topics related to process can include dealing with metadata and data management. The proper process mainly helps with efficiency in the organization.

Coordination involves the processes of working together. Coordination can involve defining the roles and responsibilities for a complex process that requires collaboration with data. In other words, coordination is developed when multiple parties are involved with a complex task.

Progress monitoring involves the development of KPIs to make sure that the performance expectations are measured and adhered to. Progress monitoring can also involve issues related to privacy, quality, and compliance. An example of progress monitoring may be requiring everyone to change their password every 90 days. At the end of the 90 days, the system will automatically make the user create a new password.
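The 90-day password example can be turned into both an automatic check and a KPI. In the minimal Python sketch below, treating a password as expired once 90 or more days have passed is an interpretation of "at the end of the 90 days," and the compliance rate is a hypothetical KPI chosen for illustration.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=90)  # the 90-day period from the example


def must_change_password(last_changed, today):
    """True once 90 or more days have passed since the last password change."""
    return today - last_changed >= MAX_PASSWORD_AGE


def compliance_rate(last_changed_dates, today):
    """Example KPI: share of accounts whose password is still within the 90-day window."""
    compliant = sum(1 for d in last_changed_dates if not must_change_password(d, today))
    return compliant / len(last_changed_dates)


if __name__ == "__main__":
    today = date(2023, 8, 31)
    accounts = [date(2023, 5, 1), date(2023, 7, 15), date(2023, 8, 20)]
    print(compliance_rate(accounts, today))
```

Tracking this rate over time shows whether the password policy is actually being adhered to.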

Lastly, data literacy and culture involve training the people within the organization who use or consume data, and developing their skills in analyzing and/or communicating data. Naturally, this is an ongoing process, and how it works depends on who is involved.

Conclusion

A framework is a plan for achieving a particular goal or vision. As organizations work with data, they must be diligent in making sure that the data that is used is trustworthy and protected. A data governance framework is one way in which these goals can be attained.