Category Archives: Data Governance

De-Identification of Data

De-identification, the removal of identifying information from data, is a largely self-explanatory term. Its purpose is to protect the people the data came from. These people can be customers, employees, or other groups about whom data has been collected. De-identification can also be performed for compliance reasons or as a security measure.

People who are responsible for privacy or data governance in their organization need to be familiar with ways to de-identify data. Therefore, in this post, we will look at two commonly used techniques for removing identification from data. These two methods are:

Pseudonymization
Anonymization

Pseudonymization

A pseudonym is a false name. Therefore, in the context of data, pseudonymization is the process of giving false names to data that can help identify somebody. It is similar to having a secret identity in the superhero world. For example, Peter Parker and Spider-Man are the same person but most people do not know this because of the use of a false name.

Practical ways to achieve pseudonymization include replacing text such as names with numbers, removing information such as date of birth, or removing parts of the data in a column, such as keeping only the last four digits of a person’s social security number.

One advantage, or perhaps disadvantage, of pseudonymization is that the data can be returned to its original state. This is because whoever altered the data used the same rules for every change they made. The downside is that if someone else can determine how the data was altered, they can recover the original data, which could be used to identify someone.
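
As a rough illustration of the ideas above, here is a minimal Python sketch. The `Pseudonymizer` class, the `mask_ssn` helper, the `P0001`-style IDs, and the sample records are all assumptions made up for this example, not a standard tool. It shows names being replaced consistently with numeric IDs while a lookup table is kept so the change can be reversed. Note that the masked social security number, unlike the name, cannot be restored from the output alone.

```python
import itertools


class Pseudonymizer:
    """Replace names with numeric IDs and keep the lookup table for reversal."""

    def __init__(self):
        self._forward = {}  # real name -> pseudonym
        self._reverse = {}  # pseudonym -> real name
        self._counter = itertools.count(1)

    def pseudonymize(self, name):
        # The same name always maps to the same pseudonym (one consistent rule).
        if name not in self._forward:
            pseudonym = f"P{next(self._counter):04d}"
            self._forward[name] = pseudonym
            self._reverse[pseudonym] = name
        return self._forward[name]

    def reidentify(self, pseudonym):
        # Only someone holding the lookup table can undo the change.
        return self._reverse[pseudonym]


def mask_ssn(ssn):
    """Keep only the last four digits of a social security number."""
    digits = [c for c in ssn if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


# Hypothetical sample data for illustration only.
records = [
    {"name": "Peter Parker", "ssn": "123-45-6789"},
    {"name": "Mary Jane Watson", "ssn": "987-65-4321"},
    {"name": "Peter Parker", "ssn": "123-45-6789"},
]

pseudonymizer = Pseudonymizer()
pseudonymized = [
    {"name": pseudonymizer.pseudonymize(r["name"]), "ssn": mask_ssn(r["ssn"])}
    for r in records
]
```

In this sketch the lookup table is the secret: anyone who obtains it can map every ID back to a person, which is the risk described above.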

Anonymization

Anonymous means no name. Therefore, anonymization is the process of removing all personally identifying information from a dataset. When this is done, the process is not reversible, and thus there should be no way to determine the identity of the people in the dataset.

An example of anonymization would be to completely remove the names of people in a dataset along with other information such as dates of birth and phone numbers. Anonymization provides heightened protection, but at the cost that even the people who anonymized the data have no idea who the original people are. Whether this is good or bad depends on the context in which the data will be used.
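
A minimal sketch of this kind of removal in Python might look like the following. The column names in `DIRECT_IDENTIFIERS` and the sample records are assumptions for illustration. A real dataset would need its own review of which fields identify people.

```python
# Assumed column names that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "phone"}


def anonymize(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with identifying fields removed entirely."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Hypothetical sample data for illustration only.
records = [
    {"name": "Ana Ruiz", "date_of_birth": "1990-04-12",
     "phone": "555-0100", "purchase_total": 42.5},
    {"name": "Ben Cole", "date_of_birth": "1985-11-30",
     "phone": "555-0101", "purchase_total": 17.0},
]

anonymized = anonymize(records)
```

No lookup table is kept here, so nothing in the output points back to a person. Keep in mind that the fields left behind can sometimes still identify someone when combined, so dropping columns is a starting point rather than a guarantee.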

There are industry-specific ways of achieving either pseudonymization or anonymization. Examples include the fields of health care and education. However, at the macro level, all industries are using some combination of pseudonymization and anonymization.

Conclusion

Data privacy is a major concern in the world today. The concern with privacy also needs to be balanced with the need to analyze data for insights. For this reason, many have turned to various ways of de-identifying data to balance the competing concerns of privacy and analysis.

Security Models

Protecting data is a major concern of organizations today. With so many people sharing so much about themselves online, organizations must be careful and aware of ways to secure the data that they have. In this post, we will look at two different security models that are commonly deployed today. These two models are the CIA Triad and the DIE model. Either of these models can be used when developing a data governance plan for an organization.

CIA Triad

There are several different models used by organizations to examine data privacy. One example is the CIA triad. The CIA triad provides three concepts that must be kept in mind when attempting to protect the privacy of users.

“C” stands for confidentiality. In other words, organizations must be sure that the data they have cannot be accessed by unauthorized parties. The “I” stands for integrity. Integrity involves ensuring that data is not altered or changed without authorization. If the data is manipulated without user knowledge, any insights derived from the data would be considered questionable.
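
One common way to detect unauthorized changes is to store a cryptographic digest of a record and compare it later. The Python sketch below illustrates that idea using only the standard library. The `fingerprint` helper and the sample record are assumptions for this example, and a plain digest only helps if the stored digest itself is kept somewhere an attacker cannot modify.

```python
import hashlib
import json


def fingerprint(record):
    """Return a SHA-256 digest of a record serialized in a stable key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical record; the digest is computed when the data is first stored.
original = {"customer_id": 17, "balance": 250.0}
stored_digest = fingerprint(original)

# Later, the record is read back and checked against the stored digest.
untouched = {"balance": 250.0, "customer_id": 17}  # same content, different key order
tampered = {"customer_id": 17, "balance": 9250.0}  # balance was altered

is_untouched_valid = fingerprint(untouched) == stored_digest
is_tampered_valid = fingerprint(tampered) == stored_digest
```

Here the untouched record passes the check and the altered one fails, which is the signal that any insights drawn from it should be questioned.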

The last letter in the CIA triad is “A.” The letter “A” stands for availability. Availability means that the data system is operational and that authorized users can access the data. In other words, the security system cannot be so complex that nobody can get the data that is being protected.

DIE Model

Another security model that is commonly used is the DIE model. DIE stands for distributed, immutable, and ephemeral. Distributed means that data should not be limited to one source in case of failures. An example is having multiple copies of data in multiple sources.

The “I” in DIE stands for immutable. Immutable in this context means that the infrastructure being used is replaceable without data loss whenever there is a problem. Again, this relates to the idea of having multiple sources of the same data. Lastly, the “E” in DIE stands for ephemeral. Ephemeral means that it does not take a long time to get back up and running in the event of a data failure or breach.

Compare and Contrast

There are some similarities and differences between the CIA triad and DIE. Both are focused on data being available. For the CIA triad this is the “A,” and for DIE this is the “D,” since keeping multiple copies in multiple sources guards against failures. In addition, both models are focused on protecting data in terms of preventing changes, and this is covered by the letter “I” in both models.

However, there are also some differences. The DIE model is considered much more scalable than the CIA triad. As such, smaller organizations may lean towards the CIA triad while larger organizations may lean towards the DIE model. Furthermore, DIE is focused on hardware and infrastructure while CIA is more data-focused.

Conclusion

Every model has its strengths and weaknesses. The best model depends on the needs of the organization. In either case, the CIA Triad or the DIE model can guide an organization that is looking for a roadmap for securing its data.

Data Classification

Data classification is a critical part of many companies’ strategies for protecting data. In this post, we will look at data classification in terms of its purpose, types, and steps for the implementation of this process.

Common Purposes

The main reason for data classification is to ensure confidentiality. Many data systems have personally identifiable information such as credit cards, social security numbers, and more. Such information needs to be protected and the only way to know it needs to be protected is through classifying it as something that must be shielded.

Availability is another reason for data classification. Classifying data helps a data governance team know who should have access to what kinds of data. For example, the manager may have full access to all data while the assistant may only have access to data that is not considered confidential. Classification helps in determining access to data.
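
The manager and assistant example can be sketched as a simple lookup in Python. The role names, the two classification labels, and the `can_access` function are hypothetical. In practice, the governance team would define the actual roles and labels.

```python
# Hypothetical roles and the classifications each one is cleared to read.
ROLE_CLEARANCE = {
    "manager": {"public", "confidential"},
    "assistant": {"public"},
}


def can_access(role, classification):
    """Return True if the role is cleared for data with this classification."""
    # Unknown roles get an empty clearance set, so access is denied by default.
    return classification in ROLE_CLEARANCE.get(role, set())


manager_sees_confidential = can_access("manager", "confidential")
assistant_sees_confidential = can_access("assistant", "confidential")
```

The design choice worth noting is the deny-by-default behavior: a role that is missing from the table cannot read anything.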

Data integrity is yet another reason. Classification helps ensure that the data represents what it claims to represent. If data is classified as sensitive but does not contain any sensitive information, it indicates a problem.

Data Types

There are also several different ways data can be classified. Data can be public, which is generally not protected as it is accessible to all for the most part. Data can also be personal, which is data that can be used to identify individuals and is usually strictly protected. Data can also be classified as sensitive, which means data that requires access authorization.

Lastly, there is confidential information which is data that may have legal restrictions associated with it. The examples above are common forms of data classification. Individual organizations may use all or some of these classifications. In addition, there is nothing to stop an organization from creating its own distinct categories.

Steps

The process of classifying data is rather simple. First, you need to gather all the information that is needed to classify data. Part of this process is supported by having a data catalog that provides information on the location, owners, and content of the data asset.

Once it is clear what data is going to be classified, step two involves the development of a framework. This framework provides the structure for determining how to classify the data. The team involved in this process must develop the criteria for determining which category to place data in. When the categories are developed, the data will be tagged. Once this is done, the process can be automated using software.

Step three involves making sure the rules developed in step two are consistent with the standards that have been developed in the data governance policy. In other words, the classification must not violate the data governance policy because of compliance issues. There must be administrative consistency between data classification rules and the data governance policy.

Step four involves the application of the rules developed in step two. Once this is completed, the data classification is finished for the time being.
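
To make the framework and tagging steps more concrete, here is a minimal Python sketch of rule-based tagging. The patterns, the order of the rules, and the assignment of column names to the public, personal, sensitive, and confidential categories are assumptions for illustration only. An actual framework would come from the team's own criteria and governance policy.

```python
import re

# Hypothetical framework: ordered (category, pattern) rules; the first match wins.
RULES = [
    ("confidential", re.compile(r"ssn|social_security|credit_card")),
    ("personal", re.compile(r"name|email|phone|birth")),
    ("sensitive", re.compile(r"salary|grade")),
]
DEFAULT_CATEGORY = "public"


def classify_column(column_name):
    """Return the category of the first rule whose pattern matches the column name."""
    lowered = column_name.lower()
    for category, pattern in RULES:
        if pattern.search(lowered):
            return category
    return DEFAULT_CATEGORY


def tag_dataset(columns):
    """Apply the rules to every column and return a column -> category mapping."""
    return {column: classify_column(column) for column in columns}


tags = tag_dataset(["Full_Name", "Credit_Card_Number", "Salary", "Store_Region"])
```

Matching on column names alone is a simplification. Automated tools often inspect the values as well, and a person should review the tags before they are relied on.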

Conclusion

Data classification is another tool that can be used to support an organization. This tool in particular is useful in protecting data based on its characteristics. Therefore, when it’s time to protect data, data classification can help you to determine what data to protect.

Data Governance Policy

A data governance policy is a set of guidelines that allows an organization to manage its data consistently and properly. What is contained within this policy will vary from one organization to another but some of the topics addressed include data quality, access, usage, integration, and security.

The topics listed above are included in a data governance policy because they relate to the topic of managing data. If a data governance team ignores data quality, access, security, etc., it could have negative ramifications for the organization.

Components

The topics of a data governance policy are described above. However, we will now look at the structure of a data governance policy. Generally, the following components are used in a data governance policy.

Statement of purpose-The goal of the document
Scope and goals-What is and what is not covered in the document along with a breakdown of core beliefs about what data governance should do.
Roles & responsibilities-Who is in charge of what
Principles and rules-These are a further breakdown of goals into observable behaviors that are called rules. Goals and principles are highly similar and it might be too confusing to have both. Therefore, consider choosing one or the other.
Definitions of terms-It is important to define keywords that will be frequently used in the document. The level of detail depends on the audience.

These are some of the main components of a data governance policy. However, what to include in such a document depends on the local context and challenges of an organization. Some or all of these pieces may be needed or other components not mentioned may be appropriate.

Process

The steps for making a data governance policy are as follows:

Make an inventory of the data that will be covered under the data governance policy. Most of the time, not all data in an organization is under the policy.
Build a team that includes a leader along with other stakeholders of the data.
Define scope and goals
Assign roles and responsibilities
Develop standards
Define metrics. Metrics help you to determine if you are achieving your goals.
Make a draft of your document and revise it as needed.

As you can see, the process is similar to the components. The order of developing the components matters as it is better to build broader policies before focusing on behavioral objectives.

Conclusion

A major step in the development of a data governance program is the development of a data governance policy. The policy allows a data governance team to lay out what they are trying to do and how they will do it. Such a document is critical to helping the team stay on the same page and consistently seek the same objectives.

Master Data Management

Data continues to become more and more important. With this growth, there has been a corresponding need for standardizing and managing it. In this post, we will look at master data and how one can go about managing it.

Definition

Master data is a uniform set of data that is used throughout an organization. By uniform, it means that this data is exactly the same wherever it appears in any data set. This is highly important because it is natural for data to change a little as people use it or if it is merged and edited in various stages of the workflow. Master data is so important and fundamental that it must remain unchanged for the sake of consistency when different departments within an organization need to integrate data.

Therefore, master data management is the process of protecting master data from the changes that can happen from people and systems interacting with data. Unfortunately, preserving data is not an easy task and at times this can be complex and difficult.

Master Data Management Forms

There are different forms or ways to develop master data. The analytical approach feeds whatever master data an organization has into a data warehouse where it can be referred to as needed. The operational approach involves master data in the core business or organizational systems. Essentially, the difference between these two approaches is at what level of granularity they are implemented. Analytical is across an organization while operational is within a sub-unit of the organization.

Whichever method is used, there are several ways that the approach is implemented. A registry process involves creating a unified master data source without making any changes to local systems. This means there are two different systems, which means that people need to be aware of when to refer to the registry.

Consolidation is another way and involves updating the registry master data whenever the local system is updated. Lastly, the transaction method is the opposite and involves the local system being updated whenever the registry is.

Steps

The steps to selecting and standardizing master data are explained below. Step one involves selecting what is considered master data. This will vary from organization to organization and will involve some disagreement and negotiation. The same applies to step two, which is agreeing on data standards and the master data approach. Examples of things that involve data standards can include capitalization of text, number of decimals, number of digits, maximum text length, abbreviations, etc. All these must be worked out together. Should states be abbreviated or spelled out fully? Should phone numbers have dashes in them? These are just some of the challenges to address.

Step three involves deploying software to find and standardize the master data. This can be done manually, and this happens in smaller organizations, but for larger organizations, software is the only practical way to do this. Step four is the cleansing of the data, which can include dealing with duplicates. Once all of this is completed, it is appropriate to use the master data.
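
The standardizing and cleansing steps can be sketched in a few lines of Python. The standards chosen here (two-letter state abbreviations, dashed ten-digit phone numbers, title-cased names), the small abbreviation table, and the sample records are all assumptions for illustration. Each organization would agree on its own standards.

```python
import re

# Assumed standard: states are stored as two-letter abbreviations (sample table only).
STATE_ABBREVIATIONS = {"california": "CA", "texas": "TX", "new york": "NY"}


def standardize_state(value):
    cleaned = value.strip()
    # Spelled-out names are abbreviated; anything else is simply upper-cased.
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned.upper())


def standardize_phone(value):
    # Assumed standard: ten digits written with dashes, e.g. 555-010-0199.
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {value!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize(record):
    return {
        "name": " ".join(record["name"].split()).title(),
        "state": standardize_state(record["state"]),
        "phone": standardize_phone(record["phone"]),
    }


def deduplicate(records):
    """Cleansing step: drop records that are identical after standardization."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical raw records from two local systems.
raw = [
    {"name": "jane  doe", "state": "California", "phone": "(555) 010-0199"},
    {"name": "Jane Doe", "state": "ca", "phone": "555.010.0199"},
    {"name": "John Roe", "state": "Texas", "phone": "5550100142"},
]

master = deduplicate([standardize(r) for r in raw])
```

The first two raw records only become recognizable as duplicates after the standards are applied, which is why standardizing comes before cleansing in the steps above.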

The Team

Most projects require a team effort and master data management is no exception. Often you will want a manager who oversees the project. Another person who may be involved is a master data specialist who maintains the system. Data stewards are generally involved as they are the ones most familiar with whatever data they are responsible for. In addition, you may need leadership sponsors and stakeholders involved as well particularly when picking master data and assigning data standards.

Conclusion

Master data is a critical component of many organizations which means it must be managed and controlled as well. Some practical ways to address this have been shared here. However, the best way to approach this will vary from one organization to another.

Data Protection Impact Assessment

The data protection impact assessment (DPIA) is a tool associated with GDPR that is used to determine the level of protection data needs within an organization. Protection is determined by finding potential risks that might negatively affect data within the organization. In this post, we will look at the benefits of conducting a DPIA, assessing when to conduct an assessment, and a brief look at the process for completing a DPIA.

Benefits

As mentioned earlier, conducting a DPIA allows an organization to document risk. Documenting risk allows for strategies to be developed to reduce the said risk. Other benefits include allowing an organization to assess the cost or level of a particular risk. Lastly, a DPIA can provide unique insights into specific data protection needs and risks.

In general, the DPIA provides the initial data needed to develop a roadmap for supporting data protection within an institution. As such, this is a critical first step in a complex process.

When to do DPIA

Considering the importance of conducting a DPIA, a natural question to consider is when such an assessment should be performed. There are several situations that warrant a DPIA. One example is whenever an organization is moving to some form of automated processing, such as a program that identifies at-risk students. Since this system is automated, it is important to make sure the data is protected.

Another situation that may warrant a DPIA is one in which individuals are judged or evaluated. An example is collecting what users watch on YouTube to make recommendations. Lastly, instances of data integration may require a DPIA to make sure there is no loss of protection from combining data.

Process

There are several steps to actually completing a DPIA. Step one often involves describing the data flow. Data flow refers to how data moves throughout the organization in terms of its collection, storage, and sources. Step two involves determining the scope of the data. Scope refers to what types of data will be assessed, the amount of data to be assessed, and how long the data will be stored.

Step three involves defining the benefits of data processing. Data processing is the cleansing of data so that it can be used for analysis. How this is done varies widely and depends on the situation. Step four looks at how processing affects the consumer. Explaining this is difficult, but as an example, complex data processing could slow down the user experience.

Steps 5 and 6 involve talking to stakeholders about this new project and checking for compliance. Stakeholders will explain any concerns that they may have while compliance involves legal matters such as regulations and laws.

Steps 7 and 8 are where various risks are identified and solutions are proposed. For example, if it is discovered that some of the data is revealing people’s identities it might be appropriate to make the data anonymous. Once all of the problems and solutions are developed, step 9 is the official approval of the DPIA.

Conclusion

Completing a data protection impact assessment is a practical way to take the first steps in data privacy in an organization. With the insights developed an organization can inspire confidence in their stakeholders that the data within the organization is not only accurate but safe as well.

Data Privacy Ideas to Use

In this post, we will look at some ideas and tools to keep in mind when addressing data privacy issues.

Data Concerns

If an organization needs to gather and collect data from customers or stakeholders, several concerns need to be addressed. For example, the organization must develop a privacy policy that explains how data is collected and its legal ramifications, identifies who the data is shared with and how, and explains how an individual can opt out of this process. Some experts also recommend a cookie policy, but that relates primarily to organizations that solicit data from individuals who visit the organization’s website.

Once the privacy and cookie policies are developed, they need to be published on the website. Publishing the policies helps with informed consent and allows individuals to decline participation in sharing their data with the organization. The policy also needs to include a contingency plan for data breaches.

Obligation Management & Data Collection

When dealing with data, an organization must also know what data is being collected, how this data is collected, and as was already mentioned how consent can be given or revoked. A privacy team must know how and where data came from to set in place proper procedures for governing this data.

There are two main ways that data is collected: directly and indirectly. Direct data collection is a request that the organization makes that a person complies with. For example, when entering a website to purchase something, it is common to have to supply an address and credit card information.

Indirect data collection is data that is collected without a direct request to the individual. For example, many websites have cookies and track IP addresses to determine the person’s location. Many people provide this information without being aware of it.

Data Movement

Data movement addresses many of the same ideas already discussed. In general, several key questions must be answered to determine how data moves within and out of an organization. For example, it is important to know how data was collected, what data was collected, why it was collected, how it will be stored, how it will be shared, and if necessary, how it will be destroyed.

Again most of these questions have been addressed but the main difference is for what purpose. Data movement can be used to track the journey of data through an organization in a way that is beneficial for data lineage.

Acronym

Many of the ideas expressed in this post can be captured in the acronym PREACH, which is listed below:

P (purpose)-What is the reason for asking for data

R (Right to change)-Can changes be made to the data on request

E (Easy to understand)-Are the policies for data comprehensible

A (Alerting)-Will a person be alerted if there is a problem with their data

C (Consent)-Do people give permission for their data to be used

H (How)-How will the data be used

Conclusion

There will always be challenges with managing the privacy of data. Despite this, there are several ideas to keep in mind when trying to protect user data. The ideas presented here provide a baseline for privacy leaders.

Data Privacy Implementation Strategies

Data privacy is a topic that many organizations are addressing. In this post, we will go through several steps that must be taken to implement a data privacy program.

Leadership Sponsor

As with any major initiative, data privacy is going to need the support of leadership. In particular, there will be a need for an advocate on the leadership team who will support the vision of improving data privacy. Who this person is will naturally vary from organization to organization.

The sponsor is not only an advocate but also serves as a medium of communication between the data privacy team and leadership. The sponsor serves as the eyes and ears for the privacy team to help them avoid pitfalls and deal with concerns that are not shared directly from the leadership team to the privacy team.

Put Someone in Charge

Implementing any program or strategy requires that someone take the lead. Therefore, when it is time to develop a privacy approach, someone needs to be in charge. The selection of the leader will naturally vary from one place to another. The point is that the leadership sponsor needs someone they can talk to directly about the challenges and concerns that may be raised at the leadership level.

Depending on the size of the project there might be more than one person identified as a leader. However, it is generally wiser to start small and scale as appropriate.

Examine the Data

Before any action can take place it is important to take an inventory of available data. Another name for this is the compiling of a data catalog. A privacy leader must know what data needs to be held private. Without this information, it is hard to ensure the quality.

Knowing the data works in combination with the policies and procedures that need to be made. For example, if the data includes personal information this will influence how privacy is maintained versus data that does not contain such information.

Compliance Expectations

Knowledge of the data is then used to determine compliance expectations. For a corporation, the compliance standard might be GDPR. For other organizations, compliance might be determined by local laws or organizational standards.

Generally, a privacy team must provide evidence that they are implementing and obeying compliance standards. Therefore, a team might have to document and archive how they comply with regulations in the event of a data breach or audit.

Assess Risk

Assessing risk helps to inform the privacy team in terms of what sort of policies and procedures to implement. Fortunately, it is not necessary to develop this risk assessment in a vacuum. There are risk assessment frameworks such as ISO 31000 or ISO 27005. Either of these frameworks or others can help you to determine the level of danger your data is potentially facing.

Create Policies and Procedures

Policies are broad guidelines based on the context in which they are developed. Most websites have some sort of privacy policy that explains how and what data is collected along with its purpose. Privacy policies can include an idea of the roles and responsibilities of the data privacy team as well.

Procedures are the steps that need to be taken to fulfill the policies that were created. In other words, data procedures provide step-by-step guidance for carrying out policies. For example, if the policy speaks about the importance of only certain people having access to data, a procedure for this might be how to set up a password or how to seek permission to access a particular database. Essentially, policies inspire procedures.

Controls

Controls are inspired by risk assessment. In this step, you are implementing ways to mitigate risk to data. For example, it might have been uncovered that sensitive data is too easy to access. The control for this example may be to move the data to a more secure location or to ensure that the data is password protected.

The main point here is that all of these measures must be integrated and working together. The data catalog and knowledge of compliance inspire the policies and procedures, which in turn help with the development of controls.

Training & Monitoring

Now that almost everything is in place it is time to train people on the new privacy rules. The training will be context specific but is critical for getting buy-in to the new system. Without the cooperation of the masses, there is no hope for the success of the program.

After training, the training is assessed through monitoring. Monitoring assesses how well the program is running. It deals with such challenges as whether people are obeying the new procedures that have been implemented. Monitoring also helps in providing feedback in terms of where there might be growth opportunities. No system is perfect and monitoring provides critical information to strengthen the program.

Conclusion

Data privacy can be improved in any organization. The ideas presented here provide information on how to start a data privacy program. Naturally, all of these steps may not work for each organization but many valuable ideas have been shared to support the protection of privacy.

Privacy by Design

Privacy by Design is an idea found within the General Data Protection Regulation, which affects the data privacy practices of organizations. In this post, we will define this term and explain several principles of privacy by design.

Definition

Privacy by design is a concept in which data protection happens through the appropriate development of technology. Essentially, data protection should not be limited to one place or one feature. Instead, data protection should be layered throughout the system of an organization.

There are several ways to begin this initiative. A common method is to have a privacy policy that is up-to-date and readable. Another way to begin this process is to establish someone as the data protection officer. Lastly, it is also common to conduct some sort of assessment of data protection to determine areas of improvement before using an individual’s personal data.

Principles

There are seven principles of privacy by design. Below is a list with explanations.

Proactive rather than reactive-There should be an effort to prevent privacy loss rather than trying to fix a situation in which people’s personal information is inappropriately accessed.
Privacy by default-Maintaining the privacy of data should be the first thing an organization thinks about and can include restricting use/access or deleting data that is no longer needed.
Embedding of privacy-Embedding involves such tools as encryption, authentication, and the testing of vulnerabilities. In other words, privacy is used as a foundational aspect of developing a website or application.
Full functionality-This idea is a reminder that data privacy should not make it difficult to use a website or application. Protect data but avoid sacrificing the user experience.
End-to-end security-This is similar to principle number two and is essentially a reminder that privacy protection must be comprehensive from the time the data is received until the data is destroyed.
Visibility and transparency-People should know what is being done with the data an organization has of them.
Respect for user privacy-People should still have authority over their data after it is collected. What this means is that they can grant or rescind consent to their data at any time.

Implementation Perspective

There are several perspectives from which the implementation of privacy by design must be considered. These are the systems, processes, and risk management perspectives.

The system perspective involves documenting the organization’s commitment to data protection, appointing a data protection officer or leader, providing training for employees, checking security measures, developing a record-keeping system, and conducting a self-assessment. All of these steps are used to develop an initial system for data privacy.

For processes, it is necessary to determine roles within privacy such as people in IT, legal, etc. who support privacy with their technical expertise. It is also important to document the data processing process and privacy risks. Privacy controls for users and the implementation of security measures from the systems perspective are critical as well.

Risk management is another key perspective that needs to be addressed for data privacy. Risk management involves the legal purpose of processing data. It also includes tracking who has access to data, controls for accessing data, what to do in the event of a breach, and minimization, anonymization, and pseudonymization of data. Lastly, measures for data accuracy are developed here.

Data Catalogs and Data Silos

Data catalogs and data silos are two ideas that are commonly associated with data governance. In this post, we will look at these two terms by defining them and sharing either how to implement them or how to prevent them.

Data Catalogs

Data catalogs are a rather recent phenomenon. They were first developed in the 2010s, though their exact origins are not well defined. A data catalog is a reference application that contains metadata on the various datasets within an organization. Usually, the catalog is in a searchable format so that people can find datasets they may need within an organization.

The data catalog essentially tracks available data within an organization. The main reason for tracking data is to prevent loss and or secret data. Within a data governance framework, data is considered an asset. Therefore, just as an organization prevents the loss of inventory because of its monetary potential the data catalog prevents the monetary decision-making loss of data within an organization.

Tips

There are also several tips for developing and using data catalogs. For example, a data catalog should track the roles of various people concerning individual datasets. Roles can include who is the owner of the data, the steward, the custodian, etc. Tracking roles helps in assigning responsibility for data.

Another tip is to develop data dictionaries alongside the data catalog. A data dictionary contains metadata for just one dataset rather than for all data. An analogy would be maps. Some maps cover the whole world, like a data catalog, while other maps only cover a city or county, like a data dictionary. The data dictionary is useful when an analyst needs more information while preparing to use data.

It is also important to make the data catalog user-friendly. Making a data catalog user-friendly for stakeholders involves the support of IT with a strong concern for the user experience. Nobody will use a data catalog if its user interface is unusable, and the only other remedy is lots and lots of training.
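To make the idea of a searchable catalog that tracks roles more concrete, here is a minimal sketch. The field names, dataset names, and role holders are all hypothetical, and a real catalog would normally be a dedicated application rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Metadata about one dataset; every field name here is hypothetical."""
    name: str
    description: str
    location: str
    owner: str
    steward: str
    custodian: str
    tags: List[str] = field(default_factory=list)


def search(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """Return entries whose name, description, or tags mention the term."""
    needle = term.lower()
    return [
        e for e in catalog
        if needle in e.name.lower()
        or needle in e.description.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


# Two made-up entries for illustration only.
catalog = [
    CatalogEntry("student_records", "Core student demographics",
                 "warehouse.students", "Registrar", "Steward A",
                 "IT operations", ["students"]),
    CatalogEntry("course_grades", "Grades per student per course",
                 "mart.grades", "Academic office", "Steward B",
                 "IT operations", ["students", "grades"]),
]

for entry in search(catalog, "grades"):
    print(entry.name, "| owner:", entry.owner, "| steward:", entry.steward)
```

Keeping the owner, steward, and custodian on each entry is what lets the catalog answer "who is responsible for this dataset?" as well as "where is it?".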

Data Silos

Data catalogs help to prevent what are called data silos. Data silos are sources of data that are controlled in an isolated place within an organization. When silos develop, analyses can be incomplete because the data behind them is incomplete. In addition, silos can lead to a breakdown in collaboration, which can cause duplication of effort and reduced productivity. Lastly, people within an organization may struggle to find the data they need for analysis.

Data silos are often developed in organizations that have a decentralized IT strategy. A decentralized approach frequently leads to every department doing what they want in terms of data storage and technology utilization which is chaotic. Other motivations for data silos can include a lack of common goals when it comes to data management. No goals means everyone does what they want.

Breaking Silos

Two main ways of breaking data silos are the development of data governance and data integration. One step in data governance is developing a data catalog, as mentioned earlier. Once a data catalog is developed, the team can start to create policies and standards in data governance to establish expectations regarding data use and storage.

A second strategy that is related to the first is data integration. Data integration is the process of combining data from different tables into one. Once this is complete, more analysis can take place. Combined data is also harder to isolate, because it must remain available to everyone who uses it.
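As a rough illustration of data integration, the sketch below combines two small departmental tables on a shared key. The table names, column names, and values are invented, and the join is a plain-Python stand-in for what a database or ETL tool would normally do.

```python
# Two hypothetical departmental tables that share a student ID column.
registrar_rows = [
    {"student_id": 1001, "last_name": "Smith"},
    {"student_id": 1002, "last_name": "Chang"},
]
finance_rows = [
    {"student_id": 1001, "balance_due": 250.0},
    {"student_id": 1003, "balance_due": 0.0},
]


def integrate(left, right, key):
    """Full outer join of two lists of dicts on `key`.

    Columns a source does not supply are left as None.
    """
    columns = {c for row in left + right for c in row}
    merged = {}
    for row in left + right:
        record = merged.setdefault(row[key], dict.fromkeys(columns))
        record.update(row)
    return [merged[k] for k in sorted(merged)]


combined = integrate(registrar_rows, finance_rows, "student_id")
for row in combined:
    print(row)
```

The None values that appear after the join are useful in their own right: they show exactly which records one department holds and the other does not.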

Conclusion

Data catalogs and silos are a part of the daily life of the information professional. Therefore, in the context of data governance, it is important to be familiar with these two terms so that support can be provided.

Data Privacy

A field closely related to data governance is data privacy. In this post, we will look at what data privacy is as well as principles that need to be kept in mind when trying to keep people’s data private.

Data Privacy

Privacy is a term that is difficult to define. For our purposes, data privacy is the amount of control a person has over personal information in terms of how this information is collected, managed, and stored. This definition gives the impression that people have little data privacy because we are so often compelled to share our information online.

Websites often require some surrendering of personally identifiable information (PII) such as name, address, phone number, etc., while in the medical field there is demand for protected health information (PHI). Sharing information about yourself can be frustrating for many but is the cost of doing business online. Naturally, once these various online companies have your data they must be sure to protect it.

Data security is not about collecting or managing data. Rather, data security is focused on the protection of data from unauthorized access. Securing data is critical to protect individuals and organizations from harm because of security breaches. For example, there can be serious financial repercussions if someone’s credit card number is stolen online.

Fair Information Practice Principles

With all the concerns regarding data privacy, it was natural that frameworks would be developed to help organizations with data privacy. One such framework is the Fair Information Practice Principles (FIPPs) developed by the Organisation for Economic Co-operation and Development (OECD) in 1980. Below are the eight principles in this framework.

Limits on data collection-Every organization needs to determine the smallest amount of data it can collect while still being successful.
Data quality-Data that is collected needs to be accurate and pertinent to the purposes of the organization.
Purpose determination-There must be a clear, compelling reason to collect data.
Limits of use-Personal data must only be used for its intended purpose.
Security-Data must be protected.
Transparency-People should know that their data is being collected.
Individual participation-People whose data has been collected have the right to access their data and to have it corrected or erased.
Accountability-Whoever collects this data is responsible for adhering to the principles listed above.

The principles shared above have been adopted by many organizations to provide a foundation on which they can develop their own data privacy policies and philosophy.

Conclusion

Data privacy is a major concern in the world today. Organizations whether online or offline continue to demand more information about their customers. As such, this implies that there must be safeguards in place to ensure the protection of this information.

Defense & Offense with Data

Within the field of data governance, there are different ways of approaching data and the definition of truth. In this post, we will look at different approaches to data and also how truth can be defined with a data governance framework.

Defense

A defense approach to data is focused on controlling data. This can involve security and stringent governance of data through a highly centralized setting. In addition, the defensive data approach is concerned with minimizing risk and ensuring compliance with standards and expectations. Preventing theft and tracking the flow of data through an organization is also important.

When analytics are used they are used to detect fraud and unusual activity. How defensive an organization is depends on the field or industry. For example, banking and health care are highly defensive due to the type of data they gather.

Offense

An offensive approach to data is focused on developing insights with data. The goal is not to protect but to develop insights for decision-making. An offensive approach to data is characterized by flexibility and being focused on the customer. This style of approaching data generally emphasizes a decentralized style of data governance.

Organizations that find themselves in highly competitive environments often are forced to become more offensive as they search for insights to maximize profits. The right balance of offense and defense varies from one organization to another. However, in general, most if not all organizations start defensive and slowly become more offensive in nature.

Truth

Whether the approach to data is offensive or defensive, it is important to determine what counts as the truth when it comes to data in an organization. Every organization needs a single source of truth (SSOT) for critical data. The SSOT is the agreed way of naming and recording data that is the same across the organization. For example, sometimes the same name can be entered in multiple different ways in an organization’s data. Take the company AT&T as an example. It could be entered in any of the following ways:

ATT

att

Att

AT and T

AT&T

Each of the examples above can be considered different and can lead to chaos when it is time to analyze data for insights. This is because redundant names can lead to redundant costs. For example, if AT&T was a vendor for our fictitious company there might be several different contracts with AT&T with several different divisions who all spell AT&T differently. To prevent this the SSOT will define the one way to code AT&T into the system and determine what it represents.
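One way to picture how an SSOT rule might be enforced is a small normalization step that maps every known variant to the one approved spelling. This is only a sketch: the lookup table and the cleaning rules are assumptions made for illustration, not a standard method.

```python
import re

# Hypothetical SSOT lookup: cleaned key -> the one approved spelling.
CANONICAL = {"att": "AT&T"}


def normalize_vendor(raw: str) -> str:
    """Map spelling variants to the SSOT name; unknown names pass through."""
    key = raw.lower().replace("&", " and ")
    key = re.sub(r"\band\b", "", key)    # drop the standalone word "and"
    key = re.sub(r"[^a-z0-9]", "", key)  # drop spaces and punctuation
    return CANONICAL.get(key, raw)


variants = ["ATT", "att", "Att", "AT and T", "AT&T"]
for v in variants:
    print(repr(v), "->", normalize_vendor(v))
```

Names that are not in the lookup are returned unchanged, so the function can be applied to a whole vendor column without altering entries the SSOT does not yet cover.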

However, keeping the offensive approach to data in mind, there are times when the SSOT can be modified for the purpose of analysis. Doing this leads to what is called multiple versions of truth (MVOT). An example of MVOT is a department that classifies our example of AT&T in a different way from the SSOT. Accounting might see AT&T as a vendor while marketing might see AT&T as their internet provider, etc. Since everyone knows what the SSOT is, they are aware when they make an MVOT for their distinct purpose.

Conclusion

Each organization needs to decide for itself what approach to data it wants to take. There is no right or wrong way to approach data; it really depends on the situation. In addition, every organization needs to determine for itself how it will define truth, and there is no single way to do this either. What organizations need to do is address these two topics in a way that is satisfying for them.

Data Governance Methodology

Data governance is becoming more and more common in today’s world. In this post, we will look at one commonly used process of implementing data governance. The steps are explained below.

Scope & Initiation

The first step in setting up a data governance system is to determine the scope of data governance. By scope, it means how deep and wide the program will be. In other words, you have to determine what will be governed and how thoroughly it will be governed.

It may surprise some that not all data is governed by data governance. For each organization, it will be different but generally, all organizations have data that is excluded from data governance. For example, some organizations will include emails under data governance while others will not. It depends on the situation and there is no single rule.

In addition, it is important to determine how thorough the governance will be. An example of this would be the tolerance for data quality issues. There are times when some data errors are permissible as long as they do not exceed a certain threshold, but this also depends on the context.

Assess

At the assessment stage, the purpose is to determine an organization’s ability to govern data and be governed by policies. Generally, there are three ways of assessing this and they are measuring the capacity to change, the culture of data use, and the ability to collaborate.

The capacity to change is self-explanatory and is a measure of an organization’s ability to accept new policies such as data governance policies. The data use culture is looking at how an organization uses data at that moment. Lastly, collaboration looks at how well people within the organization can work together. Collaboration is critical because data governance generally affects the entire organization and people from multiple departments must work together.

Vision

The vision is where terms are defined and steps going forward are set. For example, the organization needs to define what data governance is for them. In addition, requirements for doing data governance are also developed.

Vision setting is a theoretical experience and this is often boring for the more practical action-oriented individuals. However, setting the vision sets the tone for the rest of the project. Therefore, this must be planned and developed.

Align & Business Value

Aligning and business value is for determining the financial value of incorporating data governance into an organization and also refining how things will be measured. For profit-seeking organizations business value is critical. Most projects need to make or at least save money in this setting. For non-profit organizations, the motivation might be to increase efficiency or the ability to better serve stakeholders.

It’s not enough to talk about savings. Evidence must be provided for determining actual savings. This is where metrics come into play. There must be ways to measure the value of a data governance project. Again, how to do this will vary from place to place but it needs to be addressed.

Functional Design

Functional design is focused on the actual process of doing data governance. What will be done must be determined, and the roles that support this process must be established. Principles are often developed at this step, and principles are similar to goals in terms of what is expected from implementing data governance. Following principles, the next things developed are standards, which are similar to objectives in education in that they describe some sort of measurable action.

Best practices often encourage data governance to be embedded within existing roles and responsibilities. In other words, setting up another department within an organization and calling it data governance is generally not considered the best way to make this happen.

Governing Framework Design

Once the plan has been developed it is time to find the people who will implement it. The governing framework involves assigning processes to people and setting up the various roles associated with data governance. Generally, many aspects of data governance are already being done at an organization, but in a disjointed, unaware way. Therefore, the main benefit here is not so much to give out more work but rather to make it clear who is already doing what and make sure they are aware of it.

Road Map

The road map step involves data governance going live. This is the point where data governance is integrated into the existing organization. Other things that are done at this step are designing metrics and reporting requirements. In other words, how good or bad does performance have to be on a standard and how will this be reported?

Change management is also addressed here and involves dealing with resistance and making sure that the scope and goals of the project do not change. There are times when a project will wander from its original purpose, which can be frustrating for people.

Rollout and Sustain

Roll out and sustain involves executing the plan and checking its effectiveness. Essentially, this step involves monitoring the data governance implementation and making corrections as necessary.

Conclusion

Data governance is a critical part of most organizations today. However, it can be tricky to figure out how to make this a part of an organization. The information above provides an example of how this could be done.

Terms Related to Data Storage

There are several different terms used when referring to data within an organization that can become confusing for people who are not experts in this field. In this post, we will look at various terms that are often misused in the field of data management.

Database

Databases are for structured data which is data that has rows and columns. Among the many benefits of using a database over an Excel spreadsheet is that databases can hold almost limitless amounts of data. In addition, databases can have multiple users querying and inputting data at the same time which is not possible with a spreadsheet.

Data Warehouse

A data warehouse is a computer system designed to store and analyze large amounts of data for an organization. The data for a data warehouse can come from various areas within the organization. Since the data comes from many different places it also helps to integrate data for the purpose of analysis which is valuable for decision-making and insights.

Data warehouses take pressure off databases by providing another location for data. However, because of their size, often over 100 GB, data warehouses are hard to change once they are up and running. Therefore, great care is needed when developing and using this tool.

Data Marts

Data marts are similar to data warehouses with the main difference being the scope. Like data warehouses, data marts are also databases. However, data marts are focused on one subject or department whereas data warehouses gather data from all over an organization. For example, a school might have a data warehouse for all student data while it has a data mart that only holds student classes and grades.

Since they have a focus on a given subject, data marts are generally smaller than data warehouses at less than 100 GB. The rationale of a data mart is that analytic teams can focus when trying to develop insights rather than searching through a larger data warehouse.

Data Lake

Data lakes are also similar to data warehouses. Just like a data warehouse data lakes contain data from all over the organization from many sources. Data lakes are also generally larger than 100 GB. One of the main differences is that data lakes contain structured and unstructured data. Unstructured data is data that does not fit into rows and columns. Examples can include video data, social media, and images.

Another purpose for a data lake is to have a place for keeping data that may not have a specific purpose yet. Another way to think of this is to consider a data lake as a historical repository of data. Due to their multipurpose nature, data lakes are often less complex in comparison to data warehouses.

Conclusion

All of the various data products discussed here work together to give an organization access to its data. It is important to understand these different terms because it is common for people to use them interchangeably to the confusion of everyone involved. With consistent terminology, everyone can be on the same page when it comes to delivering value through using data.

Types of Data Quality Rules

Data quality rules are for protecting data from errors. In this post, we will learn about different data quality rules. In addition, we will look at tools used in connection with data quality rules.

Detective

Detective rules monitor data after it has already moved through a pipeline and is being used by the organization. Detective rules are generally used when the issues being detected are not causing a major problem, when the issue cannot be solved quickly, and when a limited number of records are affected.

Of course, all of the criteria listed above are relative. In other words, it is up to the organization to determine what thresholds are needed for a data quality rule to be considered a detective rule.

An example of a detective data quality rule may be a student information table that is missing a student’s uniform size. Such information is useful but probably not important enough to stop the data from moving to others for use.

Preventative

Preventive data quality rules stop data in the pipeline when issues are found. Preventive rules are used when the data is too important to allow errors, when the problem is easy to fix, or when the issue is affecting a large number of records. Again, all of these criteria are relative to the organization.

An example of a violation of a preventive data quality rule would be a student records table missing student ID numbers. Generally, such information is needed to identify students and make joins between tables. Therefore, such a problem would need to be fixed immediately.
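The difference between the two rule types can be sketched in a few lines. In this hypothetical pipeline step, a missing student ID stops the batch (preventive), while a missing uniform size is only recorded for later follow-up (detective). The column names and the exception class are made up for the example.

```python
class PreventiveRuleViolation(Exception):
    """Raised to stop the pipeline when a preventive rule fails."""


def run_rules(rows):
    """Apply one preventive and one detective rule to a batch of rows.

    Returns (rows, warnings). Column names are hypothetical.
    """
    warnings = []
    for i, row in enumerate(rows):
        # Preventive: a missing student ID blocks the whole batch.
        if not row.get("student_id"):
            raise PreventiveRuleViolation(f"row {i}: missing student_id")
        # Detective: a missing uniform size is only logged for review.
        if not row.get("uniform_size"):
            warnings.append(f"row {i}: missing uniform_size")
    return rows, warnings


batch = [
    {"student_id": 1001, "uniform_size": "M"},
    {"student_id": 1002, "uniform_size": None},
]
passed, warnings = run_rules(batch)
print(len(passed), "rows passed;", warnings)
```

Raising an exception is just one way to halt a pipeline; a real system might instead quarantine the batch or route it to a review queue.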

Thresholds & Anomaly detection

There are several tools for implementing detection and prevention data quality rules. Among the choices are the setting of thresholds and the use of anomaly detection.

Thresholds are actions that are triggered after a certain number of errors have occurred. It is totally up to the organization to determine how to set up its thresholds. Common levels include no action, warning, alert, and prevention. Each level must have a minimum number of errors that must occur for this information to be passed on to the user or IT.

To make things more complicated you can tie threshold levels to detective and preventive rules. For example, if a dataset has 5% missing data it might only flag it as a warning threshold. However, if the missing data jumps to 10% it might now be a violation of a preventative rule as the violation has reached the prevention level.
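The threshold idea can be written as a simple lookup from an error rate to an action level. In the sketch below the 5% and 10% cut-offs come from the example above, while the 7.5% cut-off for "alert" is an invented placeholder, since each organization sets its own levels.

```python
def missing_pct(values):
    """Percentage of values that are missing (None or empty string)."""
    return 100 * sum(v is None or v == "" for v in values) / len(values)


def threshold_level(pct: float) -> str:
    """Map a missing-data percentage to an action level.

    5% and 10% follow the example in the text; 7.5% is a placeholder.
    """
    if pct >= 10:
        return "prevention"
    if pct >= 7.5:
        return "alert"
    if pct >= 5:
        return "warning"
    return "no action"


column = ["a", None, "b", "c", "d", "e", "f", "g", "h", "i"]
pct = missing_pct(column)
print(pct, "->", threshold_level(pct))
```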

Anomaly detection can be used to find outliers. Unusual records can be flagged for review. For example, suppose a university has an active student who was born in 1920. Such a birthdate is highly unusual, and an anomaly detection rule should flag it as an outlier. After reviewing, IT can decide if it is necessary to edit the record. Again, anomaly detection can be used to detect or prevent data errors and can have thresholds attached to it as well.
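There are many ways to detect outliers. The sketch below uses one simple option, distance from the median measured in units of the median absolute deviation (MAD). The birth years and the cutoff of 5 are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, cutoff=5.0):
    """Return values whose distance from the median exceeds cutoff * MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    return [v for v in values if abs(v - center) / mad > cutoff]


# Made-up birth years for active students, including the 1920 case.
birth_years = [2000, 2004, 2003, 2005, 2002, 1920]
print(flag_outliers(birth_years))
```

Flagged values are candidates for review rather than confirmed errors, which matches the review step described above.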

Conclusion

Data quality rules can be developed to monitor the state of data within a system. Once the rules are developed it is important to determine if they are detective or preventative. The main reason for this is that the type of rule affects the urgency with which the problem needs to be addressed.

Data Profile

One aspect of the data governance experience is data profiling. In this post we will look at what a data profile is, an example of a simple data profile, and the development of rules that are related to the data profile.

Definition

Data profiling is the process of running descriptive statistics on a dataset to develop insights about the data and field dependencies. Some questions that are commonly asked when performing a data profile include the following.

How many observations are in the data set?
What are the min and max values of a column(s)?
How many observations have a particular column populated with a value (missing vs non-missing data)?
When one column is populated what other columns are populated?

Data profiling helps you to confirm what you know and do not know about your data. This knowledge will help you to determine issues with your data quality and to develop rules to assess data quality.

Student Records Table

StudentID	StudentFirstName	StudentLastName	StudentBirthDate	StudentClassLevel
1001	Maria	Smith	04/04/2000	Senior
1002		Chang	09/12/2004	Junior
1003	Francisco	Brown		Junior
1004	Matthew	Peter	01/01/2005	Freshman
1005	Martin		02/05/2002	Sophomore

The first column from the left is the student ID. Looking at this column, we can see that there are five records with data and that the column is numeric with 4 characters. The minimum value is 1001 and the max value is 1005.

The next two columns are first name and last name. Both of these columns are string text with a min character length of 5. The max length is 9 for first name (Francisco) and 5 for last name. For both columns, 80% of the records are populated with a value. In addition, 60% of the records have both a first name and a last name.

The fourth column is the birthdate. This column has populated records 80% of the time and all rows follow a MM/DD/YYYY format. The minimum value is 04/04/2000 and the max value is 01/01/2005. 40% of the rows have a first name, last name, and birthdate.

Lastly, 100% of the class-level column is populated with values. 20% of the values are senior, 40% are junior, 20% are sophomore, and 20% are freshman.
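The profile described above can be reproduced with a few lines of code. The sketch below re-enters the student records table by hand, with blank cells as None, and uses two small helper functions that are hypothetical names chosen for this example.

```python
from collections import Counter
from datetime import datetime

COLUMNS = ["StudentID", "StudentFirstName", "StudentLastName",
           "StudentBirthDate", "StudentClassLevel"]
DATA = [
    (1001, "Maria", "Smith", "04/04/2000", "Senior"),
    (1002, None, "Chang", "09/12/2004", "Junior"),
    (1003, "Francisco", "Brown", None, "Junior"),
    (1004, "Matthew", "Peter", "01/01/2005", "Freshman"),
    (1005, "Martin", None, "02/05/2002", "Sophomore"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]


def pct_populated(rows, *cols):
    """Percentage of rows in which every listed column has a value."""
    hits = sum(all(r[c] is not None for c in cols) for r in rows)
    return 100 * hits / len(rows)


def length_range(rows, col):
    """(min, max) character length over the populated values of a column."""
    lengths = [len(r[col]) for r in rows if r[col] is not None]
    return min(lengths), max(lengths)


ids = [r["StudentID"] for r in rows]
dates = [datetime.strptime(r["StudentBirthDate"], "%m/%d/%Y")
         for r in rows if r["StudentBirthDate"] is not None]

print("rows:", len(rows), "| ID range:", min(ids), "to", max(ids))
print("first name lengths:", length_range(rows, "StudentFirstName"))
print("last name lengths:", length_range(rows, "StudentLastName"))
print("first and last name populated (%):",
      pct_populated(rows, "StudentFirstName", "StudentLastName"))
print("birth date range:", min(dates).date(), "to", max(dates).date())
print("class levels:", Counter(r["StudentClassLevel"] for r in rows))
```

Parsing the dates before taking the min and max matters, because comparing MM/DD/YYYY strings directly would sort by month rather than by year.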

Developing Data Quality Rules

From the insights derived from the data profile, we can now develop some rules to ensure quality. With any analysis or insight the actual rules will vary from place to place based on needs and context but below are some examples for demonstration purposes.

All StudentID values must be 4 numeric characters
All StudentID values must be populated
All StudentFirstName values must be 1-10 characters in length
All StudentLastName values must be 1-10 characters in length
All StudentBirthDate values must be in MM/DD/YYYY format
All StudentClassLevel values must be Freshman, Sophomore, Junior, or Senior
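Rules like these translate fairly directly into validation code. The following is a minimal sketch, assuming each record is a dictionary keyed by the column names above. It also assumes that a missing name or birthdate counts as a violation, which the rules as written do not state explicitly.

```python
import re
from datetime import datetime

CLASS_LEVELS = {"Freshman", "Sophomore", "Junior", "Senior"}


def _is_date(value):
    # Require the literal MM/DD/YYYY shape, then check it is a real date.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def check_record(rec):
    """Return the list of rule violations for one student record (a dict)."""
    problems = []
    sid = rec.get("StudentID")
    if sid is None or not re.fullmatch(r"\d{4}", str(sid)):
        problems.append("StudentID must be 4 numeric characters")
    for col in ("StudentFirstName", "StudentLastName"):
        if not 1 <= len(rec.get(col) or "") <= 10:
            problems.append(f"{col} must be 1-10 characters")
    if not _is_date(rec.get("StudentBirthDate") or ""):
        problems.append("StudentBirthDate must be MM/DD/YYYY")
    if rec.get("StudentClassLevel") not in CLASS_LEVELS:
        problems.append("StudentClassLevel is not an allowed value")
    return problems


record = {"StudentID": 1005, "StudentFirstName": "Martin",
          "StudentLastName": None, "StudentBirthDate": "02/05/2002",
          "StudentClassLevel": "Sophmore"}
print(check_record(record))
```

Returning a list of violations, rather than stopping at the first one, makes it easier to decide afterwards which problems are detective and which are preventive.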

Conclusion

A data profile can be much more in-depth than the example presented here. However, if you have hundreds of tables and dozens of databases this can be quite a labor-intensive experience. There is software available to help with this but a discussion of that will have to wait for the future.

Data Quality

Bad data leads to bad decisions. However, the question is how you can know if your data is bad. One answer to this question is the use of data quality metrics. In this post, we will look at a definition of data quality as well as metrics of data quality.

Definition

Data quality is a measure of the degree that data is appropriate for its intended purpose. In other words, it is the context in which the data is used that determines if it is of high quality. For example, knowing email addresses may be appropriate in one instance but inappropriate in another instance.

When data is determined to be of high quality it helps to encourage trust in the data. Developing this trust is critical for decision-makers to have confidence in the actions they choose to take based on the data that they have. Therefore data quality is of critical importance for an organization and below are several measures of data quality.

Measuring Data Quality

Completeness is a measure of the degree to which expected columns (variables) and rows (observations) are present. There are times when data can be incomplete due to missing data or missing variables. There can also be data that is partially completed, which means that data is present in some columns but not others. There are various tools for finding this type of missing data in whatever language you are using.

Validity is a measure of how appropriate the data is in comparison to what the data is supposed to represent. For example, suppose there is a column in a dataset that measures the class level of high school students using Freshman, Sophomore, Junior, and Senior. Data would be invalid if it used the numerical values for the grade levels such as 9, 10, 11, and 12. This is only invalid because of the context and the assumptions that are brought to the data quality test.

Uniqueness is a measure of duplicate values. Normally, duplicate values happen along rows in structured data which indicates that the same observation appears twice or more. However, it is possible to have duplicate columns or variables in a dataset. Having duplicate variables can cause confusion and erroneous conclusions in statistical models such as regression.

Consistency is a measure of whether data is the same across all instances. For example, there are times when a dataset is refreshed overnight or whenever. The expectation is that the data should be mostly the same except for the new values. A consistency check would assess this. There are also times when thresholds are put in place such that the data can be a little different based on the parameters that are set.

Timeliness is the availability of the data. For example, if data is supposed to be ready by midnight any data that comes after this time fails the timeliness criteria. Data has to be ready when it is supposed to be. This is critical for real-time applications in which people or applications are waiting for data.

Accuracy is the correctness of the data. The main challenge of this is that there is an assumption that the ground truth is known to make the comparison. If a ground truth is available the data is compared to the truth to determine the accuracy.
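Several of these metrics reduce to simple ratios. The sketch below shows one possible way to score completeness, uniqueness, and validity on a small made-up dataset. The function names and the sample rows are assumptions, and real tools often define these metrics in more detail.

```python
def completeness(rows, columns):
    """Share of expected cells that hold a value (None counts as missing)."""
    cells = [r.get(c) for r in rows for c in columns]
    return sum(v is not None for v in cells) / len(cells)


def uniqueness(rows, key):
    """Ratio of distinct key values to rows; 1.0 means no duplicates."""
    keys = [r[key] for r in rows]
    return len(set(keys)) / len(keys)


def validity(rows, column, allowed):
    """Share of populated values that fall in the allowed set."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return sum(v in allowed for v in values) / len(values)


# Made-up rows: one duplicate id, one missing level, one invalid level.
sample = [
    {"id": 1, "level": "Freshman"},
    {"id": 2, "level": "10"},
    {"id": 2, "level": None},
    {"id": 3, "level": "Senior"},
]
levels = {"Freshman", "Sophomore", "Junior", "Senior"}

print("completeness:", completeness(sample, ["id", "level"]))
print("uniqueness:", uniqueness(sample, "id"))
print("validity:", validity(sample, "level", levels))
```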

Conclusion

The metrics shared here are for helping the analyst to determine the quality of their data. For each of these metrics, there are practical ways to assess them using a variety of tools. With this knowledge, you can be sure of the quality of your data.

Data Governance Solutions

Data governance is good at indicating various problems an organization may have with data. However, finding problems doesn’t help as much as finding solutions does. This post will look at several different data governance solutions that deal with different problems.

Business Glossary

The business glossary contains standard descriptions and definitions. It also can contain business terms or discipline-specific terminology. One of the main benefits of developing a business glossary is creating a common vocabulary within the organization.

Many if not all businesses and fields of study have several different terms that mean the same thing. In addition, people can be careless with terminology, to the confusion of outsiders. Lastly, sometimes a local organization will have its own unique terminology. No matter the case, the business glossary helps everyone within an organization to communicate with one another.

An example of a term in a business glossary might be how a school defines a student ID number. The glossary explains what the student ID number is and provides uses of the ID number within the school.

Data Dictionary

The data dictionary provides technical information. Some of the information in the data dictionary can include the location of data, relationships between tables, values, and usage of data. One benefit of the data dictionary is that it promotes consistency and transparency concerning data.

Returning to our student ID number example, a data dictionary would share where the student ID number is stored and the characteristics of this column such as the ID number being 7 digits. For a categorical variable, the data dictionary may explain what values are contained within the variable such as male and female for gender.

Data Catalog

A data catalog is a tool for metadata management. It provides an organized inventory of data within the organization. Benefits of a data catalog include improving efficiency and transparency, quick locating of data, collaboration, and data sharing.

An example of a data catalog would be a document that contains the metadata about several different data warehouses or sources within an organization. If a data analyst is trying to figure out where data on student ID numbers are stored they may start with the data catalog to determine where this data is. The data dictionary will explain the characteristics of the student ID column. Sometimes the data dictionary and catalog can be one document if tracking the data in an organization is not too complicated. The point is that the distinction between these solutions is not obvious and is really up to the organization.

Automated Data Lineage

Data lineage describes how data moves within an organization from production to transformation and finally to loading. Tracking this process is really complicated and time-consuming and many organizations have turned to software to complete this.

The primary benefit of tracking data lineage is increasing the trust and accuracy of the data. If there are any problems in the pipeline, data lineage can help to determine where the errors are creeping into the pipeline.

Data Protection, Privacy, Quality

Data protection is about securing the data so that it is not tampered with in an unauthorized manner. An example of data protection would be implementing access capabilities such as user roles and passwords.

Data privacy is related to protection and involves making sure that information is restricted to authorized personnel. Thus, this also requires the use of logins and passwords. In addition, classifying the privacy level of data can also help in protecting it. For example, salaries are generally highly confidential while employee work phone numbers are probably not.

Data quality involves checking the health of the data in terms of its accuracy and consistency. Tools for completing this task can include creating KPIs and metrics to measure data quality, developing policies and standards that define what good data quality is for the organization, and developing reports that share the current quality of data.

Conclusion

The purpose of data governance is to support an organization in maintaining data that is an asset to the organization. In order for data to be an asset it must be maintained so that the insights and decisions that are made from the data are as accurate and clear as possible. The tools described in this post provide some of the ways in which data can be protected within an organization.

Data Governance Strategy

A strategy is a plan of action. Within data governance, it makes sense to ultimately develop a strategy or plan to ensure data governance takes place. In this post, we will look at the components of a data governance strategy. Below are the common components of a data governance strategy.

Approach
Vision statement
Mission statement
Value proposition
Guiding principles
Roles & Responsibilities

There is probably no particular order in which these components are completed. However, they tend to follow an inverted pyramid in terms of the scope of what they deal with. In other words, the approach is perhaps the broadest component and affects everything below it followed by the vision statement, etc. Where to begin probably depends on how your mind works. A detail-oriented person may start at the bottom while a big-picture thinker would start at the top.

Defined Approach

The approach defines how the organization will go about data governance. There are two extremes for this and they are defensive and offensive. A defensive approach is focused on risk mitigation while an offensive approach is focused more on achieving organizational goals.

Neither approach is superior to the other and the situation an organization is in will shape which is appropriate. For example, an organization that is struggling with data breaches may choose a more defensive approach while an organization that is thriving may take a more offensive approach.

Vision Statement

A vision statement is a brief snapshot of where the organization wants to be. Another way to see this is that a vision statement is the purpose of the organization. The vision statement needs to be inspiring and easily understood. It also helps to align the policies and standards that are developed.

An example of a vision statement for data governance is found below.

Transforming how data is leveraged to make informed decisions to support youth served by this organization

The vision is to transform data for decision-making. This is an ongoing process that will continue indefinitely.

Mission Statement

The mission statement explains how an organization will strive toward its vision. Like a vision statement, the mission statement provides guidance in developing policies and standards. The mission statement should be a call to action and include some of the goals the organization has about data. Below is an example.

Enabling stakeholders to make data-driven decisions by providing accurate, timely data and insights

In the example above, it is clear that accuracy, timeliness, and insights are the goals for achieving the vision statement. In addition, the audience is identified which is the stakeholders within the organization.

Value Proposition

The value proposition provides a justification for adopting a data governance strategy, which means its emphasis is on persuasion. Some of the ideas included in the value proposition are the benefits of implementation. Often the value proposition is written in the form of one or more cause-and-effect statements. Below is an example.

By implementing this data governance program we will see the following benefits:

Improved data quality for actionable insights, increased trust in data for making decisions, and clarity of roles and responsibilities of analysts

In the example above three clear benefits are shared. Succinctly this provides people with the potential outcomes of adopting this strategy. Naturally, it would be beneficial to develop ways to measure these ideas which means that only benefits that can be measured should be a part of the value proposition.

Guiding Principles

Guiding principles define how data should be used and managed. Common principles include transparency, accountability, integrity, and collaboration. These principles are just more concrete information for shaping policies and standards. Below is an example of a guiding principle.

All data will have people assigned to play critical roles in it

The guiding principle above is focused on accountability. It is important to define and explain who is assigned to each responsibility concerning the data so that no data is left without someone accountable for it.

Roles & Responsibilities

Roles and responsibilities are about explaining the function of the data governance team and the role each person will play. For example, a small organization might have people who adopt more than one role such as being data stewards and custodians while larger organizations might separate these roles.

In addition, it is also important to determine the operating model and whether it will be centralized or decentralized. Determining the operating model again depends on the context and preferences of the organization.

It is also critical to determine how compliance with the policies and standards will be measured. It is not enough to state the policies. Eventually, there needs to be evidence of progress and of potential changes that need to be made to the strategy. For example, perhaps a data audit is done monthly or quarterly to assess data quality.

Conclusion

Having a data governance strategy is a crucial step in improving data governance within an organization. Once a plan is in place it is simply a matter of implementation to see if it works.

Data Governance Assessment

Before data governance can begin at an organization it is critical to assess where the organization is currently in terms of data governance. This calls for a data governance assessment. The assessment helps an organization to figure out where to begin by identifying challenges and prioritizing what needs to be addressed. In particular, it is common for there to be five steps in this process as shown below.

Identify data sources and stakeholders
Interview stakeholders
Determine current capabilities
Document the current state and target state
Analyze gaps and prioritize

We will look at each of these steps below.

Identify Data Sources and Stakeholders

Step one involves determining what data is used within the organization and the users or stakeholders of this data. Essentially, you are trying to determine…

What data is out there?
Who uses it?
Who produces it?
Who protects it?
Who is responsible for it?

Answering these questions also provides insights into what roles in relation to data governance are already being fulfilled at least implicitly and which roles need to be added to the organization. At most organizations at least some of these questions have answers and there are people responsible for many roles. The purpose here is not only to get this information but also to make people aware of the roles they are fulfilling from a data governance perspective.

Interview Stakeholders

Step two involves interviewing stakeholders. Once it is clear who is associated with data in the organization it is time to reach out to these people. You want to develop questions to ask stakeholders in order to inform you about what issues to address in relation to data governance.

An easy way to do this is to develop questions that address the pillars of data governance. The pillars are…

Ownership & accountability
Data quality
Data protection and privacy
Data management
Data use

Below are some sample questions based on the pillars above.

How do you know your data is of high quality?
What needs to be done to improve data quality?
How is data protected from misuse and loss?
How is metadata handled?
What concerns do you have related to data?
What policies are there now related to data?
What roles are there in relation to data?
How is data used here?

It may be necessary to address all or some of these pillars when conducting the assessment. The benefit of these pillars is that they provide a starting point from which you can shape your own interview questions. In terms of the interview, it is up to each organization to determine what is best for data collection. Maybe a survey works or perhaps semi-structured interviews or focus groups. The actual research part of this process is beyond the scope of this post.

Determine Current Capabilities

Step three involves determining the current capabilities of the organization in terms of data governance. Often this can be done by looking at the stakeholder interviews and comparing what they said to a rating scale. For example, the DCAM rating scale has six levels of data governance competence as shown below.

Not initiated: No governance happening
Conceptual: Aware of data governance and planning
Developmental: Engaged in developing a plan
Defined: Plan approved
Achieved: Plan implemented and enforced
Enhanced: Plan a part of the culture and updated regularly

Determining the current capabilities is a subjective process. However, it needs to be done in order to determine the next steps in bringing data governance along in an organization.

Document Current State and Target State

Step four involves determining the current state and determining what the target state is. Again, this will be based on what was learned in the stakeholder interviews. What you will do is report what the stakeholders said in the interviews based on the pillars of data governance. It is not necessary to use the pillars but it does provide a convenient way to organize the data without having to develop your own way of classifying the results.

Once the current state is defined it is now time to determine what the organization should be striving for in the future and this is called the target state. The target state is the direction the organization is heading within a given timeframe. It is up to the data governance team to determine this and how it is done will vary. The main point is to make sure not to try and address too many issues at once and save some for the next cycle.

Analyze and Prioritize

The final step is to analyze and prioritize. This step involves performing a gap analysis to determine solutions that will solve the issues found in the previous step. In addition, it is also important to prioritize which gaps to address first.
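
A gap analysis can be sketched in a few lines of Python. The example below assumes each pillar has been rated on the six-level scale mentioned earlier, for both the current and the target state. The ratings, the `Level` enum, and the rule of ranking by the size of the gap are all assumptions for illustration. An organization may well prioritize by risk or cost instead.

```python
from enum import IntEnum


class Level(IntEnum):
    NOT_INITIATED = 1
    CONCEPTUAL = 2
    DEVELOPMENTAL = 3
    DEFINED = 4
    ACHIEVED = 5
    ENHANCED = 6


# Hypothetical ratings taken from stakeholder interviews.
current_state = {
    "Ownership & accountability": Level.CONCEPTUAL,
    "Data quality": Level.DEVELOPMENTAL,
    "Data protection and privacy": Level.DEFINED,
}
target_state = {
    "Ownership & accountability": Level.DEFINED,
    "Data quality": Level.DEFINED,
    "Data protection and privacy": Level.DEFINED,
}


def prioritized_gaps(current, target):
    """Return (pillar, gap) pairs for pillars below target, largest gap first."""
    gaps = {pillar: int(target[pillar]) - int(current[pillar]) for pillar in target}
    open_gaps = [(pillar, gap) for pillar, gap in gaps.items() if gap > 0]
    return sorted(open_gaps, key=lambda item: item[1], reverse=True)


for pillar, gap in prioritized_gaps(current_state, target_state):
    print(f"{pillar}: {gap} level(s) below target")
```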

Another part of this step is sharing recommendations and soliciting feedback. Provide insights into which direction the organization can go to improve its data governance and allow stakeholders to provide feedback in terms of their agreement with the report. Once all this is done the report is completed and documented until the next time this process needs to take place.

Conclusion

The steps presented here are not prescriptive. They are shared as a starting point for an organization’s journey in improving data governance. With experience, each organization will find its own way to support its stakeholders in the management of data.

Data Governance Office

The data governance office, or team, leads the work with data within an organization. This team is made up of several members such as:

Chief Data Officer
Data Governance Lead
Data Governance Consultant
Data Quality Analyst

We will look at each of these below. It also needs to be mentioned that a person might be assigned several of these roles, which is particularly true in a smaller organization. In addition, several people might fulfill one of these roles in a much larger organization.

Chief Data Officer

The chief data officer is responsible for shaping the overall data strategy at an organization. The chief data officer also promotes a data-driven culture and pushes for change within the organization. A person in this position also needs to understand the data needs of the organization in order to further the vision of the institution or company.

The role of the chief data officer encompasses all of the other roles that will be discussed. The chief data officer is essentially the leader of the data team and provides help with governance consulting, quality, and analytics. However, the primary role of this position is to see the big picture for data and to guide the organization in this regard, which implies that technical skills are beneficial but leadership and change promotion are more critical. In sum, this is a challenging position that requires a large amount of experience.

Data Governance Lead

The data governance lead’s primary responsibilities involve defining policies and data governance frameworks. While the chief data officer is more of an evangelist or promoter of data governance, the data governance lead is focused on the actual implementation of change and guiding the organization in this process.

Essentially, the data governance lead is in charge of the day-to-day operation of the data governance team. While the chief data officer may be the dreamer the data governance lead is a steady hand behind the push for change.

Data Governance Consultant

The data governance consultant is the subject matter expert in data governance. Their role is to know all the details of data governance in the general field, and it is even better if they know how to make data governance happen in a particular discipline. For example, a university would benefit from a data governance consultant who knows how to make data governance happen within the context of a university in particular.

The data governance consultant supports the data governance lead with implementation. In addition, the consultant is a go-between for the larger organization and IT. Serving as a go-between implies that the consultant is able to communicate effectively with both parties, on a technical level with IT and in layman’s terms with the larger organization. The synergy between IT and the larger organization can be challenging because of communication issues due to vastly different backgrounds, and it is the consultant’s responsibility to bridge this gap.

Data Quality Analyst

The data quality analyst’s job is, as the name implies, to ensure quality data. One way of determining data quality is to develop rules for data entry. For example, a rule for data quality is that marital status can only be single, married, divorced, or widowed. This rule restricts any other option that people may want. When this rule is enforced, the data in that field is of high quality within this context.
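
A rule like this is easy to check automatically. Below is a minimal sketch in Python. The record layout and the `invalid_marital_status` function are hypothetical, and ignoring letter case and surrounding spaces is an assumption about how strict the rule should be.

```python
# The four values allowed by the rule described above.
ALLOWED_MARITAL_STATUS = {"single", "married", "divorced", "widowed"}


def invalid_marital_status(records):
    """Return the records whose marital status breaks the data entry rule."""
    return [
        record
        for record in records
        # Assumption: case and surrounding whitespace do not matter.
        if str(record.get("marital_status", "")).strip().lower()
        not in ALLOWED_MARITAL_STATUS
    ]


# Hypothetical records for illustration.
records = [
    {"id": 1, "marital_status": "Married"},
    {"id": 2, "marital_status": "separated"},
    {"id": 3, "marital_status": ""},
]

for record in invalid_marital_status(records):
    print(f"Record {record['id']} has an invalid marital status")
```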

A data quality analyst also performs troubleshooting or root cause investigations. If something funny is going on in the data, such as duplicates, it is the data quality analyst’s job to determine what is causing the problems and to find a solution. Lastly, a data quality analyst is also responsible for statistical work. This can include statistical work that is associated with the work of a data analyst and/or statistical work that monitors the use of data and the quality of data within the organization.
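
Finding duplicates is often the first step of such an investigation. The sketch below counts how many times each key value appears. The `duplicate_keys` function and the sample rows are hypothetical, and finding the duplicates does not by itself explain what caused them.

```python
from collections import Counter


def duplicate_keys(records, key):
    """Return a mapping of each repeated key value to how often it appears."""
    counts = Counter(record[key] for record in records)
    return {value: count for value, count in counts.items() if count > 1}


# Hypothetical rows for illustration.
rows = [
    {"student_id": "S001", "name": "Ana"},
    {"student_id": "S002", "name": "Ben"},
    {"student_id": "S001", "name": "Ana"},
]

print(duplicate_keys(rows, "student_id"))
```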

Conclusion

The data governance team plays a critical role in supporting the organization with reliable and clean data that can be trusted to make actionable insights. Even though this is a tremendous challenge it is an important function in an organization.

Roles in Data Governance

Working with data is a team event. Different people are involved in different stages of the data process. The roles described below are roles commonly involved in data governance. The general order below is the common order in which these individuals will work with data. However, life is not always linear and different people may jump in at different times. In addition, one person might have more than one role when working with data in the governance process.

Data Owners

Data owners are responsible for the infrastructure such as the database in which data is stored for consumption and use. Data owners are also in charge of the allocation of resources related to the data. Data owners also play a critical role in developing standard operating procedures and compliance with these standards.

Data Producers

Once the database or whatever tool is used for the data is in place, the next role involved is the data producer. Data producers are responsible for creating data. The creation of data can happen through such processes as data entry or data collection. Data producers may also support quality control and general problem-solving of issues related to data. To make it simple, the producer uses the system that the owner developed for the data.

Data Engineers

Data engineers are responsible for pipeline development which is moving data from one place to the other for various purposes. Data engineers deal with storage optimization and distribution. Data engineers also support the automation of various tasks. Essentially, engineers move around the data that producers create.

Data Custodians

Data custodians are the keepers and protectors of data. They focus on using the storage created by the data owner and the delivery of data like the data engineer. The difference is that the data custodian sends data to the people after them in this process such as stewards and analysts.

Data custodians also make sure to secure and back up the data. Lastly, data custodians are often responsible for network management.

Data Stewards

Data stewards work on defining and organizing data. These tasks might involve working with metadata in particular. Data stewards also serve as gatekeepers to the data, which involves keeping track of who is using and accessing the data. Lastly, data stewards help consumers (analysts and scientists) find the data that they may need to complete a project.

Data Analysts

Data analysts, as the name implies, analyze the data. Their job can involve statistical modeling of data to make a historical analysis of what happened in the past. Data analysts are also responsible for cleaning data for analysis. In addition, data analysts are primarily responsible for data visualization and for developing the story that the data tells. Dashboards and reports are also frequently developed by the data analyst.

Data Scientists

The role of a data scientist is highly similar to that of a data analyst. The main difference is that data scientists use data to predict the future while data analysts use data to explain the past. In addition, data scientists serve as research designers to acquire additional data for the goals of a project. Lastly, data scientists do advanced statistical work involving at times machine learning, artificial intelligence, and data mining.

Conclusion

The roles mentioned above all play a critical role in supporting data within an organization. When everybody plays their part well organizations can have much more confidence in the decisions they make based on the data that they have.

Data Governance Framework Types and Principles

When it is time to develop data governance policies the first thing to consider is how the team views data governance. In this post, we will look at various data governance frameworks and principles to keep in mind when employing a data governance framework.

Top-Down

The top-down framework involves a small group of data providers. These data providers serve as gatekeepers for data that is used in the institution. Whatever data is used is controlled centrally in this framework.

One obvious benefit of this approach is that with a small group of people in charge, decision-making should be fast and relatively efficient. In addition, if something does go wrong it should be easy to trace the source of the problem. However, a top-down approach only works in situations that have small amounts of data or end users. When the amount of data becomes too large the small team will struggle to support users which indicates that this approach is hard to scale. Lastly, people may resent having to abide by rules that are handed down from above.

Bottom-Up

The bottom-up approach to data governance is the mirror opposite of the top-down approach. Where top-down involves a handful of decision-makers, bottom-up focuses on a democratic style of data leadership. Bottom-up is scalable due to everyone being involved in the process while top-down does not scale well. Generally, when the bottom-up approach is used, controls and restrictions on data are put in place after the raw data is shared rather than before.

Like all approaches to data governance, there are concerns with the bottom-up approach. For example, it becomes harder to control the data when people are allowed to use raw data that has not been prepared for use. In addition, because of the democratic nature of the bottom-up approach, there is also an increased risk of security concerns because of the increased freedom people have.

Collaborative

The collaborative approach is a mix of top-down and bottom-up ideas on data governance. This approach is flexible and balanced while placing an emphasis on collaboration. The collaboration can be among stakeholders or between the gatekeepers and the users of data.

One main concern with this approach is that it can become messy and difficult to execute if principles and goals are not clearly defined. Therefore, it is important to spend a large amount of time in planning when choosing this approach.

Principles

Regardless of which framework you pick when beginning data governance, there are several terms you need to be familiar with to help you be successful. For example, integrity involves maintaining open lines of communication and the sharing of problems so that an atmosphere of trust is maintained or developed.

It is also important to determine ownership for the purpose of governance and decision-making. Determining ownership also helps to find gaps in accountability and responsibility for data.

Leaders in data governance must also be aware of change and risk management. Change management consists of the tools and processes for communicating new strategies and policies related to data governance. Change management helps with ensuring a smooth transition from one state of equilibrium to another. Risk management consists of the tools related to auditing and developing interventions for non-compliance.

A final concept to be aware of is strategic alignment. The goals and purpose of data governance must align with the goals of the organization that data governance is supporting. For example, a school will have a strict stance on protecting student privacy. Therefore, data governance needs to reflect this and support strict privacy policies.

Conclusion

Frameworks provide a foundation on which your team can shape their policies for data governance. Each framework has its strengths and weaknesses but the point is to be aware of the basic ways that you can at least begin the process of forming policies and strategies for governing data at an organization.

Data Governance Framework

In this post, we will define a data governance framework. We will also look at the key components that are a part of a data governance framework.

Defined

A data governance framework is the how or the plan for governing the data within an organization. The term data governance determines what needs to be governed or controlled while the data governance framework is the actual plan for controlling the data.

Common Components

There are several common components of a data governance plan and they include the following.

Strategy
Policies
Processes
Coordination
Monitoring/communication
Data literacy/culture

Strategy involves determining how data can be used to solve problems. This may seem obvious, but only certain data can be used to solve certain problems. For example, customers’ addresses in California might not be appropriate for determining revenue generated in Texas. When data is looked at strategically it helps to ensure that it is viewed as an asset by those who use it.

Policies help to guide such things as decision-making and expectations concerning data. In addition, policies also help with determining responsibilities and tasks related to data management. One example of policy in action is the development of standards which are rules for best practices in order to meet a policy. A policy may be something like protecting privacy. A standard to meet this policy would be to ensure that data is encrypted and password protected.

Process and technology involve steps for monitoring the quality of data. Other topics related to process can include dealing with metadata and data management. The proper process mainly helps with efficiency in the organization.

Coordination involves the processes of working together. Coordination can involve defining the roles and responsibilities for a complex process that requires collaboration with data. In other words, coordination is developed when multiple parties are involved with a complex task.

Progress monitoring involves the development of KPIs to make sure that the performance expectations are measured and adhered to. Progress monitoring can also involve issues related to privacy, quality, and compliance. An example of progress monitoring may be requiring everyone to change their password every 90 days. At the end of the 90 days, the system will automatically make the user create a new password.
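
The password example can be turned into a simple KPI. The sketch below flags passwords that have reached 90 days of age and reports the share of accounts that are still compliant. The function names and the sample dates are hypothetical. A real system would read the dates from its identity provider.

```python
from datetime import date, timedelta

# The 90-day limit from the example above.
MAX_PASSWORD_AGE = timedelta(days=90)


def password_expired(last_changed, today):
    """Return True once the password has reached the maximum allowed age."""
    return today - last_changed >= MAX_PASSWORD_AGE


def compliance_rate(last_changed_by_user, today):
    """Return the share of users whose password is still within the limit."""
    if not last_changed_by_user:
        return 1.0  # assumption: no accounts means nothing is out of compliance
    compliant = sum(
        1
        for changed in last_changed_by_user.values()
        if not password_expired(changed, today)
    )
    return compliant / len(last_changed_by_user)


# Hypothetical accounts for illustration.
accounts = {"ana": date(2024, 3, 1), "ben": date(2024, 1, 1)}
print(compliance_rate(accounts, today=date(2024, 3, 31)))
```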

Lastly, data literacy and culture involve training the people within the organization who use or consume data and developing their skill in analyzing and/or communicating data. Naturally, this is an ongoing process and how it works depends on who is involved.

Conclusion

A framework is a plan for achieving a particular goal or vision. As organizations work with data, they must be diligent in making sure that the data that is used is trustworthy and protected. A data governance framework is one way in which these goals can be attained.

Data Governance Benefits

Data governance is a critical part of many organizations today. In this post, we will look at some of the commonly found benefits of incorporating data governance into an organization.

Improved Data Quality

In theory, when data governance is implemented within an organization there should be a corresponding improvement in data quality. What is meant by improved data quality is better accuracy, consistency, and integrity. In addition, data quality can also include the completeness of the data and ensuring that the data is timely.

When data quality is high it allows end users to have greater trust in the analysis and conclusions that can be made from the data. Improved trust can also lead to an increase in confidence when sharing and/or defending the decision-making process.

Risk Reduction

Data governance can also reduce risk. There are often laws that organizations have to follow concerning data governance. Common laws often include laws about privacy. When data governance is implemented and carefully enforced it can help in complying with laws and thus lower the risk of breaking laws and or facing legal consequences.

The typical organization probably does not want to deal with legal matters. As such, it is in most if not all organizations’ benefit to comply with laws through data governance. The process of abiding by laws also provides a good example to stakeholders and creates a culture of transparency.

Improved Decision-Making

Decisions are only as good as the information that they are based upon. If the data is bad, there is a risk of making bad decisions. There is an idiom common in the data world which states “garbage in garbage out.” Therefore, it is critical that the data accurately represents what it is supposed to represent.

As mentioned earlier, good data leads to good decisions and increased confidence. It also helps with improving understanding of the context from which the data came.

Improved Processes

Data governance can also improve various processes. For example, roles relating to data have to be clearly defined. In addition, various tasks that need to be completed must also be stipulated and clarified. Whenever steps like these are taken it can improve the speed at which things are done.

In addition, improving processes can also reduce errors. Since people know what their role is and what they need to do it is easier to spot and prevent mistakes as the data moves to the various parties that are using it.

Customer Service

Data governance is also beneficial for customer service or dealing with stakeholders. When requests are made by customers or stakeholders, accurate data is critical for addressing their questions. In addition, there are situations in which customers or stakeholders can access the data themselves. For example, most customers can at least access their own personal information on a shopping website such as Amazon.

If data is not properly cared for, users cannot access it or have their questions answered. This is frustrating no matter what field or industry one is working in. Therefore, data governance is important in enhancing the experience of customers and people who work in the institution.

Profit Up

A natural outcome of the various points mentioned above is increased profit or decreased expenses depending on the revenue model. When efficiency goes up and or customer satisfaction goes up there is often an increase in revenue.

What can be inferred from this is that data governance is not just a set of ideas to avoid headaches but a tool that can be employed to enhance profitability in many situations.

Conclusion

Data governance is beneficial in many more ways than mentioned here. For our purposes, data governance can allow an organization to focus on making cost-efficient, sound decisions by ensuring the quality and accuracy of the data involved in the process of making conclusions.

Influences and Approaches of Data Governance

Data governance has been around for a while. As a result of this, there have been various trends and challenges that have influenced this field. In this post, we will look at several laws that have had an impact on data governance along with various concepts that have been developed to address common concerns.

Laws

Several laws have played a critical role in influencing data governance both in the USA and internationally. For example, the Sarbanes-Oxley (SOX) Act was enacted in 2002. The SOX Act was created in reaction to various accounting scandals at large corporations at the time. Among the requirements of this law are setting standards for financial and corporate reporting and the need for executives to verify or attest that the financial information is correct. Naturally, this requires data governance to make sure that the data is appropriate so that these requirements can be met.

There are also several laws related to privacy in particular. Focusing again on the USA, there is the Health Insurance Portability and Accountability Act (HIPAA), which requires institutions in the medical field to protect patient data. Leaders in data at these institutions must develop data governance policies that protect medical information.

In the state of California, there is the California Consumer Privacy Act (CCPA), which allows California residents more control over how their personal data is handled by companies. The CCPA is focused much more on the collection and selling of personal data as this has become a lucrative industry in the data world.

At the international level, there is the General Data Protection Regulation (GDPR). The GDPR is a privacy law that applies to anybody who lives in the EU. What this implies is that a company in another part of the world that has customers in the EU must abide by this law as well. As such, this is one example of a local law related to data governance that can have a global impact.

Various Concepts that Support Data Governance

Data governance was around much earlier than the laws described above. Over time, several different concepts and strategies were developed to address transparency and privacy, as explained below.

Data classification and retention deals with the level of confidentiality of the data and policies for data destruction. For example, social security numbers are a form of data that is highly confidential, while the types of shoes a store sells would probably not be considered private. In addition, some data is not meant to be kept forever. For example, consumers may request that information such as credit card numbers be removed from a website. In such a situation, there must be a way for this data to be removed permanently from the system.
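The two ideas in this paragraph can be sketched in a few lines of Python. This is only an illustration: the two-level classification scheme, the field names, and the helper functions are all assumptions made for the example and do not come from any particular standard or system.

```python
from enum import Enum


class Confidentiality(Enum):
    PUBLIC = 1
    HIGHLY_CONFIDENTIAL = 2


# Hypothetical classification of fields; names are illustrative only.
FIELD_CLASSIFICATION = {
    "shoe_type": Confidentiality.PUBLIC,
    "social_security_number": Confidentiality.HIGHLY_CONFIDENTIAL,
    "credit_card_number": Confidentiality.HIGHLY_CONFIDENTIAL,
}


def confidential_fields(record):
    """Return the sorted names of highly confidential fields in a record."""
    return sorted(
        name
        for name in record
        if FIELD_CLASSIFICATION.get(name) is Confidentiality.HIGHLY_CONFIDENTIAL
    )


def erase_field(records, customer_id, field_name):
    """Remove one field from one customer's record, if it exists."""
    record = records.get(customer_id)
    if record is not None:
        record.pop(field_name, None)


# A consumer asks for their credit card number to be removed.
records = {
    "c1": {"shoe_type": "sneaker", "credit_card_number": "4111-0000-0000-0000"},
}
erase_field(records, "c1", "credit_card_number")
```

In a real system, permanent removal would also have to cover backups and downstream copies, which this in-memory sketch does not attempt.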

Data management is focused on consistency and transparency. There must be a master copy of data to serve as a backup and for checking the accuracy of other copies. In addition, there must be some form of reference data management to identify and map datasets through some general identifier such as zip code or state.
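As a rough sketch of what checking a copy against the master copy could look like, the function below compares two dictionaries keyed by a record identifier. The function name and the sample records are hypothetical and serve only to illustrate the idea.

```python
_MISSING = object()  # sentinel so a missing key never equals a real value


def find_discrepancies(master, copy):
    """Return sorted keys whose value in the copy is missing or differs from the master."""
    return sorted(
        key for key, value in master.items()
        if copy.get(key, _MISSING) != value
    )


# Hypothetical master data and a working copy that has drifted.
master = {"s1": "California", "s2": "Texas", "s3": "Ohio"}
working_copy = {"s1": "California", "s2": "Tex."}

drift = find_discrepancies(master, working_copy)
```

Here `drift` lists the records that need to be corrected in the copy: one with a changed value and one that is absent altogether.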

Lastly, metadata management deals with data that describes the data. Providing this information makes it possible to search and catalog data.
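To make the idea of searching a catalog through metadata more concrete, here is a small Python sketch. The `DatasetMetadata` fields and the `search_catalog` function are assumptions chosen for the example, not the schema of any real catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record describing one dataset."""
    name: str
    description: str
    owner: str
    tags: list = field(default_factory=list)


def search_catalog(catalog, keyword):
    """Return names of datasets whose description or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        meta.name
        for meta in catalog
        if keyword in meta.description.lower()
        or any(keyword == tag.lower() for tag in meta.tags)
    ]


catalog = [
    DatasetMetadata("enrollment", "Student enrollment by term", "registrar", ["students"]),
    DatasetMetadata("payroll", "Monthly employee salaries", "hr", ["finance"]),
]

matches = search_catalog(catalog, "student")
```

The search never touches the datasets themselves; it works only on the descriptions and tags, which is the point of maintaining metadata.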

Conclusion

Data governance will continue to be influenced by the laws and context of the world. New challenges will bring new ways to satisfy the concerns of both lawmakers and the general public.

Data Governance

Data governance involves several concepts that describe the characteristics and setting in which the data is found. For people in leadership positions involving data, it is critical to have some understanding of the following concepts related to data governance. These concepts are:

Ownership
Quality
Protection
Use/Availability
Management

Each of these concepts plays a role in shaping the role of data within an organization.

Ownership

Data ownership is not always as obvious as it seems. One company may be using the data of a different company. It is important to identify who the data belongs to so that the user of the data is aware of any rules and restrictions the owner has about its use.

Addressing details related to ownership helps to determine accountability as well. Identifying ownership can also identify who is responsible for the data because the owners will hopefully have an idea of who should be using the data. If not, this is something that needs to be clarified as well.

Quality

Data quality is another self-explanatory term. Data quality is a way of determining how good the data is based on some criteria. Commonly used criteria for data quality are the data’s completeness, consistency, timeliness, accuracy, and integrity.

Completeness is determining if everything that the data is supposed to capture is represented in the data set. For example, if income is one variable that needs to be in a dataset it is important to check that it is there.
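A completeness check like the income example can be sketched as follows. The function name and the sample rows are made up for illustration, and treating `None` as "missing" is an assumption; a real dataset may encode missing values differently.

```python
def missing_variables(rows, required):
    """Return sorted names of required variables that are absent or None in any row."""
    missing = set()
    for row in rows:
        for name in required:
            if row.get(name) is None:
                missing.add(name)
    return sorted(missing)


# Hypothetical rows: the second has an empty income, the third lacks it entirely.
rows = [
    {"name": "A", "income": 52000},
    {"name": "B", "income": None},
    {"name": "C"},
]

incomplete = missing_variables(rows, ["name", "income"])
```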

Consistency means that the data you are looking at is similar to other data in the same context. For example, student record data is probably similar regardless of the institution. Therefore, someone with experience with student record data can tell you if the data you are looking at is consistent with other data in a similar context.

Timeliness has to do with the recency of the data. Some data is real-time while other data is historical. Therefore, the timeliness of the data will depend on the context of the project. A chatbot needs recent data while a study of incomes from ten years ago does not need data from yesterday.
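Because timeliness depends on the project, one way to express it is as a maximum acceptable age that each project sets for itself. The sketch below assumes a hypothetical `is_timely` helper and made-up timestamps and thresholds.

```python
from datetime import datetime, timedelta


def is_timely(last_updated, max_age, now):
    """Return True if the data is no older than the age the project allows."""
    return now - last_updated <= max_age


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 10, 12, 0)

# A chatbot might tolerate data up to one hour old (illustrative threshold).
chatbot_fresh = is_timely(datetime(2024, 1, 10, 11, 30), timedelta(hours=1), now)
chatbot_stale = is_timely(datetime(2024, 1, 1, 0, 0), timedelta(hours=1), now)

# A historical income study might accept data many years old.
study_ok = is_timely(datetime(2014, 1, 1), timedelta(days=365 * 15), now)
```

The same dataset can pass for one project and fail for another, since only the `max_age` argument changes.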

Accuracy and integrity are two more measures of quality. Accuracy is how well the data represents the population. For example, a population of male college students should have data about male college students. Integrity has to do with the truthfulness of the data. For example, if the data was manipulated, this needs to be explained.

Protection

Data protection has to do with all of the basic security concerns IT departments have to deal with today. Some examples include encryption and password protection. In addition, there may be a need to be aware of privacy concerns such as financial records or data collected from children.

There should also be awareness of disaster recovery. For example, there might be a real disaster that wipes out data or it can be an accidental deletion by someone. In either case, there should be backup copies of the data. Lastly, protection also involves controlling who has access to the data.

Use/Availability

Despite protection concerns, data still needs to be available to the appropriate parties, which is what data availability refers to. Whoever is supposed to have the data should be able to access it as needed.

The data must also be usable. The level of usability will depend on the user. For example, a data analyst should be able to handle messy data but a consumer of dashboards needs the data to be clean and ready prior to use.

Management

Data management is the implementation of the policies developed for the concepts mentioned above. The data leadership team needs to develop processes and policies for ownership, quality, protection, and availability of data.

Once the policies are developed, they have to actually be employed within the institution, which can be difficult as people generally want to avoid accountability and/or responsibility, especially when things go wrong. In addition, change is often disliked as people gravitate towards the current norms.

Conclusion

Data governance is a critical part of institutions today given the importance of data. IT departments need to develop policies and plans for the data in order to maintain trust in whatever conclusions are drawn from it.

educational research techniques

Research techniques and education