A table of contents can be useful when navigating a large document. In the video below, you will learn how to create a table of contents when using R Markdown.

Privacy by Design is an idea found within the General Data Protection Regulation, which affects the data privacy practices of organizations. In this post, we will define this term and explain several principles of privacy by design.
Definition
Privacy by design is a concept in which data protection happens through the appropriate development of technology. Essentially, data protection should not be limited to one place or one feature. Instead, it should be layered throughout an organization’s systems.

There are several ways to begin this initiative. A common method is to have a privacy policy that is up-to-date and readable. Another way to begin this process is to establish someone as the data protection officer. Lastly, it is also common to conduct some sort of assessment of data protection to determine areas of improvement before using an individual’s personal data.
Principles
There are seven principles of privacy by design. Below is a list with explanations.
Implementation Perspective
The implementation of privacy by design must be considered from several perspectives: systems, processes, and risk management.
The system perspective involves documenting the organization’s commitment to data protection, appointing a data protection officer or leader, providing training for employees, checking security measures, developing a record-keeping system, and conducting a self-assessment. All of these steps are used to develop an initial system for data privacy.
For processes, it is necessary to determine roles within privacy, such as people in IT, legal, etc., who support privacy with their technical expertise. It is also important to document how data is processed along with the associated privacy risks. Privacy controls for users and the implementation of security measures from the systems perspective are critical as well.
Risk management is another key perspective that needs to be addressed for data privacy. Risk management involves establishing the legal purpose for processing data. It also includes tracking who has access to data, setting controls for accessing data, planning what to do in the event of a breach, and applying minimization, anonymization, and pseudonymization to data. Lastly, measures for data accuracy are developed here.
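As a rough illustration of minimization and pseudonymization, the Python sketch below keeps only the fields an analysis needs and replaces the customer identifier with a keyed hash. The field names, the secret key, and the helper functions are all hypothetical and are not taken from any regulation or library guideline.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored securely, not in code.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical list of fields the analysis actually needs (minimization).
NEEDED_FIELDS = ("customer_id", "country", "purchase_total")


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so records remain linkable."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, needed=NEEDED_FIELDS) -> dict:
    """Keep only the needed fields and pseudonymize the identifier."""
    reduced = {name: record[name] for name in needed if name in record}
    if "customer_id" in reduced:
        reduced["customer_id"] = pseudonymize(str(reduced["customer_id"]))
    return reduced


# Made-up example record.
record = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "country": "US",
    "purchase_total": 59.90,
}
safe_record = minimize(record)
print(safe_record)
```

The same identifier always maps to the same pseudonym, so datasets can still be joined, while the name and email never leave the source system.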
Data catalogs and data silos are two ideas that are commonly associated with data governance. In this post, we will define these two terms and share how to implement the former and prevent the latter.
Data Catalogs
Data catalogs are a rather recent phenomenon. They were first developed in the 2010s, although their exact origins are unclear. A data catalog is a reference application that contains metadata on the various datasets within an organization. Usually, the catalog is searchable so that people can find the datasets they may need within an organization.
The data catalog essentially tracks the data available within an organization. The main reason for tracking data is to prevent data from being lost or kept hidden. Within a data governance framework, data is considered an asset. Therefore, just as an organization guards against the loss of inventory because of its monetary value, the data catalog guards against losing data that has value for decision-making.
Tips
There are also several tips for developing and using data catalogs. For example, a data catalog should track the roles of various people concerning individual datasets. Roles can include who is the owner of the data, the steward, the custodian, etc. Tracking roles helps in assigning responsibility for data.
Another tip is to develop data dictionaries alongside the data catalog. A data dictionary contains metadata for just one dataset rather than for all data. An analogy would be maps. Some maps cover the whole world, like a data catalog, while other maps only cover a city or county, like a data dictionary. The data dictionary is useful when an analyst needs more information while preparing to use data.
It is also important to make the data catalog user-friendly. Making a data catalog user-friendly for stakeholders involves the support of IT with a strong concern for the user experience. Nobody will use a data catalog if its user interface is hard to use. Without a friendly interface, the only alternative is lots and lots of training.
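The tips above can be sketched in a few lines of Python. This is only a minimal illustration, not a real catalog product: the CatalogEntry class, the search function, and the sample datasets are all hypothetical. Each entry records the roles tied to a dataset and carries its own data dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one dataset in a hypothetical data catalog."""
    name: str
    description: str
    owner: str
    steward: str
    custodian: str
    # Data dictionary: column name -> description, for this dataset only.
    data_dictionary: dict = field(default_factory=dict)


def search(catalog, keyword):
    """Return entries whose name or description contains the keyword."""
    keyword = keyword.lower()
    return [
        entry for entry in catalog
        if keyword in entry.name.lower() or keyword in entry.description.lower()
    ]


# Made-up catalog contents for illustration.
catalog = [
    CatalogEntry(
        name="sales_2023",
        description="Monthly sales by region",
        owner="Finance",
        steward="A. Analyst",
        custodian="IT Operations",
        data_dictionary={
            "region": "Sales region code",
            "revenue": "Revenue in USD",
        },
    ),
    CatalogEntry(
        name="hr_roster",
        description="Current employee roster",
        owner="Human Resources",
        steward="B. Steward",
        custodian="IT Operations",
    ),
]

for entry in search(catalog, "sales"):
    print(entry.name, "| owner:", entry.owner, "| steward:", entry.steward)
```

Keeping the roles and the data dictionary on the same entry means a search result immediately tells the analyst who is responsible for the dataset and what its columns mean.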
Data Silos
Data catalogs help to prevent what are called data silos. Data silos are sources of data that are controlled in an isolated place within an organization. When silos develop, analyses can be incomplete because the data behind them is incomplete. In addition, silos can lead to a breakdown in collaboration, which can cause duplication of effort and reduced productivity. Lastly, people within the organization may struggle to find the data they need for analysis.
Data silos often develop in organizations that have a decentralized IT strategy. A decentralized approach frequently leads to every department doing what it wants in terms of data storage and technology utilization, which is chaotic. Another cause of data silos is a lack of common goals for data management. Without shared goals, everyone does what they want.
Breaking Silos
Two main ways of breaking data silos are the development of data governance and data integration. One step in data governance is developing a data catalog, as mentioned earlier. Once a data catalog is developed, the team can start to create policies and standards in data governance to establish expectations regarding data use and storage.
A second strategy that is related to the first is data integration. Data integration is the process of combining data from different tables into one. Once this is complete, more analysis can take place. Combining data also makes it harder to isolate, because the data must be made available before it can be integrated.
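To make data integration concrete, here is a small Python sketch that combines two departmental tables on a shared key. The tables, column names, and the integrate function are hypothetical, and a real project would more likely use a database or a data-frame library for this.

```python
# Two hypothetical departmental tables that share a customer_id key.
sales_rows = [
    {"customer_id": 1, "total_spent": 250.0},
    {"customer_id": 2, "total_spent": 90.0},
]
support_rows = [
    {"customer_id": 1, "open_tickets": 0},
    {"customer_id": 3, "open_tickets": 2},
]


def integrate(left, right, key):
    """Combine two lists of dicts on a shared key, keeping unmatched rows."""
    merged = {}
    for row in left + right:
        # Start a record for a new key, then add this row's columns to it.
        merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[k] for k in sorted(merged)]


combined = integrate(sales_rows, support_rows, "customer_id")
for row in combined:
    print(row)
```

Customers that appear in only one table are kept with the columns that are available, which makes gaps between the former silos visible instead of silently dropping them.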
Conclusion
Data catalogs and silos are a part of the daily life of the information professional. Therefore, in the context of data governance, it is important to be familiar with these two terms so that support can be provided.
In the video below, we will learn more about R Markdown. In particular, we will learn how to utilize tables along with ways to format text.

In the video below, we will look at various ways to use R Markdown to report your results from RStudio.

A field closely related to data governance is data privacy. In this post, we will look at what data privacy is as well as principles that need to be kept in mind when trying to keep people’s data private.
Data Privacy
Privacy is a term that is difficult to define. For our purposes, data privacy is the amount of control a person has over personal information in terms of how this information is collected, managed, and stored. This definition gives the impression that people have little data privacy because we are so often compelled to share our information online.
Websites often require users to surrender some personally identifiable information (PII) such as name, address, phone number, etc., while in the medical field there is demand for personal health information (PHI). Sharing information about yourself can be frustrating for many but is the cost of doing business online. Naturally, once these various online companies have your data, they must be sure to protect it.
Data security is not about collecting or managing data. Rather, data security is focused on the protection of data from unauthorized access. Securing data is critical to protect individuals and organizations from harm because of security breaches. For example, there can be serious financial repercussions if someone’s credit card number is stolen online.
Fair Information Practice Principles
With all the concerns regarding data privacy, it was natural that frameworks would be developed to help organizations with data privacy. One such framework is the Fair Information Practice Principles (FIPPs) developed by the Organisation for Economic Co-operation and Development (OECD) back in the early 1980s. Below are the eight principles in this framework.
The principles shared above have been adopted by many organizations to provide a foundation on which they can develop their own data privacy policies and philosophy.
Conclusion
Data privacy is a major concern in the world today. Organizations, whether online or offline, continue to demand more information about their customers. As such, there must be safeguards in place to ensure the protection of this information.
R Markdown is a tool within RStudio that is beneficial for reporting results from an analysis that was done in RStudio. The video below provides several basic ways that R Markdown can be used in a document.

Within the field of data governance, there are different ways of approaching data and the definition of truth. In this post, we will look at different approaches to data and also how truth can be defined within a data governance framework.
Defense
A defensive approach to data is focused on controlling data. This can involve security and stringent governance of data through a highly centralized setting. In addition, the defensive data approach is concerned with minimizing risk and ensuring compliance with standards and expectations. Preventing theft and tracking the flow of data through an organization are also important.

When analytics are used, the purpose is to detect fraud and unusual activity. How defensive an organization is depends on the field or industry. For example, banking and health care are highly defensive due to the type of data they gather.
Offense
An offensive approach to data is focused on developing insights with data. The goal is not to protect but to develop insights for decision-making. An offensive approach to data is characterized by flexibility and a focus on the customer. This style of approaching data generally emphasizes a decentralized style of data governance.
Organizations that find themselves in highly competitive environments are often forced to become more offensive as they search for insights to maximize profits. How much offense and defense an organization needs does vary. However, in general, most if not all organizations start defensive and slowly become more offensive in nature.
Truth
Whether the approach to data is offensive or defensive, it is important to determine what the truth is when it comes to data in an organization. Every organization needs a single source of truth (SSOT) for critical data. The SSOT is a shared vocabulary for data that is used consistently across the organization. For example, sometimes the same name can be entered in multiple different ways in an organization’s data. Take the company AT&T as an example. It could be entered in some of the following ways:
ATT
att
Att
AT and T
AT&T
Each of the examples above can be treated as a different entity, which can lead to chaos when it is time to analyze data for insights. This is because redundant names can lead to redundant costs. For example, if AT&T were a vendor for our fictitious company, there might be several different contracts with AT&T held by several different divisions that all spell AT&T differently. To prevent this, the SSOT will define the one way to code AT&T into the system and determine what it represents.
However, keeping the offensive approach to data in mind, there are times when the SSOT can be modified for the purpose of analysis. Doing this leads to what is called multiple versions of truth (MVOT). An example of MVOT is a department that classifies our example of AT&T in a different way from the SSOT. Accounting might see AT&T as a vendor while marketing might see AT&T as their internet provider, etc. Since everyone knows what the SSOT is, they are aware when they make an MVOT for their distinct purpose.
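As a minimal sketch of how an SSOT could be enforced in code, the Python example below maps the AT&T variants listed above to one approved spelling. The lookup table, the normalization rules, and the function names are hypothetical. A real system would need rules agreed on by the data governance team.

```python
import re

# Hypothetical SSOT lookup: normalized key -> the one approved spelling.
SSOT_VENDORS = {"att": "AT&T"}


def normalize_key(raw: str) -> str:
    """Reduce a vendor name to a comparison key.

    Lowercases the text, treats '&' as the word 'and', splits on anything
    that is not a letter or digit, drops the word 'and', and joins the rest.
    """
    tokens = re.findall(r"[a-z0-9]+", raw.lower().replace("&", " and "))
    return "".join(token for token in tokens if token != "and")


def to_ssot(raw: str):
    """Return the approved spelling, or None if the name needs review."""
    return SSOT_VENDORS.get(normalize_key(raw))


for variant in ["ATT", "att", "Att", "AT and T", "AT&T"]:
    print(repr(variant), "->", to_ssot(variant))
```

Names that do not match any SSOT entry return None, so they can be flagged for review instead of being silently added as a new spelling.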
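One way to picture the relationship between the SSOT and an MVOT is as a shared record with department-specific overrides layered on top. The Python sketch below is only an illustration. The record fields, the category labels, and the view_for function are hypothetical.

```python
# Hypothetical SSOT record shared by the whole organization.
SSOT = {
    "AT&T": {"canonical_name": "AT&T", "category": "external organization"},
}

# Hypothetical department views (MVOT): department -> SSOT name -> overrides.
DEPARTMENT_VIEWS = {
    "accounting": {"AT&T": {"category": "vendor"}},
    "marketing": {"AT&T": {"category": "internet provider"}},
}


def view_for(department: str, name: str) -> dict:
    """Return the SSOT record with a department's overrides applied."""
    record = dict(SSOT[name])  # copy so the SSOT itself is never modified
    record.update(DEPARTMENT_VIEWS.get(department, {}).get(name, {}))
    return record


print(view_for("accounting", "AT&T"))
print(view_for("marketing", "AT&T"))
print(view_for("legal", "AT&T"))  # no override, so the SSOT record is returned
```

Because every view starts from the same SSOT record, the canonical name stays identical across departments even when the classification differs.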
Conclusion
Each organization needs to decide for itself what approach to data it wants to take. There is no right or wrong way to approach data. It really depends on the situation. In addition, every organization needs to determine for itself how it will define truth, and there is no single way to do this either. What organizations need to do is address these two topics in a way that works for them.