Total data quality as its name implies is a framework for improving the state of data that is used for research and reporting purposes. The dimensions that are used to assess the quality of data are measurement and representation
Measurement
Measurement is focused on the values gathered on the variable(s) of interest. When assessing measurement researchers are concerned with.
- Construct-The construct is the definition of the variable of interest. For example, income is can be defined as a person’s gross yearly salary in dollars. However, salary can also be defined as per month or as the net after taxes to show how this construct can be defined differently. The construct validity must also be determined to ensure that it is measuring what it claims to measure.
- Field-This is the place where data is collected and how it is collected. For example, our income variable can be collected from students or working adults. Where the data comes from affects the quality of the data concerning the research problem and questions. If the research questions are focused on student income then collecting income data from students ensures quality. In addition, how the data is encoded matters. All student incomes need to be in the same currency in order to make sense for comparision
- Data Values-This refers to the tools and procedures for preparing the data for analysis to ensure high-quality values within the data. Such challenges addressed are dealing with missing data, data entry errors, duplications, assumptions for various analytical approaches, and or issues between variables such as high correlations.
Representation
Representation looks at determining if the data collected comes from the population of interest. Several concerns need to be addressed when dealing with representation.
- Target population- The target population is potential participants in the study. The limitation here is determining the access of the target population. For example, studies involving children can be difficult because of ethical concerns over data collection with children. These ethical concerns limit access at times.
- Data sources- Data sources are avenues for obtaining data. It can relate to a location such as a school or to a group of people such as students among other definitions. Once access is established it is necessary to specifically determine where the data will come from.
- Missing data-Missing data isn’t just looking at what data is not complete in a dataset. Missing data is also about looking at who was left out of the data collection process. For example, if the target population is women then women should be represented in the data. In addition, missing data can also look at who is represented in the data but should not be. For example, if women are the target population then there should not be any men in the dataset.
Where measurement and representation meet is at the data analysis part of a research project. If the measurement and representation are bad it is already apparent that the data analysis will not yield useful insights. However, if the measurement and representation are perfect but the analysis is poor then you are still left without useful insights.
Conclusion
Measurement and representation are key components of data quality. Researchers need to be aware of these ideas to ensure that they are providing useful results to whatever stakeholders are involved in a study.