Developing research questions is an absolute necessity in completing any research project. The questions you ask help to shape the type of analysis that you need to conduct.
The type of questions you ask in the context of analytics and data science are similar to those found in traditional quantitative research. Yet data science, like any other field, has its own distinct traits.
In this post, we will look at six different types of questions that are used frequently in the context of the field of data science. The six questions are…
Understanding the types of question that can be asked will help anyone involved in data science to determine what exactly it is that they want to know.
A descriptive question seeks to describe a characteristic of the dataset. For example, if I collect the GPA of 100 university student I may want to what the average GPA of the students is. Seeking the average is one example of a descriptive question.
With descriptive questions, there is no need for a hypothesis as you are not trying to infer, establish a relationship, or generalize to a broader context. You simply want to know a trait of the dataset.
Exploratory questions seek to identify things that may be “interesting” in the dataset. Examples of things that may be interesting include trends, patterns, and or relationships among variables.
Exploratory questions generate hypotheses. This means that they lead to something that may be more formal questioned and tested. For example, if you have GPA and hours of sleep for university students. You may explore the potential that there is a relationship between these two variables.
Inferential questions are an extension of exploratory questions. What this means is that the exploratory question is formally tested by developing an inferential question. Often, the difference between an exploratory and inferential question is the following
- Exploratory questions are usually developed first
- Exploratory questions generate inferential questions
- Inferential questions are tested often on a different dataset from exploratory questions
In our example, if we find a relationship between GPA and sleep in our dataset. We may test this relationship in a different, perhaps larger dataset. If the relationship holds we can then generalize this to the population of the study.
Causal questions address if a change in one variable directly affects another. In analytics, A/B testing is one form of data collection that can be used to develop causal questions. For example, we may develop two version of a website and see which one generates more sales.
In this example, the type of website is the independent variable and sales is the dependent variable. By controlling the type of website people see we can see if this affects sales.
Mechanistic questions deal with how one variable affects another. This is different from causal questions that focus on if one variable affects another. Continuing with the website example, we may take a closer look at the two different websites and see what it was about them that made one more succesful in generating sales. It may be that one had more banners than another or fewer pictures. Perhaps there were different products offered on the home page.
All of these different features, of course, require data that helps to explain what is happening. This leads to an important point that the questions that can be asked are limited by the available data. You can’t answer a question that does not contain data that may answer it.
Answering questions is essential what research is about. In order to do this, you have to know what your questions are. This information will help you to decide on the analysis you wish to conduct. Familiarity with the types of research questions that are common in data science can help you to approach and complete analysis much faster than when this is unclear