In research, the term ‘model’ is employed frequently. Normally, a model is some sort of a description or explanation of a real world phenomenon. In data science, we employ the use of statistical models. Statistical models used numbers to help us to understand something that happens in the real world.
A statistical model used numbers to help us to understand something that happens in the real world.
In the real world, quantitative research relies on numeric observations of some phenomenon, behavior, and or perception. For example, let’s say we have the quiz results of 20 students as show below.
32 60 95 15 43 22 45 14 48 98 79 97 49 63 50 11 26 52 39 97
This is great information but what if we want to go beyond how these students did and try to understand how students in the population would do on the quizzes. Doing this requires the development of a model.
A model is simply trying to describe how the data is generated in terms of whatever we are mesuring while allowing for randomness. It helps in summarizing a large collection of numbers while also providing structure to it.
One commonly used model is the normal model. This model is the famous bell-curve model that most of us are familiar with. To calculate this model we need to calculate the mean and standard deviation to get a plot similar to the one below
Now, this model is not completely perfect. For example, a student cannot normally get a score above 100 or below 0 on a quiz. Despite this weakness, normal distribution gives is an indication of what the population looks like.
With this, we can also calculate the probability of getting a specific score on the quiz. For example, if we want to calculate the probability that a student would get a score of 70 or higher we can do a simple test and find that it is about 26%.
The normal model is not the only model. There are many different models to match different types of data. There are the gamma, student t, binomial, chi-square, etc. To determine which model to use requires examining the distribution of your data and match it to an appropriate model.
Another option is to transform the data. This is normally done to make data conform to a normal distribution. Which transformation to employ depends on how the data looks when it is plotted.
Modeling helps to bring order to data that has been collected for analysis. By using a model such as the normal distribution, you can begin to make inferences about what the population is like. This allows you to take a handful of data to better understand the world.