Comparing groups is a common goal in statistics. The aim is to see whether there is a difference between two groups. Understanding that difference can lead to insights grounded in statistical results. In this post, we will examine the following statistical tests for comparing samples.
- t-test/Wilcoxon test
- Paired t-test
T-test & Wilcoxon Test
The t-test indicates whether there is a statistically significant difference between the means of two groups. This is useful when you know what distinguishes the two groups. For example, if you are measuring the height of men and women and a t-test shows that men are taller, you can state that gender influences height, because the only difference between the groups in this example is their gender.
Below is an example of conducting a t-test in R. In the example, we are looking at whether there is a difference in body temperature between beavers that are active and beavers that are not.
> t.test(temp ~ activ, data = beaver2)
Welch Two Sample t-test
data: temp by activ
t = -18.548, df = 80.852, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.8927106 -0.7197342
sample estimates:
mean in group 0 mean in group 1
37.09684 37.90306
Here is what happened
- We use the ‘t.test’ function
- Within the ‘t.test’ function we indicate that we want to know if there is a difference in ‘temp’ when compared on the grouping variable ‘activ’ in the data ‘beaver2’
- The output provides a lot of information. The t-stat is -18.548. For large samples, an absolute value above roughly 1.96 indicates a statistically significant difference at the 0.05 level.
- Df stands for degrees of freedom and is used together with the t-stat to determine the p-value.
- The p-value is basically zero. Anything less than 0.05 is conventionally considered statistically significant.
- Next, we have the 95% confidence interval, which is the range likely to contain the difference between the means of the two groups in the population.
- Lastly, we have the means of each group. Group 0, the inactive group, had a mean of 37.09684. Group 1, the active group, had a mean of 37.90306.
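The values discussed in the list above can also be pulled out of the test result directly instead of being read off the printout. The following is a minimal sketch, assuming the built-in ‘beaver2’ data; the variable names ‘res’ and ‘crit’ are only illustrative and are not part of the original script.

```r
# Store the Welch t-test result instead of only printing it
res <- t.test(temp ~ activ, data = beaver2)

res$statistic   # t-stat
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # 95% confidence interval for the difference in means
res$estimate    # mean of each group

# Two-sided critical value at the 0.05 level for these degrees of freedom
# (illustrative name, not from the original post)
crit <- qt(0.975, df = res$parameter)

# TRUE when the t-stat lies beyond the critical value in either direction
abs(res$statistic) > crit
```

Because the degrees of freedom here are finite, the exact cutoff comes out slightly above the large-sample value of 1.96.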
The t-test assumes that the data in each group are approximately normally distributed. When normality is in doubt, the Wilcoxon rank sum test is an alternative that does not rely on that assumption. Below is the script for the Wilcoxon test using the same example.
> wilcox.test(temp ~ activ, data = beaver2)
Wilcoxon rank sum test with continuity correction
data: temp by activ
W = 15, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
A closer look at the output leads to the same conclusion. Instead of the t-stat, the W-stat is reported, and in both tests the p-value is reported as less than 2.2e-16.
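One way to judge whether normality is a problem before choosing between the two tests is a Shapiro-Wilk test on each group. This is a sketch under that assumption; the original script does not include a normality check, and the name ‘normality_p’ is only illustrative.

```r
# Shapiro-Wilk normality test on temperature within each activity group
# (a small p-value suggests the group departs from a normal distribution)
normality_p <- tapply(beaver2$temp, beaver2$activ,
                      function(x) shapiro.test(x)$p.value)

# One p-value per group: "0" is inactive, "1" is active
normality_p
```

If either p-value is small, the Wilcoxon test is the more cautious choice; a histogram or Q-Q plot of each group is a useful complement to the formal test.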
Paired T-test
A paired t-test is used when you want to compare how the same group of people responds to different interventions. For example, you might use this for a before and after experiment. We will use the ‘sleep’ data in R to compare a group of people when they receive two different types of sleep medication. The script is below.
> t.test(extra ~ group, data = sleep, paired = TRUE)
Paired t-test
data: extra by group
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean of the differences
-1.58
Here is what happened
- We used the ‘t.test’ function and indicated that we want to see if ‘extra’ (the increase in hours of sleep) is influenced by ‘group’ (two types of sleep medication).
- We added the new argument ‘paired = TRUE’, which tells R that this is a paired test.
- The output contains the same information as the regular t-test. The only difference is at the bottom, where R reports the mean of the differences between the two groups rather than the mean of each group. For this example, people gained about 1.58 hours (roughly 1 hour and 35 minutes) more sleep on the second medication than on the first.
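Newer versions of R may no longer accept ‘paired = TRUE’ together with a formula. A version-independent alternative is to pass the two sets of measurements as separate vectors. The sketch below assumes the rows of ‘sleep’ are in the same subject order within each group, which is how the built-in data is stored; the names ‘g1’, ‘g2’, ‘paired_res’ and ‘diff_res’ are only illustrative.

```r
# Split the measurements by medication; rows are ordered by subject ID
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

# Paired t-test on the two vectors
paired_res <- t.test(g1, g2, paired = TRUE)
paired_res

# A paired t-test is a one-sample t-test on the per-subject differences,
# so this gives the same t-stat, df and p-value
diff_res <- t.test(g1 - g2)
diff_res$statistic
```

Seeing the test as a one-sample test on the differences also explains why the output reports a single mean of the differences instead of one mean per group.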
Comparing samples in R is a simple process once you understand what you want to do. With this knowledge, the script and output are not too challenging, even for many beginners.