Proportions are a fraction or “portion” of a total amount. For example, if there are ten men and ten women in a room, the proportion of men in the room is 50% (10 / 20). There are times when doing an analysis that you want to evaluate proportions in your data rather than individual measurements such as the mean, correlation, or standard deviation.
In this post we will learn how to do a test of proportions using R. We will use the dataset “Default”, which is found in the “ISLR” package. We will compare the proportion of those who are students in the dataset to a theoretical value. We will calculate the results using the z-test and the binomial exact test. Below is some initial code to get started.
library(ISLR)
data("Default")
We first need to determine the actual number of students that are in the sample. This is calculated below using the “table” function.
table(Default$student)
##
##   No  Yes
## 7056 2944
We have 2944 students in the sample and 7056 people who are not students. We now need to determine how many people are in the sample, which we get by summing the counts in the table. Below is the code.
sum(table(Default$student))
## [1] 10000
There are 10000 people in the sample. To determine the proportion of students we divide 2944 by 10000, which equals 0.2944 or 29.44%. Below is the code to calculate this.
table(Default$student) / sum(table(Default$student))
##
##     No    Yes
## 0.7056 0.2944
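As a side note, base R also has the “prop.table” function, which turns a table of counts into proportions in one call. The sketch below rebuilds the counts by hand from the output above so that it runs without loading the dataset. The names “counts” and “props” are only for illustration.

```r
# Counts copied from the table output above ("counts" is an illustrative name)
counts <- c(No = 7056, Yes = 2944)

# prop.table divides each count by the total of all counts
props <- prop.table(counts)
props
```

With the dataset loaded, the same idea would be written as prop.table(table(Default$student)).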
The proportion test is used to compare an observed proportion with a theoretical value. For our example, the observed value is that 29.44% of the people in the sample are students. We want to compare this value with a theoretical value of 50%. Before we do so, it is better to state specifically what our hypotheses are.
NULL = The proportion of students in the population is equal to 50%
ALTERNATIVE = The proportion of students in the population is NOT equal to 50%
Below is the code to complete the z-test.
prop.test(2944, n = 10000, p = 0.5, alternative = "two.sided", correct = FALSE)
##
##  1-sample proportions test without continuity correction
##
## data:  2944 out of 10000, null probability 0.5
## X-squared = 1690.9, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2855473 0.3034106
## sample estimates:
##      p
## 0.2944
Here is what the code means.
1. prop.test is the function used
2. The first value of 2944 is the number of students in the sample
3. n = is the sample size
4. p = 0.5 is the theoretical proportion
5. alternative = "two.sided" means we want a two-tailed test
6. correct = FALSE means we do not want a continuity correction applied to the z-test. The correction is useful for small sample sizes but not for our sample of 10000
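To see how little the “correct” argument matters at this sample size, here is a small sketch that runs the test both ways and puts the two test statistics side by side. The object names “no_corr” and “with_corr” are made up for this illustration.

```r
# Same test with and without the continuity correction
no_corr   <- prop.test(2944, n = 10000, p = 0.5, correct = FALSE)
with_corr <- prop.test(2944, n = 10000, p = 0.5, correct = TRUE)

# Put the two X-squared statistics side by side
c(uncorrected = unname(no_corr$statistic),
  corrected   = unname(with_corr$statistic))
```

The two statistics differ by less than one unit on a value of roughly 1690, so the conclusion is the same either way.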
The p-value is essentially zero. This means that we reject the null hypothesis and conclude that the proportion of students differs from the theoretical proportion of 50% in the population.
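The output of prop.test reports an X-squared value rather than a z value. For a one-sample test without the correction, X-squared is simply the z statistic squared. The sketch below computes z by hand from the numbers used above. The variable names are only for illustration.

```r
x  <- 2944    # number of students in the sample
n  <- 10000   # sample size
p0 <- 0.5     # theoretical proportion under the null hypothesis

p_hat <- x / n                     # observed proportion
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error assuming the null is true
z     <- (p_hat - p0) / se0        # z statistic

z      # the z statistic
z^2    # should match the X-squared value reported by prop.test
```

Squaring z gives about 1690.9, which is the X-squared value in the output above.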
Below is the same analysis using the binomial exact test.
binom.test(2944, n = 10000, p = 0.5)
##
##  Exact binomial test
##
## data:  2944 and 10000
## number of successes = 2944, number of trials = 10000, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.2854779 0.3034419
## sample estimates:
## probability of success
##                 0.2944
The results are essentially the same. Both tests reject the null hypothesis, and the confidence intervals agree to about three decimal places. Whether to use “prop.test” or “binom.test” is a matter of debate among statisticians. The purpose here was to provide an example of the use of both.
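If you want to compare the two confidence intervals directly, one way is to pull them out of the test objects. This is only a sketch, and the object names “z_test” and “exact_test” are made up here. The small difference comes from the interval method: prop.test reports a Wilson score interval, while binom.test reports a Clopper-Pearson interval.

```r
# Run both tests and keep the results
z_test     <- prop.test(2944, n = 10000, p = 0.5, correct = FALSE)
exact_test <- binom.test(2944, n = 10000, p = 0.5)

# Stack the two 95% confidence intervals for comparison
ci <- rbind(prop.test  = as.numeric(z_test$conf.int),
            binom.test = as.numeric(exact_test$conf.int))
colnames(ci) <- c("lower", "upper")
ci
```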