During analysis, we often need to test a sample proportion against a theoretical or known proportion to see whether it has changed. For example, let’s say we conduct a survey at the end of a course every semester to see if students enjoyed the class. We may know that over the past 5 years, 73% of students said they enjoyed the class. We can then use this information to conduct a proportion test on the next semester and see whether enjoyment has changed. In this article, we will learn how to conduct a proportion test in R.
For our tutorial, let’s create some fake data. Below we create a sample of 40 students who say “yes” when they enjoyed the class and “no” when they disliked the class.
set.seed(1)
answers = c("yes", "no") # named `answers` to avoid masking base R's options() function
samp = sample(answers, size = 40, replace = TRUE)
str(samp)
## chr [1:40] "yes" "no" "yes" "yes" "no" "yes" "yes" "yes" "no" "no" "yes" ...
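Before running any test, it can help to summarize the sample. The sketch below recreates the same sample and tabulates it; the names `counts` and `prop_yes` are illustrative and not part of the original tutorial.

```r
# Recreate the sample from above so this block runs on its own
set.seed(1)
samp <- sample(c("yes", "no"), size = 40, replace = TRUE)

# Count how many students gave each answer
counts <- table(samp)
print(counts)

# Sample proportion of "yes" answers: the mean of a logical vector
# is the fraction of TRUE values
prop_yes <- mean(samp == "yes")
print(prop_yes)
```

The count of "yes" answers and the sample size are the two quantities the tests below take as input.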
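We start with a two-sided z-test. This test tells us whether our sample proportion differs significantly from our theoretical proportion. The null hypothesis is that they are equal. Note that R reports the result as a chi-squared statistic (X-squared), which for a single proportion without continuity correction is the square of the z statistic.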
We need a few pieces of information. First, we need the number of “yes” answers in our sample; we then need the size of our sample (n) and the theoretical proportion (p). We pass all of these to the prop.test function and can see the result below.
p = .73 # We will assume we have this from previous years
n = length(samp)
yes.answers = samp[samp == 'yes'] # Get all the yes's
prop.test(
  length(yes.answers),
  n = n,
  p = p,
  alternative = "two.sided",
  correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: length(yes.answers) out of n, null probability p
## X-squared = 3.4297, df = 1, p-value = 0.06403
## alternative hypothesis: true p is not equal to 0.73
## 95 percent confidence interval:
## 0.4459589 0.7365167
## sample estimates:
## p
## 0.6
From the test above, we get a p-value of 0.06403, so we fail to reject the null hypothesis at the .05 or .01 level. Thus, we do not have evidence that our sample proportion is significantly different from 0.73.
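To see where these numbers come from, the statistic can be reproduced by hand. This is a minimal sketch, assuming the 24 "yes" answers out of 40 reported in the output; the variable names (`x`, `p0`, `p_hat`, `z`) are illustrative.

```r
x  <- 24    # number of "yes" answers in the sample
n  <- 40    # sample size
p0 <- 0.73  # theoretical proportion under the null hypothesis

# Sample proportion
p_hat <- x / n

# z statistic: distance between the sample and theoretical proportions,
# measured in standard errors computed under the null hypothesis
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# prop.test reports the square of z as X-squared
x_squared <- z^2

# Two-sided p-value: probability of a z at least this extreme in either tail
p_two_sided <- 2 * pnorm(-abs(z))

print(c(z = z, x_squared = x_squared, p_value = p_two_sided))
```

The squared statistic and p-value should agree with the X-squared = 3.4297 and p-value = 0.06403 shown in the prop.test output.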
Let’s briefly look at two more examples. Above we tested whether the sample proportion was not equal to our theoretical proportion. Below, we test whether the sample proportion is less than or greater than the theoretical proportion.
prop.test(
  length(yes.answers),
  n = n,
  p = p,
  alternative = "less",
  correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: length(yes.answers) out of n, null probability p
## X-squared = 3.4297, df = 1, p-value = 0.03202
## alternative hypothesis: true p is less than 0.73
## 95 percent confidence interval:
## 0.0000000 0.7171352
## sample estimates:
## p
## 0.6
prop.test(
  length(yes.answers),
  n = n,
  p = p,
  alternative = "greater",
  correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: length(yes.answers) out of n, null probability p
## X-squared = 3.4297, df = 1, p-value = 0.968
## alternative hypothesis: true p is greater than 0.73
## 95 percent confidence interval:
## 0.4701942 1.0000000
## sample estimates:
## p
## 0.6
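The one-sided p-values use the same statistic and differ only in which tail of the normal distribution is used. The sketch below assumes the same 24 "yes" answers out of 40; the variable names are illustrative.

```r
x  <- 24    # number of "yes" answers
n  <- 40    # sample size
p0 <- 0.73  # theoretical proportion

# Same z statistic as in the two-sided test
z <- ((x / n) - p0) / sqrt(p0 * (1 - p0) / n)

# alternative = "less": probability in the lower tail
p_less <- pnorm(z)

# alternative = "greater": probability in the upper tail
p_greater <- pnorm(z, lower.tail = FALSE)

print(c(less = p_less, greater = p_greater))
```

Because the sample proportion (0.6) falls below 0.73, the "less" p-value is small and the "greater" p-value is close to 1. The two one-sided p-values sum to 1, and the smaller one is half of the two-sided p-value.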
Another proportion test we can conduct is the exact binomial test. R provides the binom.test function for this. Let’s look at some examples using the same problem as above. We will test similar hypotheses: not equal, less than, and greater than.
binom.test(length(yes.answers), n = n, p = p)
##
## Exact binomial test
##
## data: length(yes.answers) and n
## number of successes = 24, number of trials = 40, p-value = 0.07442
## alternative hypothesis: true probability of success is not equal to 0.73
## 95 percent confidence interval:
## 0.4332671 0.7513500
## sample estimates:
## probability of success
## 0.6
binom.test(length(yes.answers), n = n, p = p, alternative = "less")
##
## Exact binomial test
##
## data: length(yes.answers) and n
## number of successes = 24, number of trials = 40, p-value = 0.05092
## alternative hypothesis: true probability of success is less than 0.73
## 95 percent confidence interval:
## 0.0000000 0.7305962
## sample estimates:
## probability of success
## 0.6
binom.test(length(yes.answers), n = n, p = p, alternative = "greater")
##
## Exact binomial test
##
## data: length(yes.answers) and n
## number of successes = 24, number of trials = 40, p-value = 0.9754
## alternative hypothesis: true probability of success is greater than 0.73
## 95 percent confidence interval:
## 0.4577833 1.0000000
## sample estimates:
## probability of success
## 0.6
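The exact test gets its p-values directly from the binomial distribution rather than from a normal approximation. As a rough sketch, again assuming 24 "yes" answers out of 40, they can be reproduced with pbinom and dbinom. The variable names are illustrative, and the two-sided calculation is one way to express the rule of summing the probabilities of all outcomes no more likely than the observed one.

```r
x  <- 24    # number of "yes" answers
n  <- 40    # sample size
p0 <- 0.73  # theoretical proportion

# alternative = "less": P(X <= 24) under the null hypothesis
p_less <- pbinom(x, size = n, prob = p0)

# alternative = "greater": P(X >= 24), i.e. P(X > 23)
p_greater <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# Two-sided: add up the probabilities of every outcome that is
# no more likely than the observed count (small tolerance for rounding)
d <- dbinom(0:n, size = n, prob = p0)
p_two_sided <- sum(d[d <= dbinom(x, size = n, prob = p0) * (1 + 1e-7)])

print(c(two_sided = p_two_sided, less = p_less, greater = p_greater))
```

These should agree with the p-values in the binom.test output above (0.07442, 0.05092, and 0.9754). Note that the exact "less" test lands just above .05, while the approximate test gave 0.03202, so the choice of test can matter with a sample this small.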