Let's say we have a survey of students stating whether they liked a class and whether they passed or failed it. Now, we want to test whether these two variables are independent. That is to say, we want to know if the students' opinion of the class depends on their grade, and vice versa. In this article, we will learn how to test the independence of categorical variables.
Let's start by simulating the survey. We will have 20 students saying whether they enjoyed the class and whether they passed or failed.
enj.sample = sample(c("Yes", "No"), 20, replace = TRUE)
enjoyed = factor(enj.sample)
pass.sample = sample(c("Pass", "Fail"), 20, replace = TRUE)
passed = factor(pass.sample)
To test for independence, we can use the chi-squared test. To do this in R, we first use the table function to create a contingency table, then call the summary function on it. This will give us a chi-squared statistic and a p-value.
cont.table = table(enjoyed, passed)
summary(cont.table)
# Number of cases in table: 20
# Number of factors: 2
# Test for independence of all factors:
#   Chisq = 0.03472, df = 1, p-value = 0.8522
#   Chi-squared approximation may be incorrect
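To see where the numbers reported by summary come from, the statistic can be reproduced by hand. The following is a minimal sketch rather than the article's own data: the counts in observed are hypothetical values chosen for illustration, and the names observed, expected, chisq, df, p.value and builtin are assumptions introduced here.

```r
# Hypothetical 2x2 counts (NOT the article's random sample):
# rows = enjoyed, columns = passed
observed <- matrix(c(6, 4,
                     3, 7),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(enjoyed = c("No", "Yes"),
                                   passed  = c("Fail", "Pass")))

# Expected counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chisq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-squared distribution
p.value <- pchisq(chisq, df = df, lower.tail = FALSE)

# The same test via summary() on the counts, for comparison
builtin <- summary(as.table(observed))
```

Under these assumptions, chisq and p.value should agree with builtin$statistic and builtin$p.value, since summary applies the same Pearson statistic without a continuity correction.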
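From the output, we see that the p-value is 0.8522, which is well above the usual 0.05 significance level, so we fail to reject the null hypothesis that the two factors are independent. In other words, the data give no evidence that enjoying the class and passing it are related, although this is not proof of independence. This makes sense, as we randomly generated the data set.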
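The output above also carries the note that the chi-squared approximation may be incorrect. R prints this when some expected cell counts fall below 5, which is likely with only 20 students. One possible way to check this, and to run a test that does not rely on the approximation, is sketched below using Fisher's exact test. This is a swapped-in alternative, not the method used above, and the counts in observed are hypothetical values chosen for illustration.

```r
# Hypothetical 2x2 counts (NOT the article's random sample):
# rows = enjoyed, columns = passed
observed <- matrix(c(6, 4,
                     3, 7),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(enjoyed = c("No", "Yes"),
                                   passed  = c("Fail", "Pass")))

# Expected counts under independence, taken from chisq.test().
# suppressWarnings() hides the small-sample warning that this
# example is meant to investigate.
expected <- suppressWarnings(chisq.test(observed)$expected)

# TRUE if any expected count is below 5, the condition behind the warning
small.counts <- any(expected < 5)

# Fisher's exact test computes an exact p-value for a 2x2 table
exact <- fisher.test(observed)
exact$p.value
```

As with the chi-squared test, a large p-value here means we fail to reject the null hypothesis of independence.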