How to Test Independence of Categorical Variables in R

05.07.2021

Let's say we have a survey in which students state whether they liked a class and whether they passed or failed it. Now, we want to test whether these two variables are independent. That is to say, we want to know whether the students' opinion of the class depends on their grade, and vice versa. In this article, we will learn how to test the independence of categorical variables.

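Let's start by simulating the survey. We will have 20 students saying whether they enjoyed the class and whether they passed or failed. Because the responses are drawn at random, your results will differ from the output shown later.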

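# Randomly draw 20 "Yes"/"No" answers for whether each student enjoyed the class
enj.sample <- sample(c("Yes", "No"), 20, replace = TRUE)
enjoyed <- factor(enj.sample)

# Randomly draw 20 "Pass"/"Fail" outcomes, independently of the answers above
pass.sample <- sample(c("Pass", "Fail"), 20, replace = TRUE)
passed <- factor(pass.sample)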
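
If you want the simulation to give the same data every time it is run, one option is to set a seed first. The sketch below repeats the simulation with set.seed; the seed value 42 is an arbitrary assumption (no seed is shown in this article), so the resulting data will not reproduce the output printed further down.

```r
# Fix the random number generator state so the simulated survey is repeatable.
# The seed value 42 is an arbitrary choice, not taken from the article.
set.seed(42)

enj.sample <- sample(c("Yes", "No"), 20, replace = TRUE)
enjoyed <- factor(enj.sample)

pass.sample <- sample(c("Pass", "Fail"), 20, replace = TRUE)
passed <- factor(pass.sample)

# Quick checks: 20 responses per variable, and the counts in each category.
length(enjoyed)
table(enjoyed)
table(passed)
```

Running the block again from the set.seed line regenerates exactly the same responses.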

To test for independence, we can use the chi-squared test. To do this in R, we first use the table function to create a contingency table, then call summary on it. This gives us the chi-squared statistic, its degrees of freedom, and a p-value.

# Cross-tabulate the two factors, then test them for independence
cont.table <- table(enjoyed, passed)
summary(cont.table)

# Number of cases in table: 20 
# Number of factors: 2 
# Test for independence of all factors:
# 	Chisq = 0.03472, df = 1, p-value = 0.8522
# 	Chi-squared approximation may be incorrect

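From the output we see that the p-value is 0.8522, which is well above the usual 0.05 significance level, so we fail to reject the null hypothesis that the two factors are independent. This does not prove independence; it only means the data give no evidence of an association. That makes sense, as we randomly generated the data set.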
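
The line "Chi-squared approximation may be incorrect" in the output appears because some expected cell counts are small (R flags expected counts below 5). The following is a minimal sketch of how one might look into this. It uses a small hand-made data set whose counts are hypothetical and chosen only for illustration; they are not the survey data from this article. It calls chisq.test with correct = FALSE, which reports the same Pearson statistic that summary gives for a table, inspects the expected counts, and then runs Fisher's exact test, a common alternative for small 2x2 tables.

```r
# Hypothetical survey of 20 students; these counts are made up for illustration.
enjoyed <- factor(rep(c("Yes", "No"), times = c(11, 9)))
passed <- factor(c(rep(c("Pass", "Fail"), times = c(6, 5)),   # students who enjoyed the class
                   rep(c("Pass", "Fail"), times = c(5, 4))))  # students who did not
cont.table <- table(enjoyed, passed)

# Pearson chi-squared test without the continuity correction.
# suppressWarnings() hides the small-sample warning, which we examine directly below.
pearson <- suppressWarnings(chisq.test(cont.table, correct = FALSE))
pearson$statistic
pearson$p.value

# Expected counts under independence; values below 5 trigger the warning.
pearson$expected
any(pearson$expected < 5)

# Fisher's exact test does not rely on the chi-squared approximation.
exact <- fisher.test(cont.table)
exact$p.value
```

If the expected counts are small, the p-value from Fisher's exact test is usually the safer one to report.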