How to Create a Factor or Categorical Variable in R


In data science, you often work with categories, also called factor variables. For example, a t-shirt size might take the values small, medium, or large. When you tell R that a variable is a factor, R records its set of allowed values (its levels) and treats it as categorical in summaries, models, and plots. In this article, we will learn how to create a factor or categorical variable in R.

To create a factor in R, we can pass a vector to the factor function. In our first example, R automatically determines the levels (options) from the distinct values in the vector and sorts them alphabetically.

sizes = factor(c("Small", "Small", "Medium", "Large", "Small", "Medium"))
sizes
#> [1] Small  Small  Medium Large  Small  Medium
#> Levels: Large Medium Small
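To see what R actually stored, the following minimal sketch inspects the factor from the first example with the base R functions levels(), nlevels(), and as.integer(). The variable name is only illustrative.

```r
# Build the same factor as in the first example. Without a `levels`
# argument, R sorts the distinct values alphabetically.
sizes = factor(c("Small", "Small", "Medium", "Large", "Small", "Medium"))

# The distinct levels, in sorted order: Large, Medium, Small.
levels(sizes)

# The number of levels (3 here).
nlevels(sizes)

# Internally each element is stored as an integer code that
# indexes into the levels, so "Small" is 3 and "Large" is 1.
as.integer(sizes)
```

Because the codes follow the level order, controlling that order matters whenever the categories have a natural ranking.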

We can also tell R which levels to use, and in what order, with the levels argument. This is useful when not every level appears in the example vector.

# Names chosen to avoid masking the base R functions sample() and levels()
observed = c("Small", "Medium")
size_levels = c("Small", "Medium", "Large")
sizes = factor(observed, levels = size_levels)
sizes
#> [1] Small  Medium
#> Levels: Small Medium Large
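Specifying the levels has two consequences worth checking. The sketch below, which uses illustrative variable names and a made-up "XL" value, shows that an unused level is still counted (with a count of zero) and that a value outside the given levels becomes NA.

```r
observed = c("Small", "Medium")
size_levels = c("Small", "Medium", "Large")
sizes = factor(observed, levels = size_levels)

# "Large" never occurs in the data, but it is still a level,
# so table() reports it with a count of zero.
table(sizes)

# "XL" is not one of the declared levels, so it is converted to NA.
with_unknown = factor(c("Small", "XL"), levels = size_levels)
with_unknown
is.na(with_unknown)
```

The NA behavior is a useful safeguard: misspelled or unexpected categories surface as missing values instead of silently becoming new levels.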

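Whether R creates factors for you when loading data depends on the version. Before R 4.0.0, functions such as read.csv converted text columns to factors by default. Since R 4.0.0, they leave text columns as character vectors unless you set stringsAsFactors = TRUE. If a column you want to treat as categorical is not loaded as a factor, you will need to call the factor function on it yourself.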
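As a rough sketch of that conversion, the example below builds a small hypothetical data frame (the name orders and its columns are invented for illustration, standing in for data read from a file) and turns its text column into a factor.

```r
# Hypothetical data standing in for something loaded from a file.
# stringsAsFactors = FALSE keeps `size` as text on every R version.
orders = data.frame(
  id   = 1:4,
  size = c("Small", "Large", "Medium", "Small"),
  stringsAsFactors = FALSE
)

# The column starts out as a plain character vector.
is.factor(orders$size)

# Convert it, stating the levels explicitly to control their order.
orders$size = factor(orders$size, levels = c("Small", "Medium", "Large"))

# Now it is a factor with the levels we asked for.
is.factor(orders$size)
levels(orders$size)
```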

Later, when building models or graphing, you will see the usefulness of this feature. Factors make it easy to group values and count them by level. For example, given a vector of t-shirt sizes sold, we can count how many were sold in each size and plot the result.
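One way to do that count and graph, sketched here with base R's table() and barplot() on made-up sales data (the variable names sold and counts are illustrative):

```r
# Made-up record of t-shirt sizes sold, with levels in size order.
sold = factor(
  c("Small", "Small", "Medium", "Large", "Small", "Medium"),
  levels = c("Small", "Medium", "Large")
)

# Count how many shirts were sold at each level:
# 3 Small, 2 Medium, 1 Large.
counts = table(sold)
counts

# Draw one bar per level, in the order of the levels.
barplot(counts, xlab = "Size", ylab = "Shirts sold")
```

Because the levels were given in size order, both the table and the bars appear as Small, Medium, Large instead of alphabetically.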