The quantile is the value x such that P(X <= x) = p. In a previous article, we learned how to use the CDF to get a probability p: we gave the value x to a function prefixed with p. Now, we want to do the reverse and recover x from p. In this article, we will learn how to calculate quantiles in R.
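To make the "reverse" concrete, here is a minimal sketch using the standard normal distribution. It assumes the p-prefixed function from the previous article was something like pnorm; its q-prefixed counterpart in base R, qnorm, goes the other way. The value 1.96 is only an illustrative choice.

```r
# CDF: probability that a standard normal value is at or below 1.96
p = pnorm(1.96)

# Quantile function: recover the value from the probability p
x = qnorm(p)

print(p)  # approximately 0.975
print(x)  # 1.96, up to floating-point error
```

The same p/q pairing exists for other distributions in base R, for example pexp and qexp.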
We can start with a simple example. Let’s create a vector of random normal numbers which will be our sample. In real life, this sample could be a list of ads and their respective click counts.
We can use the quantile function on this sample and it will give us the quantiles at 0%, 25%, 50%, 75%, and 100%.
# Set the seed so you can replicate
set.seed(1)
sample = rnorm(100)
quantile(sample)
## 0% 25% 50% 75% 100%
## -2.2146999 -0.4942425 0.1139092 0.6915454 2.4016178
From the above, we can see that 25% of values are below -0.4942425, 50% of values are below 0.1139092 and so on.
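One way to check this reading is to count the share of observations at or below each quantile. The sketch below reuses the seeded sample from above; the name share_below is only an illustrative choice.

```r
set.seed(1)
sample = rnorm(100)
qs = quantile(sample)

# Share of observations at or below each returned quantile
share_below = sapply(qs, function(q) mean(sample <= q))
print(share_below)
```

For the 25%, 50%, and 75% quantiles, the shares come out at 0.25, 0.5, and 0.75, matching the interpretation above.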
The quantile function also has an option, na.rm, to remove NAs in your sample. There are other ways to handle NA, but if you want to drop them, you can use this parameter.
For example, let’s create a vector with some NAs. Calling the quantile function on it without na.rm raises an error.
sample = c(12, 78, 18, NA, 46, 52, 100, NA)
quantile(sample)
# Error in quantile.default(sample) : missing values and NaN's not allowed if 'na.rm' is FALSE
Let’s use the same example, but pass na.rm = TRUE to our quantile function.
sample = c(12, 78, 18, NA, 46, 52, 100, NA)
quantile(sample, na.rm = TRUE)
## 0% 25% 50% 75% 100%
## 12.0 25.0 49.0 71.5 100.0
Now, the function returns our expected quantile values and has removed the NA values.
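To see what na.rm = TRUE is doing, we can drop the NAs ourselves and compare. This is a minimal sketch; the names clean, manual, and builtin are only illustrative.

```r
sample = c(12, 78, 18, NA, 46, 52, 100, NA)

# Drop the NAs by hand, then compute quantiles on the clean vector
clean = sample[!is.na(sample)]
manual = quantile(clean)

# Let quantile() drop them for us
builtin = quantile(sample, na.rm = TRUE)

print(manual)
print(all.equal(manual, builtin))
```

Both approaches compute the quantiles on the same six non-missing values, so the results agree.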
We saw earlier that the quantile function returns quantiles at 0%, 25%, 50%, 75%, and 100% by default. However, sometimes we would like different percentiles. We can use the probs argument to have the quantile function calculate these.
# Set the seed so you can replicate
set.seed(1)
sample = rnorm(100)
quantile(sample, probs = c(.20, .70))
## 20% 70%
## -0.6138692 0.5812173
Here we can see that 20% of values are below -0.6138692 and 70% of values are below 0.5812173.
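Because probs accepts any vector of probabilities between 0 and 1, we can also generate it instead of typing each value. As a sketch, the following asks for the deciles using seq; the name deciles is only illustrative.

```r
set.seed(1)
sample = rnorm(100)

# Deciles: every 10% from 10% to 90%
deciles = quantile(sample, probs = seq(0.1, 0.9, by = 0.1))
print(deciles)
```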
You may have noticed that the quantile function returns a named numeric vector. We can use the unname function to return just the numeric values. This is a common task when using the quantile function.
# Set the seed so you can replicate
set.seed(1)
sample = rnorm(100)
qs = quantile(sample, probs = c(.20, .70))
unname(qs)
## [1] -0.6138692 0.5812173
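There are a couple of other ways to get at the bare numbers, sketched here under the same seed. The quantile function has a names argument that can be set to FALSE, and a single value can be pulled out of the named result by its label with double brackets. The variable names qs_plain and q20 are only illustrative.

```r
set.seed(1)
sample = rnorm(100)

# Ask quantile() not to attach names in the first place
qs_plain = quantile(sample, probs = c(.20, .70), names = FALSE)

# Or keep the names and extract one value by its label
qs = quantile(sample, probs = c(.20, .70))
q20 = qs[["20%"]]

print(qs_plain)
print(q20)
```

Using names = FALSE saves a step when the labels are never needed, while the labelled version is easier to read when picking out individual quantiles.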