How to Calculate Quantiles for Distributions in R

05.15.2021

Intro

A quantile is the value x such that P(X <= x) = p for a given probability p. In a previous article, we learned how to use the CDF to get a probability p: we passed the value x to a function prefixed with p (such as pnorm). Now, we want to do the reverse. In this article, we will learn how to calculate quantiles for distributions in R.
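
As a quick sketch of that round trip: base R pairs each p-prefixed distribution function with a q-prefixed one. The normal distribution (pnorm and qnorm) and the value 1.96 are chosen here only for illustration; they are assumptions, not something this article specifies.

```r
# pnorm() maps a value x to the probability P(X <= x)
# for a standard normal distribution
p <- pnorm(1.96)

# qnorm() goes the other way: from a probability back to the value x
x <- qnorm(p)

p
x
```

Here x should come back as 1.96, up to floating-point rounding, which is what "the reverse" means in practice.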

Basic Quantile

We can start with a simple example. Let’s create a vector of random normal numbers which will be our sample. In real life, this sample could be a list of ads and their respective click counts.

We can use the quantile function on this sample, and it will give us the 0%, 25%, 50%, 75%, and 100% quantiles.

# Set the seed so you can replicate
set.seed(1)

sample = rnorm(100)
quantile(sample)
##         0%        25%        50%        75%       100% 
## -2.2146999 -0.4942425  0.1139092  0.6915454  2.4016178

From the above, we can see that 25% of values are below -0.4942425, 50% of values are below 0.1139092 and so on.
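
One way to check this reading is to compute the share of sample values at or below each quantile. This is a minimal sketch; the variable names x, qs, and shares are illustrative and do not come from the original example.

```r
# Recreate the same sample as above
set.seed(1)
x <- rnorm(100)

qs <- quantile(x)

# For each quantile, the fraction of sample values at or below it
shares <- sapply(qs, function(q) mean(x <= q))
shares
```

The shares for the 25%, 50%, and 75% quantiles should match their labels closely. The 0% quantile is simply the sample minimum, so only one value out of 100 sits at or below it.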

Quantiles When You Have Missing Data

The quantile function also has an option, na.rm, to remove NAs from your sample. There are other ways to handle NAs, but if you want to drop them, you can use this parameter.

For example, let’s create a vector with some NAs. We get an error message when using the quantile function.

sample = c(12, 78, 18, NA, 46, 52, 100, NA)
quantile(sample)

## Error in quantile.default(sample) : missing values and NaN's not allowed if 'na.rm' is FALSE

Let’s use the same example, but pass na.rm = TRUE to our quantile function.

sample = c(12, 78, 18, NA, 46, 52, 100, NA)
quantile(sample, na.rm = TRUE)
##    0%   25%   50%   75%  100% 
##  12.0  25.0  49.0  71.5 100.0

Now, the function returns our expected quantile values and has removed the NA values.
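
Before dropping values, it can help to know how many are missing. The sketch below counts the NAs and then checks that filtering them out by hand gives the same result as na.rm = TRUE. The names x, n_missing, by_hand, and with_na_rm are illustrative only.

```r
x <- c(12, 78, 18, NA, 46, 52, 100, NA)

# Count the missing values before dropping them
n_missing <- sum(is.na(x))
n_missing

# Remove the NAs ourselves, then compare with na.rm = TRUE
by_hand    <- quantile(x[!is.na(x)])
with_na_rm <- quantile(x, na.rm = TRUE)
all.equal(by_hand, with_na_rm)
```

If a large share of the sample is missing, the quantiles describe only the observed values, so the count is worth reporting alongside them.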

The Probs Parameter

We saw earlier that the quantile function returns quantiles at 0%, 25%, 50%, 75%, and 100% by default. However, sometimes we would like different percentiles. We can pass the probs parameter to have the quantile function calculate these.

# Set the seed so you can replicate
set.seed(1)

sample = rnorm(100)

quantile(sample, probs = c(.20, .70))
##        20%        70% 
## -0.6138692  0.5812173

Here we can see that 20% of values are below -0.6138692 and 70% of values are below 0.5812173.
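
The probs parameter accepts any vector of probabilities between 0 and 1, so it can also be generated rather than typed out. As a small sketch, here are the deciles built with seq; the names x and deciles are illustrative.

```r
# Recreate the same sample as above
set.seed(1)
x <- rnorm(100)

# seq() builds the probabilities 0.1, 0.2, ..., 0.9
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))
deciles
```

The result has one entry per probability, labeled 10% through 90%.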

Using Unname to Retrieve the Values

You may have noticed that the quantile function returns a named numeric vector. We can use the unname function to return just the numeric values. This is a common task when using the quantile function.

# Set the seed so you can replicate
set.seed(1)

sample = rnorm(100)

qs = quantile(sample, probs = c(.20, .70))
unname(qs)
## [1] -0.6138692  0.5812173
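
There are a couple of alternatives to unname that may be useful, sketched below. The quantile function has a names argument that skips the labels up front, and a single value can be pulled out by its label with double brackets. The names x, plain, qs, and q20 are illustrative.

```r
# Recreate the same sample as above
set.seed(1)
x <- rnorm(100)

# names = FALSE asks quantile() to return the values without labels
plain <- quantile(x, probs = c(.20, .70), names = FALSE)
plain

# Keep the labels and extract one value by name instead
qs <- quantile(x, probs = c(.20, .70))
q20 <- qs[["20%"]]
q20
```

Which one to use is mostly a matter of taste: names = FALSE is convenient when the labels are never needed, while indexing by label keeps the code readable when only one quantile is used later.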