A density plot lets us view the distribution of a continuous variable. It gives us an idea of whether the distribution matches one we recognize, or whether we want to transform the variable so that it does. In this article, we will learn how to create a density plot with ggplot2 in R.
If you don’t have time to read, here is a quick code snippet to use in your project. For others who want details, read on.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
data(starwars, package = 'dplyr')
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density()
For our tutorial, we will use the starwars data set from the dplyr package.
library(tidyverse)
data(starwars, package = 'dplyr')
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or~
## $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2~
## $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.~
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N~
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "~
## $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",~
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, ~
## $ sex <chr> "male", "none", "none", "male", "female", "male", "female",~
## $ gender <chr> "masculine", "masculine", "masculine", "masculine", "femini~
## $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T~
## $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma~
## $ films <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return~
## $ vehicles <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp~
## $ starships <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",~
To create a density plot in ggplot2, we can use the geom_density function after mapping a continuous variable to x in our aes aesthetic mapping.
In this example, we will use height from the starwars data set above.
library(ggplot2)
ggplot(starwars, aes(x = height)) +
geom_density()
## Warning: Removed 6 rows containing non-finite values (stat_density).
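The warning above appears because some characters have no recorded height. As a minimal sketch (the names n_missing, starwars_height, and p_clean are made up for this illustration), we can count the missing values and drop them before plotting:

```r
library(ggplot2)
data(starwars, package = "dplyr")

# Count the missing heights; these are the rows ggplot2 warns about.
n_missing <- sum(is.na(starwars$height))

# Keep only the rows with a recorded height (base R subsetting).
starwars_height <- starwars[!is.na(starwars$height), ]

# Same density plot, now built from complete data only.
p_clean <- ggplot(starwars_height, aes(x = height)) +
  geom_density()
p_clean
```

If we only want to silence the warning without changing the data, passing na.rm = TRUE to geom_density() should have the same effect on the plot.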
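We can customize our density plots using parameters of the geom_density function. For example, we can change the line color using the color named parameter, the line width using lwd, and the line type using linetype. We can also fill the area under the curve using fill and control its transparency using alpha. Here are two examples.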
ggplot(starwars, aes(x = height)) +
geom_density(color = 4,
lwd = 1,
linetype = 1)
## Warning: Removed 6 rows containing non-finite values (stat_density).
ggplot(starwars, aes(x = height)) +
geom_density(color = 4,
fill = 4,
alpha = 0.25)
## Warning: Removed 6 rows containing non-finite values (stat_density).
Density plots allow you to customize the kernel used to estimate the density. We can do this using the kernel parameter of the geom_density function.
ggplot(starwars, aes(x = height)) +
geom_density(kernel = "rectangular")
## Warning: Removed 6 rows containing non-finite values (stat_density).
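To see how much the kernel choice matters, it can help to overlay several kernels on one plot. The sketch below is one way to do that, not the only one. It assumes the kernel names accepted by stats::density(), and the names starwars_height and p_kernels are made up for this illustration.

```r
library(ggplot2)
data(starwars, package = "dplyr")

# Drop missing heights so every layer works on the same complete data.
starwars_height <- starwars[!is.na(starwars$height), ]

# One layer per kernel; mapping colour to a constant label inside aes()
# gives each curve its own colour and a legend entry.
p_kernels <- ggplot(starwars_height, aes(x = height)) +
  geom_density(aes(colour = "gaussian"), kernel = "gaussian") +
  geom_density(aes(colour = "rectangular"), kernel = "rectangular") +
  geom_density(aes(colour = "triangular"), kernel = "triangular") +
  labs(colour = "Kernel")
p_kernels
```

The Gaussian kernel is the default, so the first curve should match the plain geom_density() plot from earlier.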
We can adjust the title, x-label, and y-label of our density plot using the labs function. We pass it the title, x, and y parameters.
ggplot(starwars, aes(x = height)) +
geom_density() +
  labs(
    title = "Height of Star Wars Characters",
    x = "Height",   # height is mapped to the x axis
    y = "Density"   # geom_density() computes the y values
  )
## Warning: Removed 6 rows containing non-finite values (stat_density).
We can color the separate groups of our density plot by mapping a variable to the fill or colour aesthetics. Here is an example that uses both to assign a color to each value of sex.
library(ggplot2)
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density()
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
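The extra warnings appear because a density cannot be estimated for a group with fewer than two data points. One possible way to avoid them, sketched here with dplyr, is to remove such groups before plotting. The names starwars_groups and p_groups are made up for this illustration, and the threshold of two simply mirrors the warning text.

```r
library(dplyr)
library(ggplot2)
data(starwars, package = "dplyr")

starwars_groups <- starwars %>%
  filter(!is.na(height)) %>%  # drop rows without a height
  group_by(sex) %>%
  filter(n() >= 2) %>%        # keep groups with at least two heights
  ungroup()

# Grouped density plot built only from groups that can be estimated.
p_groups <- ggplot(starwars_groups, aes(x = height, colour = sex, fill = sex)) +
  geom_density()
p_groups
```

Filtering changes which groups appear in the legend, so it is worth checking the dropped categories before relying on the result.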
If we prefer to have separate plots, we can use the facet_ functions in ggplot. For example, here are plots separated by sex.
library(ggplot2)
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
facet_grid(~sex)
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
If we would like to limit the x or y values of our plots, we can use the xlim and ylim functions.
ggplot(starwars, aes(x = height)) +
geom_density() +
xlim(125, 200) +
ylim(0, .15)
## Warning: Removed 25 rows containing non-finite values (stat_density).
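Note that the warning count rises from 6 to 25: xlim() removes the observations outside the limits before the density is estimated, so the curve itself changes. If we only want to zoom in on part of the plot, a commonly used alternative is coord_cartesian(), which leaves the data untouched. Here is a minimal sketch (p_zoom is a name made up for this illustration):

```r
library(ggplot2)
data(starwars, package = "dplyr")

# Zoom into the same window without discarding any observations.
# na.rm = TRUE only silences the warning about missing heights.
p_zoom <- ggplot(starwars, aes(x = height)) +
  geom_density(na.rm = TRUE) +
  coord_cartesian(xlim = c(125, 200), ylim = c(0, 0.15))
p_zoom
```

With this version the density is still estimated from all recorded heights, and only the visible window changes.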
We can also transform the axes using the scale_ functions from ggplot. Here is an example with a log10 scale on the x axis and a square-root scale on the y axis.
ggplot(starwars, aes(x = height)) +
geom_density() +
scale_x_log10() +
scale_y_sqrt()
## Warning: Removed 6 rows containing non-finite values (stat_density).
There are many color options in ggplot. We can use scale_ functions like scale_fill_brewer() to have ggplot automatically assign a ColorBrewer palette to the groups in our data set.
library(ggplot2)
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
scale_fill_brewer()
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
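In the example above, scale_fill_brewer() changes only the fill, so the outlines keep the default colours. As a sketch, assuming we want the two to match, we can add scale_colour_brewer() with the same palette. The "Set2" palette and the alpha value are arbitrary choices for this illustration, as is the name p_brewer.

```r
library(ggplot2)
data(starwars, package = "dplyr")

# Use the same ColorBrewer palette for outlines and fills.
# alpha makes the overlapping densities easier to tell apart.
p_brewer <- ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
  geom_density(alpha = 0.4, na.rm = TRUE) +
  scale_fill_brewer(palette = "Set2") +
  scale_colour_brewer(palette = "Set2")
p_brewer
```

Qualitative palettes such as "Set2" are usually a better fit for unordered categories like sex than the default sequential palette.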
When we have groups, ggplot will add a legend to the plot. We can customize the position of this legend using the theme function and the legend.position parameter. Here are examples of moving the legend to the top, moving it to the bottom, and hiding it.
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
theme(legend.position="top")
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
theme(legend.position="bottom")
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
theme(legend.position="none")
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
If we want to use built-in styles for the full plot, ggplot provides themes that we can add to our plot. Here is an example of adding theme_classic to our plot.
ggplot(starwars, aes(x = height, colour = sex, fill = sex)) +
geom_density() +
theme_classic()
## Warning: Removed 6 rows containing non-finite values (stat_density).
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf