The select method let’s you easily select columns from your data set. There are many helpful operators and select helpers to get what you need. In this article, we will learn how to use the dplyr select method.
If you don’t have time to read, here is a quick code snippet for you.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
mtcars %>% select(mpg, cyl, hp)
## mpg cyl hp
## Mazda RX4 21.0 6 110
## Mazda RX4 Wag 21.0 6 110
## Datsun 710 22.8 4 93
## Hornet 4 Drive 21.4 6 110
## Hornet Sportabout 18.7 8 175
## Valiant 18.1 6 105
## Duster 360 14.3 8 245
## Merc 240D 24.4 4 62
## Merc 230 22.8 4 95
## Merc 280 19.2 6 123
## Merc 280C 17.8 6 123
## Merc 450SE 16.4 8 180
## Merc 450SL 17.3 8 180
## Merc 450SLC 15.2 8 180
## Cadillac Fleetwood 10.4 8 205
## Lincoln Continental 10.4 8 215
## Chrysler Imperial 14.7 8 230
## Fiat 128 32.4 4 66
## Honda Civic 30.4 4 52
## Toyota Corolla 33.9 4 65
## Toyota Corona 21.5 4 97
## Dodge Challenger 15.5 8 150
## AMC Javelin 15.2 8 150
## Camaro Z28 13.3 8 245
## Pontiac Firebird 19.2 8 175
## Fiat X1-9 27.3 4 66
## Porsche 914-2 26.0 4 91
## Lotus Europa 30.4 4 113
## Ford Pantera L 15.8 8 264
## Ferrari Dino 19.7 6 175
## Maserati Bora 15.0 8 335
## Volvo 142E 21.4 4 109
We can load the dplyr package directly, but I recommend loading the
tidyverse
package as we will use some other features in side.
library(tidyverse)
For this tutorial, we will use the mtcars
data set the comes with
tidyverse
. We take a look at this data set below.
data(mtcars)
glimpse(mtcars)
## Rows: 32
## Columns: 11
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~
To use the select method, we can pass our data set and the column we would like to select as parameters. This will return a new data frame with just that column.
select(mtcars, mpg)
## mpg
## Mazda RX4 21.0
## Mazda RX4 Wag 21.0
## Datsun 710 22.8
## Hornet 4 Drive 21.4
## Hornet Sportabout 18.7
## Valiant 18.1
## Duster 360 14.3
## Merc 240D 24.4
## Merc 230 22.8
## Merc 280 19.2
## Merc 280C 17.8
## Merc 450SE 16.4
## Merc 450SL 17.3
## Merc 450SLC 15.2
## Cadillac Fleetwood 10.4
## Lincoln Continental 10.4
## Chrysler Imperial 14.7
## Fiat 128 32.4
## Honda Civic 30.4
## Toyota Corolla 33.9
## Toyota Corona 21.5
## Dodge Challenger 15.5
## AMC Javelin 15.2
## Camaro Z28 13.3
## Pontiac Firebird 19.2
## Fiat X1-9 27.3
## Porsche 914-2 26.0
## Lotus Europa 30.4
## Ford Pantera L 15.8
## Ferrari Dino 19.7
## Maserati Bora 15.0
## Volvo 142E 21.4
When working with dplyr and the tidyverse, we often use the pipe, %>% operator. With this, we can send the data set to our method to use. Here is a rewrite of the code above.
mtcars %>% select(mpg)
## mpg
## Mazda RX4 21.0
## Mazda RX4 Wag 21.0
## Datsun 710 22.8
## Hornet 4 Drive 21.4
## Hornet Sportabout 18.7
## Valiant 18.1
## Duster 360 14.3
## Merc 240D 24.4
## Merc 230 22.8
## Merc 280 19.2
## Merc 280C 17.8
## Merc 450SE 16.4
## Merc 450SL 17.3
## Merc 450SLC 15.2
## Cadillac Fleetwood 10.4
## Lincoln Continental 10.4
## Chrysler Imperial 14.7
## Fiat 128 32.4
## Honda Civic 30.4
## Toyota Corolla 33.9
## Toyota Corona 21.5
## Dodge Challenger 15.5
## AMC Javelin 15.2
## Camaro Z28 13.3
## Pontiac Firebird 19.2
## Fiat X1-9 27.3
## Porsche 914-2 26.0
## Lotus Europa 30.4
## Ford Pantera L 15.8
## Ferrari Dino 19.7
## Maserati Bora 15.0
## Volvo 142E 21.4
To select multiple columns, we can pas the column names as parameters and a new data frame will return with all the columns.
mtcars %>% select(mpg, hp, wt)
## mpg hp wt
## Mazda RX4 21.0 110 2.620
## Mazda RX4 Wag 21.0 110 2.875
## Datsun 710 22.8 93 2.320
## Hornet 4 Drive 21.4 110 3.215
## Hornet Sportabout 18.7 175 3.440
## Valiant 18.1 105 3.460
## Duster 360 14.3 245 3.570
## Merc 240D 24.4 62 3.190
## Merc 230 22.8 95 3.150
## Merc 280 19.2 123 3.440
## Merc 280C 17.8 123 3.440
## Merc 450SE 16.4 180 4.070
## Merc 450SL 17.3 180 3.730
## Merc 450SLC 15.2 180 3.780
## Cadillac Fleetwood 10.4 205 5.250
## Lincoln Continental 10.4 215 5.424
## Chrysler Imperial 14.7 230 5.345
## Fiat 128 32.4 66 2.200
## Honda Civic 30.4 52 1.615
## Toyota Corolla 33.9 65 1.835
## Toyota Corona 21.5 97 2.465
## Dodge Challenger 15.5 150 3.520
## AMC Javelin 15.2 150 3.435
## Camaro Z28 13.3 245 3.840
## Pontiac Firebird 19.2 175 3.845
## Fiat X1-9 27.3 66 1.935
## Porsche 914-2 26.0 91 2.140
## Lotus Europa 30.4 113 1.513
## Ford Pantera L 15.8 264 3.170
## Ferrari Dino 19.7 175 2.770
## Maserati Bora 15.0 335 3.570
## Volvo 142E 21.4 109 2.780
The select method can use a few operators to help us select more
columns. For example, we can use the :
operator to select a range of
columns. In this example, select all columns between mpg and wt.
mtcars %>% select(mpg:wt)
## mpg cyl disp hp drat wt
## Mazda RX4 21.0 6 160.0 110 3.90 2.620
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875
## Datsun 710 22.8 4 108.0 93 3.85 2.320
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440
## Valiant 18.1 6 225.0 105 2.76 3.460
## Duster 360 14.3 8 360.0 245 3.21 3.570
## Merc 240D 24.4 4 146.7 62 3.69 3.190
## Merc 230 22.8 4 140.8 95 3.92 3.150
## Merc 280 19.2 6 167.6 123 3.92 3.440
## Merc 280C 17.8 6 167.6 123 3.92 3.440
## Merc 450SE 16.4 8 275.8 180 3.07 4.070
## Merc 450SL 17.3 8 275.8 180 3.07 3.730
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345
## Fiat 128 32.4 4 78.7 66 4.08 2.200
## Honda Civic 30.4 4 75.7 52 4.93 1.615
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835
## Toyota Corona 21.5 4 120.1 97 3.70 2.465
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520
## AMC Javelin 15.2 8 304.0 150 3.15 3.435
## Camaro Z28 13.3 8 350.0 245 3.73 3.840
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140
## Lotus Europa 30.4 4 95.1 113 3.77 1.513
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770
## Maserati Bora 15.0 8 301.0 335 3.54 3.570
## Volvo 142E 21.4 4 121.0 109 4.11 2.780
We can do the reverse using the !
to select all columns that are not
in the range of mpg to wt.
mtcars %>% select(!(mpg:wt))
## qsec vs am gear carb
## Mazda RX4 16.46 0 1 4 4
## Mazda RX4 Wag 17.02 0 1 4 4
## Datsun 710 18.61 1 1 4 1
## Hornet 4 Drive 19.44 1 0 3 1
## Hornet Sportabout 17.02 0 0 3 2
## Valiant 20.22 1 0 3 1
## Duster 360 15.84 0 0 3 4
## Merc 240D 20.00 1 0 4 2
## Merc 230 22.90 1 0 4 2
## Merc 280 18.30 1 0 4 4
## Merc 280C 18.90 1 0 4 4
## Merc 450SE 17.40 0 0 3 3
## Merc 450SL 17.60 0 0 3 3
## Merc 450SLC 18.00 0 0 3 3
## Cadillac Fleetwood 17.98 0 0 3 4
## Lincoln Continental 17.82 0 0 3 4
## Chrysler Imperial 17.42 0 0 3 4
## Fiat 128 19.47 1 1 4 1
## Honda Civic 18.52 1 1 4 2
## Toyota Corolla 19.90 1 1 4 1
## Toyota Corona 20.01 1 0 3 1
## Dodge Challenger 16.87 0 0 3 2
## AMC Javelin 17.30 0 0 3 2
## Camaro Z28 15.41 0 0 3 4
## Pontiac Firebird 17.05 0 0 3 2
## Fiat X1-9 18.90 1 1 4 1
## Porsche 914-2 16.70 0 1 5 2
## Lotus Europa 16.90 1 1 5 2
## Ford Pantera L 14.50 0 1 5 4
## Ferrari Dino 15.50 0 1 5 6
## Maserati Bora 14.60 0 1 5 8
## Volvo 142E 18.60 1 1 4 2
We can use select helpers, https://tidyselect.r-lib.org/reference/select_helpers.html, to query for columns. In this example, we select all columns that start with the letter “c”.
mtcars %>% select(starts_with("c"))
## cyl carb
## Mazda RX4 6 4
## Mazda RX4 Wag 6 4
## Datsun 710 4 1
## Hornet 4 Drive 6 1
## Hornet Sportabout 8 2
## Valiant 6 1
## Duster 360 8 4
## Merc 240D 4 2
## Merc 230 4 2
## Merc 280 6 4
## Merc 280C 6 4
## Merc 450SE 8 3
## Merc 450SL 8 3
## Merc 450SLC 8 3
## Cadillac Fleetwood 8 4
## Lincoln Continental 8 4
## Chrysler Imperial 8 4
## Fiat 128 4 1
## Honda Civic 4 2
## Toyota Corolla 4 1
## Toyota Corona 4 1
## Dodge Challenger 8 2
## AMC Javelin 8 2
## Camaro Z28 8 4
## Pontiac Firebird 8 2
## Fiat X1-9 4 1
## Porsche 914-2 4 2
## Lotus Europa 4 2
## Ford Pantera L 8 4
## Ferrari Dino 6 6
## Maserati Bora 8 8
## Volvo 142E 4 2
We can combine the select helpers with another operator, |
, to select
all columns that start with a “c” or end with a “m”.
mtcars %>% select(starts_with("c") | ends_with("m"))
## cyl carb am
## Mazda RX4 6 4 1
## Mazda RX4 Wag 6 4 1
## Datsun 710 4 1 1
## Hornet 4 Drive 6 1 0
## Hornet Sportabout 8 2 0
## Valiant 6 1 0
## Duster 360 8 4 0
## Merc 240D 4 2 0
## Merc 230 4 2 0
## Merc 280 6 4 0
## Merc 280C 6 4 0
## Merc 450SE 8 3 0
## Merc 450SL 8 3 0
## Merc 450SLC 8 3 0
## Cadillac Fleetwood 8 4 0
## Lincoln Continental 8 4 0
## Chrysler Imperial 8 4 0
## Fiat 128 4 1 1
## Honda Civic 4 2 1
## Toyota Corolla 4 1 1
## Toyota Corona 4 1 0
## Dodge Challenger 8 2 0
## AMC Javelin 8 2 0
## Camaro Z28 8 4 0
## Pontiac Firebird 8 2 0
## Fiat X1-9 4 1 1
## Porsche 914-2 4 2 1
## Lotus Europa 4 2 1
## Ford Pantera L 8 4 1
## Ferrari Dino 6 6 1
## Maserati Bora 8 8 1
## Volvo 142E 4 2 1