When working with data frames in R, we have many options for selecting data. We can select columns and rows by position or by name in a few different ways. In this article, we will learn how to select columns and rows from a data frame in R.
We start by selecting a specific column. As with lists, we can use the double bracket operator [[]] to select a column by position. This returns the column as a vector.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
df[[2]]
# [1] 2000 4000 3000
If we want to select a column and return a data frame, we can use the single bracket notation.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
df[2]
# clicks
# 1 2000
# 2 4000
# 3 3000
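To make the difference between the two bracket styles concrete, here is a minimal sketch that checks the type of each result. The variable names col_vector and col_frame are only illustrative and not part of the original example.

```r
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name = ad.names, clicks)

## Double brackets extract the column itself (a numeric vector)
col_vector = df[[2]]
is.numeric(col_vector)
is.data.frame(col_vector)

## Single brackets keep the data frame wrapper around the column
col_frame = df[2]
is.data.frame(col_frame)
names(col_frame)
```

The first pair of checks should report a numeric vector that is not a data frame, while the second pair should report a data frame whose only column is clicks.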
We can also pass a vector of positions to select multiple columns.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
df[c(1, 2)]
# name clicks
# 1 Google 2000
# 2 Facebook 4000
# 3 Twitter 3000
Since a data frame behaves much like a super-powered matrix, R also lets us use matrix selection notation, df[rows, columns]. This also allows us to specify which rows we want to select. Let's see some examples.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
## Select all rows and the first column, returns vector
df[, 1]
# [1] "Google" "Facebook" "Twitter"
## Select first two rows and first two columns, returns data frame
df[1:2, c(1, 2)]
# name clicks
# 1 Google 2000
# 2 Facebook 4000
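One detail worth knowing with matrix notation is that selecting a single column collapses the result to a vector by default. The sketch below, which assumes the same df as above, uses the drop argument of the bracket operator to keep a data frame instead. The names as_vector and as_frame are illustrative.

```r
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name = ad.names, clicks)

## A single column in matrix notation is dropped to a vector by default
as_vector = df[, 2]
is.data.frame(as_vector)

## drop = FALSE keeps the result as a one-column data frame
as_frame = df[, 2, drop = FALSE]
is.data.frame(as_frame)
dim(as_frame)
```

Using drop = FALSE can be helpful when later code expects a data frame regardless of how many columns were selected.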
Selecting columns by name is often more useful. As with positions, we can use double brackets, single brackets, or a vector of column names. R also has the $ operator, which lets us select a column by name as if it were a property.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
## Select the clicks column, returns vector
df[["clicks"]]
# [1] 2000 4000 3000
## Select the clicks column with $, returns vector
df$clicks
# [1] 2000 4000 3000
## Select the name column, returns data frame
df["name"]
# name
# 1 Google
# 2 Facebook
# 3 Twitter
## Select multiple columns
df[c("name", "clicks")]
# name clicks
# 1 Google 2000
# 2 Facebook 4000
# 3 Twitter 3000
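The bracket forms and $ differ when the column name is stored in a variable. The following sketch assumes a hypothetical variable col holding the name. The brackets evaluate the variable, while $ treats whatever follows it as a literal column name.

```r
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name = ad.names, clicks)

## Hypothetical variable holding a column name
col = "clicks"

## Brackets evaluate the variable, so this selects the clicks column
by_variable = df[[col]]

## $ looks for a column literally named "col", which does not exist here
by_dollar = df$col
is.null(by_dollar)
```

For this reason, the bracket forms are usually the safer choice inside functions or loops where the column name is not known in advance.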
Just as with positions, we can also use column names in matrix-style notation.
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name=ad.names, clicks)
## Select first two rows and the name column
df[1:2, "name"]
# [1] "Google" "Facebook"
## Select the first two rows and the name and clicks columns, returns data frame
df[1:2, c("name", "clicks")]
# name clicks
# 1 Google 2000
# 2 Facebook 4000
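Rows can be selected by name in the same way, provided the data frame has meaningful row names. As a rough sketch, the example below assumes we are happy to reuse the ad names as row labels. That is a choice made here for illustration and not something the original data frame does.

```r
ad.names = c("Google", "Facebook", "Twitter")
clicks = c(2000, 4000, 3000)
df = data.frame(name = ad.names, clicks)

## Assumption for illustration: use the ad names as row names
rownames(df) = df$name

## Select a single cell by row name and column name
facebook_clicks = df["Facebook", "clicks"]

## Select several rows by name and keep all columns
two_rows = df[c("Google", "Twitter"), ]
nrow(two_rows)
```

By default the row names are just "1", "2", "3", so selecting rows by name is mostly useful after assigning labels like this.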