In a previous article, we learned how to select a single column. In this article, we cover how to select multiple columns from a pandas DataFrame using the index operator, the loc indexer, and the iloc indexer. Each of these returns a subset DataFrame rather than a Series.
The first way to select multiple columns is with the index operator. This works like selecting a single column, except that we pass a list of column names instead of a single column name.
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])

# Pass a list of column names to get a DataFrame with just those columns.
people = df[['person', 'sales']]
print(people)
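To make the DataFrame-versus-Series distinction concrete, here is a minimal sketch that reuses the same sample data. The variable names `subset` and `column` are illustrative only and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# A list of names (double brackets) returns a DataFrame, even for one column.
subset = df[["person"]]

# A single name (single brackets) returns a Series.
column = df["person"]

print(type(subset).__name__)
print(type(column).__name__)
```

The outer brackets are the index operator and the inner brackets build the list of names, which is why selecting several columns looks like double brackets.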
The next way to select columns is with the loc indexer, which selects by label. It also lets us select rows.
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])

# The row selector comes first, then the list of column labels.
new_df = df.loc[:, ['person', 'sales']]
print(new_df.head())
Notice that we start with : which is the same slice notation used for Python lists. A bare : selects all rows, so we are selecting every row together with the "person" and "sales" columns.
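Because loc takes a row selector before the column selector, the two can be combined. The following is a rough sketch on the same sample data; the sales threshold of 1000 and the names `top_sellers` and `all_cols` are assumptions made purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# Row selector first, column selector second.
# Keep only the rows where sales exceed 1000, with both columns.
top_sellers = df.loc[df["sales"] > 1000, ["person", "sales"]]
print(top_sellers)

# Columns can also be given as a label slice.
# Unlike list slices, label slices in loc include both endpoints.
all_cols = df.loc[:, "person":"sales"]
print(list(all_cols.columns))
```

A label slice is convenient when the columns you want sit next to each other, while a list lets you pick them in any order.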
The final way to select columns is with the iloc indexer. Instead of labels, it uses zero-based integer positions. In the example below, we select all rows and the first two columns.
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])

# Positions are zero-based, so 0 and 1 are "person" and "sales".
new_df = df.iloc[:, [0, 1]]
print(new_df.head())
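Positions can also be given as a slice, or looked up from column names when you know the labels but want to use iloc. This is a small sketch under the assumption that the column order matches the sample data; `first_two`, `positions`, and `same_cols` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# Positions are zero-based and the slice end is exclusive,
# as with Python lists, so 0:2 covers positions 0 and 1.
first_two = df.iloc[:, 0:2]

# Translate labels into integer positions, then select by position.
positions = df.columns.get_indexer(["person", "sales"])
same_cols = df.iloc[:, positions]

print(first_two.equals(same_cols))
```

Asking for a position that does not exist, such as 2 in a two-column DataFrame, raises an IndexError, so it can help to check `df.shape` first.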