In this article, we will explore four ways to access columns in a pandas DataFrame: the index operator, the dot operator, the .loc indexer, and the .iloc indexer. Each of them returns a pandas Series object (a one-dimensional labeled array, essentially a super-powered column). We will later see how to retrieve a sub-DataFrame of several columns.
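As a quick preview, here is a minimal sketch that checks the claim that all four approaches return a Series holding the same values. It assumes the same two-row sales table used in the examples throughout this article.

```python
import pandas as pd

# Same two-row sales table used throughout this article
df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# The four access styles, each selecting the "person" column
by_index_operator = df["person"]
by_dot_operator = df.person
by_loc = df.loc[:, "person"]
by_iloc = df.iloc[:, 0]  # "person" is the first column, at position 0

# Each line prints the type name and the column values
for result in (by_index_operator, by_dot_operator, by_loc, by_iloc):
    print(type(result).__name__, result.tolist())
```

Every line of output should read Series ['James', 'Clara'].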
The first method of selecting a column is via the index operator. This is very similar to how we access values in a dictionary, except that it returns a Series.
import pandas as pd
# A small table of sales per person
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])
# Select the "person" column by its label
people = df['person']
print(people)
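To make the dictionary comparison concrete, here is a small sketch (the as_dict name is just for illustration) showing that a column is looked up by key much like a dict entry, and that a label which doesn't exist, such as the misspelling "pearson", raises a KeyError just as a missing dict key would.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# A plain dict keyed the same way, for comparison (illustrative only)
as_dict = {"person": ["James", "Clara"], "sales": [1000, 3000]}

print(as_dict["person"])      # a plain Python list
print(df["person"].tolist())  # the Series converted to a list

# Like a dict, the index operator raises KeyError for an unknown label
try:
    df["pearson"]
except KeyError as err:
    print("KeyError:", err)
```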
We can also accomplish the above using attribute access, also called the dot operator, as long as the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method. For example, a column called Sales People would not work.
import pandas as pd
# A small table of sales per person
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])
# Select the "person" column as an attribute
people = df.person
print(people)
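To illustrate where the dot operator breaks down, here is a sketch using a hypothetical table whose column names are chosen to cause trouble: "Sales People" contains a space, and "count" shares its name with the DataFrame.count method. In both cases the index operator still returns the column.

```python
import pandas as pd

# Hypothetical column names picked to break attribute access
df = pd.DataFrame([
    {"Sales People": "James", "count": 4},
    {"Sales People": "Clara", "count": 7},
])

# A name with a space is not a valid Python identifier, so only brackets work.
# Writing df.Sales People would be a SyntaxError.
print(df["Sales People"].tolist())

# "count" clashes with the DataFrame.count method, so the dot operator
# returns the bound method rather than the column
print(callable(df.count))    # True: this is the method
print(df["count"].tolist())  # brackets still return the column
```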
The third way to access a column is with the .loc indexer. This is known as label-based access in the pandas world, because we use the column labels to select data. Keep reading to see the .iloc indexer, which works differently.
import pandas as pd
# A small table of sales per person
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])
# All rows (":") of the column labeled "person"
people = df.loc[:, "person"]
print(people)
Notice that we start with :, which is Python's slice syntax, the same one used with lists. Here we are saying: select all the rows and the "person" column.
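The : can be replaced with row labels when you only want part of the column. Here is a minimal sketch, assuming the default integer row labels 0 and 1 that pandas assigns to this table. Note that label slices with .loc include both endpoints, unlike Python list slices.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# ":" selects every row, so this is the whole "person" column
all_rows = df.loc[:, "person"]

# Replacing ":" with a row label narrows the selection
one_value = df.loc[1, "person"]      # row label 1 -> a single value
label_slice = df.loc[0:1, "person"]  # label slices include both ends

print(all_rows.equals(df["person"]))  # True
print(one_value)                      # Clara
print(label_slice.tolist())           # ['James', 'Clara']
```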
The final way to access a column is with the .iloc indexer, which is known as position-based access in pandas. This is because we use the integer position of the column rather than its name.
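import pandas as pd
# A small table of sales per person
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
    },
    {
        "person": "Clara",
        "sales": 3000,
    },
])
# All rows of the column at position 0 ("person")
people = df.iloc[:, 0]
print(people)
Here we are saying: select all the rows and the first column. Positions are zero-based, so the first column is at position 0 and df.iloc[:, 1] would return the sales column instead.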
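Here is a short sketch of a few more positional selections on the same table. Negative positions count from the end, as with Python lists, and Index.get_loc translates a column label into its position when you need to switch between label-based and position-based access.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

first_column = df.iloc[:, 0]   # position 0 -> "person"
second_column = df.iloc[:, 1]  # position 1 -> "sales"
last_column = df.iloc[:, -1]   # negative positions count from the end

# Translate a label into its position when you only know the name
sales_position = df.columns.get_loc("sales")

print(first_column.name, second_column.name, last_column.name)  # person sales sales
print(sales_position)                                          # 1
print(df.iloc[:, sales_position].equals(df.loc[:, "sales"]))   # True
```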