How to Access a Single Column in a Pandas DataFrame

2021-01-10

Intro

In this article, we will explore four ways to access a single column in a pandas DataFrame: the index operator, the dot operator, the .loc indexer, and the .iloc indexer. Each of these returns a pandas Series object (a one-dimensional labeled array, essentially a super powered column). We will later see how to retrieve a sub DataFrame of columns.

Access columns with the Index Operator

The first method of selecting a column is via the index operator. This is very similar to how we access values in a dictionary, except that it returns a Series.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
	},
	{
		"person": "Clara",
		"sales": 3000,
	}
])

# Select the "person" column by its label
people = df['person']
print(people)
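
To make the dictionary comparison concrete, here is a minimal sketch using the same sample data. It checks that the result is a Series and shows what happens when the label is misspelled. The misspelled label "pearson" is only there to trigger the error.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

people = df["person"]

# A single label inside the index operator returns a Series, not a DataFrame.
is_series = isinstance(people, pd.Series)
print(is_series)  # True

# The Series keeps the column label as its name.
print(people.name)  # person

# A label that does not exist raises KeyError, much like a dictionary lookup.
try:
    df["pearson"]  # deliberately misspelled
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```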

Access columns with attribute access

We can also accomplish the above using attribute access, also called the dot operator, as long as the column name is a valid Python identifier, meaning it has no spaces or special characters. For example, a column called Sales People would not work.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
	},
	{
		"person": "Clara",
		"sales": 3000,
	}
])

# Select the "person" column as an attribute
people = df.person
print(people)
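
The sketch below illustrates where the dot operator stops working. The "Sales People" column comes from the example above. The "count" column is an assumption added for illustration, chosen because it shares its name with an existing DataFrame method.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "Sales People": 2, "count": 10},
    {"person": "Clara", "Sales People": 5, "count": 20},
])

# A name that is a valid Python identifier works with the dot operator.
people = df.person

# "Sales People" contains a space, so it cannot be written as an attribute.
# The index operator handles it without trouble.
teams = df["Sales People"]

# "count" collides with the DataFrame.count method, so the dot operator
# returns the method rather than the column.
dot_count_is_method = callable(df.count)
print(dot_count_is_method)  # True

# The index operator still reaches the column.
counts = df["count"]
```

When in doubt, the index operator is the safer choice because it works for any column label.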

Accessing a column with loc

The third way to access a column is with the loc indexer. This is known as label-based access in the pandas world because we use the column labels to select data. The iloc indexer, covered in the next section, is position-based instead.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
	},
	{
		"person": "Clara",
		"sales": 3000,
	}
])

# ":" selects every row, "person" selects the column by label
people = df.loc[:, "person"]
print(people)

Notice here that we start with :, the same slice syntax used for Python lists. On its own it means "everything", so we are saying: select all the rows and the "person" column.
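
As a rough illustration of label-based access, the sketch below gives the rows string labels so that both sides of the comma are visibly labels. The row labels "a" and "b" are hypothetical and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"person": "James", "sales": 1000},
        {"person": "Clara", "sales": 3000},
    ],
    index=["a", "b"],  # hypothetical row labels, added for illustration
)

# ":" selects every row; "person" selects the column by its label.
people = df.loc[:, "person"]

# The result matches what the index operator returns.
same = people.equals(df["person"])
print(same)  # True

# Replacing ":" with a row label narrows the selection to a single value.
first_person = df.loc["a", "person"]
print(first_person)  # James
```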

Accessing a column with iloc

The final way to access a column is with the iloc indexer, which is known as position-based access in pandas. This is because we use the integer position of the column rather than its name.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
	},
	{
		"person": "Clara",
		"sales": 3000,
	}
])

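# Positions are zero-based, so 0 is the first column ("person")
people = df.iloc[:, 0]
print(people)

Here we are saying, select all the rows and the first column. Because positions start at 0, the first column is at position 0 and the "sales" column is at position 1.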
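
A short sketch of how positions map to columns, using the same sample data. It also uses Index.get_loc, which is not covered in this article, to look up the position of a label.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000},
    {"person": "Clara", "sales": 3000},
])

# Positions are zero-based: 0 is "person", 1 is "sales".
people = df.iloc[:, 0]
sales = df.iloc[:, 1]

# Negative positions count from the end, as with Python lists.
last = df.iloc[:, -1]

# Index.get_loc translates a column label into its integer position.
position = df.columns.get_loc("person")
print(position)  # 0
```

Position-based access is handy when column names are unknown or unwieldy, but it breaks silently if the column order changes, so label-based access is usually easier to read.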
