How to Filter Rows in a Pandas DataFrame with Boolean Masks

2021-01-14

Intro

Pandas lets you filter DataFrames with Boolean Masks: Series of True/False values, usually produced by a comparison on a column. With them, we can write short, readable queries against our data. In this article, we will learn how to use Boolean Masks to filter rows in a DataFrame.

Filter Rows with a Simple Boolean Mask

To filter a DataFrame with a Boolean Mask, we pass a comparison on one of its columns to the indexing operator, as in df[df['sales'] > 1000]. Pandas then keeps only the rows where the comparison is True. In the example below, pandas returns all rows where sales are greater than 1000.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Luna",
		"sales": 2000,
		"lastName": "Mound"
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	}
])

filtered = df[df['sales'] > 1000]  # keep rows whose sales exceed 1000
print(filtered)
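It can help to look at the mask on its own. The minimal sketch below rebuilds the same sales data (as a dict of columns, trimmed to person and sales for brevity) and prints the comparison before using it. The comparison is a Series of True/False values, one per row, and indexing with it keeps only the True rows. df.loc[mask] is an equivalent spelling.

```python
import pandas as pd

# Same people and sales as the example above (lastName omitted for brevity)
df = pd.DataFrame({
    "person": ["James", "Luna", "Clara"],
    "sales": [1000, 2000, 3000],
})

# The comparison produces a boolean Series, one value per row
mask = df["sales"] > 1000
print(mask.tolist())  # [False, True, True]

# Indexing with the mask keeps only the rows where it is True
filtered = df[mask]
print(filtered["person"].tolist())  # ['Luna', 'Clara']

# .loc accepts the same mask and returns the same rows
print(filtered.equals(df.loc[mask]))  # True
```

Note that the original row labels (1 and 2) are kept in the result; the filtered DataFrame is not renumbered.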

Filter Rows with Multiple Boolean Masks

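We can also combine several masks, much like combining conditions with or and and in an if statement. Pandas uses the | operator for "or" and the & operator for "and", and each comparison must be wrapped in its own parentheses. We have two examples below. The first uses | to keep rows that match either condition.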

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Luna",
		"sales": 2000,
		"lastName": "Mound"
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	},
])

filtered = df[(df['sales'] <= 1000) | (df['sales'] >= 3000)]  # James and Clara
print(filtered)
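The parentheses around each comparison are required. In Python, | and & bind more tightly than comparison operators such as <= and >=, so df['sales'] <= 1000 | df['sales'] >= 3000 is parsed as the chained comparison df['sales'] <= (1000 | df['sales']) >= 3000, which pandas cannot evaluate. Python's or keyword fails for a related reason: it needs a single True/False, and pandas refuses to collapse a whole Series into one. The minimal sketch below, using the same sales data, stores each condition in a named variable (the names low and high are just for illustration) and shows both failure modes.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["James", "Luna", "Clara"],
    "sales": [1000, 2000, 3000],
})

# Naming each condition keeps the combined expression readable
low = df["sales"] <= 1000
high = df["sales"] >= 3000

# | is an element-wise OR: a row is kept if either mask is True
either = df[low | high]
print(either["person"].tolist())  # ['James', 'Clara']

# Without parentheses, | is applied first and the comparisons chain,
# which asks pandas for the truth value of a whole Series
try:
    df[df["sales"] <= 1000 | df["sales"] >= 3000]
except ValueError as err:
    print("ValueError:", err)

# The `or` keyword is not element-wise and fails the same way
try:
    df[low or high]
except ValueError as err:
    print("ValueError:", err)
```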

The second example shows how to use the & operator to keep only rows that match both conditions.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Luna",
		"sales": 2000,
		"lastName": "Mound"
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	},
])

filtered = df[(df['sales'] > 1000) & (df['sales'] < 3000)]  # only Luna
print(filtered)
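The conditions combined with & do not have to come from the same column. The minimal sketch below rebuilds the same data as a dict of columns, repeats the "strictly between 1000 and 3000" filter, and then combines a sales condition with a lastName condition. The particular threshold and name are arbitrary choices for illustration; each comparison still needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["James", "Luna", "Clara"],
    "sales": [1000, 2000, 3000],
    "lastName": ["Taylor", "Mound", "Brown"],
})

# Strictly between 1000 and 3000: only Luna's 2000 qualifies
middle = df[(df["sales"] > 1000) & (df["sales"] < 3000)]
print(middle["person"].tolist())  # ['Luna']

# Masks from different columns combine the same way
mixed = df[(df["sales"] >= 1000) & (df["lastName"] != "Mound")]
print(mixed["person"].tolist())  # ['James', 'Clara']
```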
GoTea - KoalaTea