Pandas provides a feature called Boolean Masks that lets you filter DataFrames based on conditions. With them, we can write simple queries to filter our data. In this article, we will learn how to use Boolean Masks to filter rows in a DataFrame.
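To filter a DataFrame with a Boolean Mask, we use the index operator and pass it a comparison on a specific column. The comparison produces one True or False value per row, and pandas keeps the rows marked True. In the example below, pandas keeps only the rows where sales is greater than 1000.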
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Luna",
        "sales": 2000,
        "lastName": "Mound",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown",
    },
])

filtered = df[df['sales'] > 1000]
print(filtered)
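To make the mechanism more visible, the short sketch below separates the comparison from the indexing step. It reuses the same sample data; the variable name mask is only an illustrative choice, not something pandas requires.

```python
import pandas as pd

# Same illustrative data as in the example above.
df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Luna", "sales": 2000, "lastName": "Mound"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

# The comparison on its own returns a boolean Series: one True/False per row.
mask = df["sales"] > 1000
print(mask.tolist())  # [False, True, True]

# Passing the mask to the index operator keeps only the rows marked True.
filtered = df[mask]
print(filtered["person"].tolist())  # ['Luna', 'Clara']
```

Storing the mask in a variable first can make longer filters easier to read and lets you reuse the same condition in several places.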
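We can also combine Boolean Masks, much like combining conditions in an if statement. We have two examples below. The first shows how we can use the | operator to create an or comparison. Each condition must be wrapped in parentheses, because | binds more tightly than comparison operators such as <= and >=.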
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Luna",
        "sales": 2000,
        "lastName": "Mound",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown",
    },
])

filtered = df[(df['sales'] <= 1000) | (df['sales'] >= 3000)]
print(filtered)
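As a minimal sketch of what the or comparison does row by row, the version below builds each condition as its own mask before combining them. The names low, high, and either are illustrative only.

```python
import pandas as pd

# Same illustrative data as in the example above.
df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Luna", "sales": 2000, "lastName": "Mound"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

# Build each condition as a separate boolean Series.
low = df["sales"] <= 1000
high = df["sales"] >= 3000

# | combines the masks element by element: True if either side is True.
either = low | high
print(either.tolist())  # [True, False, True]

filtered = df[either]
print(filtered["person"].tolist())  # ['James', 'Clara']
```

Because each mask is assigned to a name before the | is applied, this form does not need the extra parentheses around the comparisons.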
The second example shows how to use the & operator to create an and comparison.
import pandas as pd

df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Luna",
        "sales": 2000,
        "lastName": "Mound",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown",
    },
])

filtered = df[(df['sales'] > 1000) & (df['sales'] < 3000)]
print(filtered)
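Although combining masks resembles combining conditions in an if statement, Python's and and or keywords cannot be used in place of & and |. The sketch below, using the same sample data and illustrative names above and below, shows the difference.

```python
import pandas as pd

# Same illustrative data as in the example above.
df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Luna", "sales": 2000, "lastName": "Mound"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

above = df["sales"] > 1000
below = df["sales"] < 3000

# The `and` keyword needs a single True/False for the whole Series.
# pandas refuses to guess one and raises ValueError instead.
try:
    above and below
    keyword_failed = False
except ValueError:
    keyword_failed = True
print(keyword_failed)  # True

# The & operator compares the two masks row by row.
filtered = df[above & below]
print(filtered["person"].tolist())  # ['Luna']
```

If you see an error saying the truth value of a Series is ambiguous, a stray and or or keyword between two masks is a likely cause.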