How to Delete Data From a Pandas DataFrame

01.15.2021

Intro

Often, after loading data, you will find extra columns or rows you would like to remove. If you are doing feature engineering, you will also need to remove irrelevant columns/features. In this article, we will look at some of the ways to remove data from a Pandas DataFrame.

Remove Data with the del Keyword

The first way we can remove a column is with Python's del keyword, which removes the column from the DataFrame in place. In the example below, we delete the lastName column.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	}
])

del df['lastName']
print(df)
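
One thing to watch for: del raises a KeyError if the column does not exist. The sketch below guards against that by checking df.columns first. The "middleName" column is a hypothetical name used only to show the missing-column case.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["James", "Clara"],
    "sales": [1000, 3000],
    "lastName": ["Taylor", "Brown"],
})

# del changes df in place and raises KeyError for a missing column,
# so check that the column exists before deleting it.
for column in ["lastName", "middleName"]:  # "middleName" is hypothetical
    if column in df.columns:
        del df[column]

remaining_columns = list(df.columns)
print(remaining_columns)
```

After the loop, only the person and sales columns should remain, and the missing column is skipped without an error.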

Remove Data with the pop Method

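We can accomplish the same thing as above by calling the pop method on a DataFrame and passing the column name. Like del, pop modifies the DataFrame in place. It also returns the removed column as a Series.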

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	}
])

df.pop('lastName')
print(df)
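
Because pop hands back the column it removes, it is handy when you want to keep working with that column separately. A minimal sketch, where last_names is just an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["James", "Clara"],
    "sales": [1000, 3000],
    "lastName": ["Taylor", "Brown"],
})

# pop removes the column from df in place and returns it as a Series
last_names = df.pop("lastName")

print(list(df.columns))      # columns left in df
print(last_names.tolist())   # values of the removed column
```

This pattern is often used to split a target column away from the feature columns in one step.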

Remove Data with the drop Method

The final, and maybe the most common, way to remove columns or rows is with the drop method. There are a few ways to do this. The code below shows how, with comments.

import pandas as pd

df = pd.DataFrame([
	{
		"person": "James",
		"sales": 1000,
		"lastName": "Taylor",
	},
	{
		"person": "Clara",
		"sales": 3000,
		"lastName": "Brown"
	}
])

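# drop returns a new DataFrame by default, so df itself is not changed.
# We print the returned result to see the effect of each call.

# drops the rows with index labels 0 and 1 (axis=0 is the default)
print(df.drop([0, 1]))

# drops the person and sales columns by label since we set axis=1
print(df.drop(["person", "sales"], axis=1))

# drops the person column since we specified the columns parameter
print(df.drop(columns=["person"]))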
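
Unlike del and pop, drop returns a new DataFrame and leaves the original alone unless you assign the result back. The sketch below illustrates that, along with the errors="ignore" option for labels that may not exist and the index parameter for rows. The variable names and the "middleName" column are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["James", "Clara"],
    "sales": [1000, 3000],
    "lastName": ["Taylor", "Brown"],
})

# drop returns a new DataFrame, so assign the result to keep the change
without_last_name = df.drop(columns=["lastName"])

# errors="ignore" skips labels that do not exist instead of raising KeyError
# ("middleName" is a hypothetical column that is not in df)
safe = df.drop(columns=["lastName", "middleName"], errors="ignore")

# index= drops rows by label; here the row labelled 0
without_first_row = df.drop(index=[0])

print(list(df.columns))                      # original df is unchanged
print(list(without_last_name.columns))
print(list(safe.columns))
print(without_first_row["person"].tolist())
```

Assigning the result to a new name keeps the original DataFrame available, which can make it easier to retrace steps while exploring data.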