Often when you have data loaded, there are extra columns or rows you would like to remove. If you are doing feature engineering, you will also need to remove irrelevant columns/features. In this article, we will look at some of the ways to remove data from a Pandas DataFrame.
The first way we can remove a column is with the del Python keyword. In the example below, we delete the lastName column. Note that del modifies the DataFrame in place.
import pandas as pd
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown"
    }
])
del df['lastName']
print(df)
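Because del changes the DataFrame in place, the original column is gone afterwards. If you still need the full data, one option is to delete from a copy instead. The snippet below is a minimal sketch of that idea; the name trimmed is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

# Work on a copy so the original DataFrame keeps all of its columns.
trimmed = df.copy()
del trimmed["lastName"]

print(list(trimmed.columns))  # ['person', 'sales']
print(list(df.columns))       # ['person', 'sales', 'lastName']
```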
We can accomplish the same thing as above using the pop method on a DataFrame and passing the column name.
import pandas as pd
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown"
    }
])
df.pop('lastName')
print(df)
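One difference from del is that pop hands back the removed column as a Series, which is handy when you want to keep using those values separately. A small sketch, with last_names as an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

# pop removes the column in place and returns it as a Series.
last_names = df.pop("lastName")

print(last_names.tolist())  # ['Taylor', 'Brown']
print(list(df.columns))     # ['person', 'sales']
```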
The final, and maybe the most common, way we can remove columns or rows is with the drop method. There are a few ways to do this. Keep in mind that drop returns a new DataFrame by default and leaves the original unchanged. The code below shows how, with comments.
import pandas as pd
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown"
    }
])
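# drop returns a new DataFrame and leaves df unchanged, so we print the result
# drops the rows labeled 0 and 1 (the first 2 rows); axis=0 is the default
print(df.drop([0, 1]))
# drops the first 2 columns; with axis=1 drop expects column labels,
# so we look the labels up by position
print(df.drop(df.columns[[0, 1]], axis=1))
# drops the person column since we specified the columns parameter
print(df.drop(columns=["person"]))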
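Since drop returns a new DataFrame rather than changing the original, the result usually needs to be assigned to a variable. The sketch below shows that pattern, along with the index parameter (the row counterpart of columns) and errors="ignore", which skips labels that are not present. The variable names and the middleName label are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

# Assign the result; calling df.drop(...) on its own leaves df as it was.
without_last_name = df.drop(columns=["lastName"])

# index= removes rows by label, mirroring columns= for columns.
only_clara = df.drop(index=[0])

# "middleName" does not exist (hypothetical label); errors="ignore"
# skips it instead of raising a KeyError.
unchanged = df.drop(columns=["middleName"], errors="ignore")

print(without_last_name)
print(only_clara)
print(unchanged.equals(df))
```

If you prefer to overwrite the original, reassigning with df = df.drop(...) works as well.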