In this article, we will look at common methods for summarizing and describing the data in a DataFrame. Several of these are worth running as soon as you load data, because they give you a quick idea of what is inside.
Let's start by creating a DataFrame.
import pandas as pd
df = pd.DataFrame([
    {
        "person": "James",
        "sales": 1000,
        "lastName": "Taylor",
    },
    {
        "person": "Clara",
        "sales": 3000,
        "lastName": "Brown",
    },
])
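The first few tools we will look at are the shape, size and ndim properties and the built-in len function. Each of these gives you an idea of how much data you have. Specifically, shape returns a (rows, columns) tuple, size returns the total number of cells, ndim returns the number of dimensions, and len(df) returns the number of rows.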
print(df.shape)
print(df.size)
print(df.ndim)
print(len(df))
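As a quick check, here is a minimal sketch of what these four calls return for the two-row, three-column DataFrame built above. The names rows and cols are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

rows, cols = df.shape   # shape is a (rows, columns) tuple
print(rows, cols)       # 2 3
print(df.size)          # 6, i.e. rows * columns
print(df.ndim)          # 2, a DataFrame has two dimensions
print(len(df))          # 2, the number of rows
```

Unpacking shape like this is handy when you only need one of the two numbers later on.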
Next, let's look at the count method, which returns the number of non-missing values in each column.
print(df.count())
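The example DataFrame has no missing values, so count reports 2 for every column. To see the difference between count and len, here is a small sketch that uses a hypothetical DataFrame (df_missing, not part of the data above) in which one sales figure is missing.

```python
import pandas as pd

# Hypothetical data for illustration: Clara's sales figure is missing
df_missing = pd.DataFrame({
    "person": ["James", "Clara"],
    "sales": [1000, None],
})

counts = df_missing.count()
print(counts["person"])   # 2
print(counts["sales"])    # 1, the missing value is not counted
print(len(df_missing))    # 2, len counts rows regardless of missing values
```

Comparing count with len is a quick way to spot which columns contain gaps.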
Now let's move on to some statistics. The following summary methods are frequently used to interpret continuous data. For example, the min method returns the minimum value of each numeric column.
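# numeric_only=True limits each statistic to numeric columns; without it,
# recent pandas versions raise an error for mean, median and std on text columns
print(df.min(numeric_only=True))
print(df.max(numeric_only=True))
print(df.mean(numeric_only=True))
print(df.median(numeric_only=True))
print(df.std(numeric_only=True))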
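If you only care about one column, you can call the same methods on that column directly. The sketch below does this for the sales column of the example data; the variable name sales is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

sales = df["sales"]     # a single column is a Series
print(sales.min())      # 1000
print(sales.max())      # 3000
print(sales.mean())     # 2000.0
print(sales.median())   # 2000.0
print(sales.std())      # about 1414.21 (sample standard deviation, ddof=1)
```

Note that std uses the sample standard deviation by default, which divides by n - 1 rather than n.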
We can also see many summary statistics at once by using the describe method. By default it reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column.
print(df.describe())
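The result of describe is itself a DataFrame, so you can pull individual values out of it. The sketch below shows this, and also passes include="all" to summarize the text columns alongside the numeric one. The variable names summary and full are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"person": "James", "sales": 1000, "lastName": "Taylor"},
    {"person": "Clara", "sales": 3000, "lastName": "Brown"},
])

summary = df.describe()               # numeric columns only by default
print(list(summary.columns))          # ['sales']
print(summary.loc["mean", "sales"])   # 2000.0
print(summary.loc["50%", "sales"])    # 2000.0, the median

full = df.describe(include="all")     # also summarizes text columns
print(list(full.columns))             # ['person', 'sales', 'lastName']
print(full.loc["unique", "person"])   # 2 distinct names
```

With include="all", statistics that do not apply to a column (such as the mean of person) are shown as NaN.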