Pandas Describe

Written by MachineLearningPlus | 5 min read

How to use Pandas Describe function?

The describe function, available as DataFrame.describe and Series.describe, returns a descriptive statistics summary of a given dataframe or series. For numeric features, this includes the count, mean, standard deviation, percentiles, and min-max values.

In this article, you will learn about different features of the describe function. We will also learn about the parameters of the function in depth.

DataFrame.describe

  • Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
  • Purpose: Generate descriptive statistics. Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values. Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types.
  • Parameters:
    • percentiles: list-like of numbers. The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
    • include: ‘all’, list-like of dtypes or None (default). A white list of data types to include in the result. ‘all’: all columns of the input are included in the output. A list-like of dtypes: limits the results to the provided data types. None (default): the result includes all numeric columns.
    • exclude: list-like of dtypes or None (default). A black list of data types to omit from the result. A list-like of dtypes: excludes the provided data types from the result. None (default): the result excludes nothing.
    • datetime_is_numeric: bool, default False. Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default. This parameter exists only in pandas 1.1 through 1.5; it was removed in pandas 2.0, where datetime data is always summarized as numeric.
  • Returns : Series or DataFrame Summary statistics of the Series or Dataframe provided.
python
# Import Packages
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

Pandas Describe Function

The describe function returns the statistical summary of the dataframe or series. This includes the count, mean, median (the 50th percentile), standard deviation, min-max, and percentile values of the columns. To use it, chain .describe() to the dataframe or series.

1. Pandas Describe function on Series

When the pandas describe function is applied to a series object, the result is also returned as a series.

python
# Create a Series
numericSeries = pd.Series([1,4,6,53,2,2,1,1])

# Apply describe function
numericSeries.describe()
python
count     8.000000
mean      8.750000
std      17.966238
min       1.000000
25%       1.000000
50%       2.000000
75%       4.500000
max      53.000000
dtype: float64
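
Because the summary is itself a series indexed by statistic name, individual values can be looked up by label. The snippet below is a minimal sketch that rebuilds the same series; the variable names `numeric_series` and `summary` are illustrative only.

```python
import pandas as pd

# Same values as the series in the example above
numeric_series = pd.Series([1, 4, 6, 53, 2, 2, 1, 1])

summary = numeric_series.describe()

# The summary is a Series whose index holds the statistic names
print(type(summary).__name__)   # Series
print(summary.index.tolist())

# Look up single statistics by label
print(summary["mean"])          # 8.75
print(summary["50%"])           # 2.0, the median

# The 50% entry agrees with the dedicated median method
print(summary["50%"] == numeric_series.median())
```

This is handy when only one or two statistics are needed downstream, for example to compare the mean with the median as a quick check for skew.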

2. Pandas Describe function on DataFrame

When the pandas describe function is applied to a dataframe, the result is also returned as a dataframe. This dataframe contains a statistics summary for all the numeric features of the original dataframe.

python
# Create a dataframe
df = pd.DataFrame({
    'Subject_1_Marks': [14, 42, 21, 12, 45],
    'Subject_2_Marks': [32, 43, 23, 50, 21],
    'Subject_3_Marks': [45.0, 34.0, 23.0, 8.0, 21.0],
    'Names': ['Saksham', 'Ayushi', 'Abhishek', 'Saksham', 'Saumya']
})

# Apply describe function
df.describe()
[Figure: Pandas describe function output]
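
As a rough sketch of what that result looks like in code, the snippet below rebuilds the example dataframe and inspects the shape of the summary. It assumes default settings; the variable names `summary` and `subject_1_mean` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Subject_1_Marks": [14, 42, 21, 12, 45],
    "Subject_2_Marks": [32, 43, 23, 50, 21],
    "Subject_3_Marks": [45.0, 34.0, 23.0, 8.0, 21.0],
    "Names": ["Saksham", "Ayushi", "Abhishek", "Saksham", "Saumya"],
})

summary = df.describe()

# The result is a DataFrame: one column per numeric feature,
# one row per statistic. The non-numeric Names column is left out.
print(type(summary).__name__)    # DataFrame
print(summary.columns.tolist())
print(summary.index.tolist())

# Look up one statistic for one column with .loc[row, column]
subject_1_mean = summary.loc["mean", "Subject_1_Marks"]
print(subject_1_mean)
```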

 

 

How to get summary for non-numeric features?

Datasets often contain non-numeric features as well. Have a look at the data types of the features of the example dataset:

python
df.dtypes
python
Subject_1_Marks      int64
Subject_2_Marks      int64
Subject_3_Marks    float64
Names               object
dtype: object

By default, the describe function only returns the summary for numeric features of the dataset. To get a summary for other data types, you can tweak the include parameter of the describe function.

1. include='all' parameter

Specifying include='all' will force pandas to generate summaries for all types of features in the dataframe. Some data types, such as strings, don’t have a mean or standard deviation. In such cases, pandas fills the corresponding cells with NaN.

python
# describe function with include='all'

df.describe(include='all')
[Figure: output of df.describe(include='all')]

You can see that the describe function returns different statistics for string data (the Names column): the number of unique values, the top (most frequent) value, and its frequency. It returns the same set of statistics for features with a categorical data type.
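
The sketch below illustrates both points on a smaller version of the example dataframe. The `Names_cat` column is a hypothetical addition (the same names stored with the category dtype) introduced only to show that string and categorical columns receive the same statistics.

```python
import pandas as pd

df = pd.DataFrame({
    "Subject_1_Marks": [14, 42, 21, 12, 45],
    "Names": ["Saksham", "Ayushi", "Abhishek", "Saksham", "Saumya"],
})
# Hypothetical extra column: the same names as a categorical
df["Names_cat"] = df["Names"].astype("category")

summary = df.describe(include="all")

# Statistics that do not apply to a column are filled with NaN
print(summary.loc["mean", "Names"])
print(summary.loc["unique", "Subject_1_Marks"])

# String and categorical columns both get count, unique, top, freq
print(summary.loc[["count", "unique", "top", "freq"], ["Names", "Names_cat"]])
```

Here "Saksham" appears twice and every other name once, so it is reported as the top value with a frequency of 2.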

2. List of data types for include parameter

Alternatively, you can specify the data types to include in the summary using the include parameter. Pandas generates summaries only for the data types listed there.

python
# describe function with include= ['object']
df.describe(include=['object'])
[Figure: output of df.describe(include=['object'])]
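
The include list also accepts NumPy type classes and other dtype names. The following is a minimal sketch; the `Passed` column and the variable names are hypothetical and exist only to give the dataframe a categorical feature.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Subject_1_Marks": [14, 42, 21, 12, 45],            # integer column
    "Subject_3_Marks": [45.0, 34.0, 23.0, 8.0, 21.0],   # float64 column
    # Hypothetical categorical column, used only for illustration
    "Passed": pd.Series(["no", "yes", "yes", "no", "yes"], dtype="category"),
})

# np.number selects every numeric dtype (integers and floats)
numeric_summary = df.describe(include=[np.number])

# A dtype name restricts the summary to that dtype only
float_summary = df.describe(include=["float64"])
category_summary = df.describe(include=["category"])

print(numeric_summary.columns.tolist())
print(float_summary.columns.tolist())
print(category_summary.columns.tolist())
```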

How to exclude data types from the summary?

You can also keep specific data types out of the summary. The exclude parameter takes a list of the data types to omit.

python
# describe function with exclude= ['float']
df.describe(exclude=['float'])
[Figure: output of df.describe(exclude=['float'])]

In our example dataframe, Subject_3_Marks is float64 and that’s why it was not included in the above summary.
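
A quick way to confirm which columns survive an exclude filter is to inspect the columns of the result. This is a sketch on the example data; passing np.number to drop every numeric column is an extra illustration that is not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Subject_1_Marks": [14, 42, 21, 12, 45],
    "Subject_3_Marks": [45.0, 34.0, 23.0, 8.0, 21.0],
    "Names": ["Saksham", "Ayushi", "Abhishek", "Saksham", "Saumya"],
})

# Drop float columns; integer and string columns remain
no_float = df.describe(exclude=["float"])
print(no_float.columns.tolist())

# Drop every numeric column; only the Names column remains
non_numeric = df.describe(exclude=[np.number])
print(non_numeric.columns.tolist())
```

Note that when only exclude is given, the summary covers all remaining columns, not just the numeric ones, so the result mixes numeric and non-numeric statistics.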

Customize Percentiles of Pandas Describe function

The default percentiles of the describe function are the 25th, 50th, and 75th percentiles (0.25, 0.5, and 0.75). You can pass your own percentiles to the pandas describe function using the percentiles parameter. It takes a list of percentiles, each between 0 and 1.

Note: The 50th percentile is included in either case, since it is also the median.

python
# describe function with percentiles=[0.1, 0.3, 0.7]
df.describe(percentiles=[0.1, 0.3, 0.7])
[Figure: output of df.describe(percentiles=[0.1, 0.3, 0.7])]
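
Each requested percentile shows up in the summary under a percentage label such as 10%. The sketch below runs the same call on a single column from the example data and compares one entry with Series.quantile; the variable names are illustrative.

```python
import pandas as pd

marks = pd.Series([14, 42, 21, 12, 45], name="Subject_1_Marks")

summary = marks.describe(percentiles=[0.1, 0.3, 0.7])

# Requested percentiles appear as labelled rows; the default
# 25% and 75% rows are no longer present
print(summary.index.tolist())

# Each percentile row matches Series.quantile for the same fraction
tenth = marks.quantile(0.1)
print(summary["10%"], tenth)
```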

Treat DateTime values as numeric

In pandas versions before 2.0, datetime values are by default summarized as datetime objects rather than as numbers. The summary for such a column includes the count, the number of unique values, the top value and its frequency, and the first and last dates.

python
# create a datetime series
series = pd.date_range(start='2021-05-27', periods=len(df))

# adding dates series to dataframe
df['dates'] = series

# describe function on dates
df.dates.describe()
python
count                       5
unique                      5
top       2021-05-28 00:00:00
freq                        1
first     2021-05-27 00:00:00
last      2021-05-31 00:00:00
Name: dates, dtype: object

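In pandas 1.1 through 1.5, you can make pandas treat date-time values as numeric using datetime_is_numeric, which takes a boolean value (True or False). The parameter was removed in pandas 2.0, where datetime values are always summarized as numeric, so on those versions you call describe() without it. Let’s understand with an example.

python
# describe function with datetime_is_numeric=True (pandas 1.1 to 1.5 only)
df.describe(datetime_is_numeric=True)
[Figure: output of df.describe(datetime_is_numeric=True)]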
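
Since the parameter is not available in every pandas release, one possible way to write code that runs across versions is to try the argument first and fall back when it is rejected. This is a sketch rather than an official recipe, and it assumes pandas 1.1 or later.

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-05-27", periods=5), name="dates")

try:
    # pandas 1.1 to 1.5: opt in to numeric-style statistics
    summary = dates.describe(datetime_is_numeric=True)
except TypeError:
    # pandas 2.0 and later: the argument no longer exists and
    # datetime data is summarized as numeric by default
    summary = dates.describe()

# The summary now holds count, mean, min, percentiles and max
print(summary)
```

With numeric treatment, the mean, minimum, maximum and percentiles are themselves timestamps, which replaces the unique, top, freq, first and last rows of the object-style summary.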

Practical Tips

  • It is good practice to look at the descriptive statistics of a dataset before moving on to further analysis. For instance, a feature with a standard deviation of 0 may not be useful, because a std of 0 indicates that all the values in that feature column are the same.
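
The tip above can be sketched in a few lines: take the std row of the summary and keep the columns where it equals zero. The `Max_Marks` column is hypothetical and was added only so that the dataframe has a constant feature.

```python
import pandas as pd

df = pd.DataFrame({
    "Subject_1_Marks": [14, 42, 21, 12, 45],
    # Hypothetical constant column, used only for illustration
    "Max_Marks": [50, 50, 50, 50, 50],
})

# The std row of the summary, one value per numeric column
std_row = df.describe().loc["std"]

# Columns whose standard deviation is exactly zero are constant
constant_columns = std_row[std_row == 0].index.tolist()
print(constant_columns)   # ['Max_Marks']
```

On real data with floating-point noise, a small tolerance (for example `std_row < 1e-12`) may be a safer check than exact equality.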

Test your knowledge

Q1: Median is missing from the describe function. True or False?

Answer: False. The 50th percentile is the same as the median of the dataset.

Q2: How can you display a statistics summary for all data types?

Answer: By using the include='all' parameter. It displays summaries for all data types.

Q3: Which parameter is used to define custom percentiles other than the default ones?

Answer: The percentiles parameter. It takes a list of percentiles, each between 0 and 1.

To test your pandas fundamentals further, check out our blog on pandas exercises.

The article was contributed by Kaustubh G and Shrivarsheni
