
Measures of Dispersion – Unlocking the Variability
Diving Deep into Measures of Dispersion

Written by Jagdeesh | 6 min read

Dive deep into the world of statistics and measures of dispersion, from understanding their essence to their practical application using Python.

In this blog post we will learn:

  1. What is Dispersion in Statistics?
  2. Advantages and Applications of Measures of Dispersion
  3. Types of Measures of Dispersion
    3.1. Absolute Measure of Dispersion
    3.2. Relative Measure of Dispersion
  4. Different ways to visualize the measures of dispersion
  5. Conclusion

1. What is Dispersion in Statistics?

Dispersion in statistics refers to the extent to which a set of data is spread out. While measures of central tendency (like mean, median, and mode) give us a central value of the data set, dispersion gives us an idea of how spread out the data points are around this central value. Two data sets can have the same mean or median yet differ significantly in their dispersion.
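
As a quick illustration, the sketch below builds two hypothetical data sets (the values are invented for this example) that share the same mean but have very different spread, using Python's built-in statistics module.

```python
import statistics

# Two hypothetical data sets, invented for illustration only
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {spread:.2f}")
```

Both means come out as 50, while the standard deviation of the second set is twenty times that of the first.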

2. Advantages and Applications of Measures of Dispersion

  1. Better Understanding: Dispersion measures give a more comprehensive picture of the data. Knowing only the average doesn’t tell us about the variability or consistency of data.

  2. Comparing Variability: By understanding dispersion, we can compare the variability of two or more sets of data.

  3. Predictive Analysis: In fields like finance and stock markets, measures of dispersion such as variance and standard deviation help in assessing risks.

  4. Quality Control: In manufacturing, understanding dispersion helps in ensuring the consistency of products.

3. Types of Measures of Dispersion

There are two main categories:

  1. Absolute Measure of Dispersion: These measures give the dispersion in the same units as the original data (or, for variance, in squared units). Their values therefore depend on the unit of measurement.

    Absolute dispersion methods include:

    • Range
    • Variance and Standard Deviation
    • Quartile Deviation

  2. Relative Measure of Dispersion: These measures are dimensionless and are usually expressed in percentage form. They help in comparing the dispersion of two or more sets of data.

    Relative dispersion methods include:

    • Coefficient of Range
    • Coefficient of Variation (CV)
    • Coefficient of Quartile Deviation
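
One way to see the difference between the two categories is to express the same data in different units. The sketch below uses hypothetical heights (invented values) and a helper named cv_percent, which is defined here for illustration and is not a library function. The standard deviation changes with the unit, while the coefficient of variation does not.

```python
import statistics

# Hypothetical heights, invented for illustration
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]  # same data expressed in metres

def cv_percent(values):
    """Coefficient of variation as a percentage (illustrative helper)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

sd_cm = statistics.pstdev(heights_cm)
sd_m = statistics.pstdev(heights_m)

# Absolute measure: the number depends on the unit
print(f"Std dev: {sd_cm:.2f} cm vs {sd_m:.4f} m")
# Relative measure: the number is unit-free
print(f"CV: {cv_percent(heights_cm):.2f}% vs {cv_percent(heights_m):.2f}%")
```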

3.1. Absolute Measure of Dispersion

1. Range:
It is the simplest measure of dispersion.
Formula:
Range = Maximum value – Minimum value

  • Where:
    • Maximum value is the largest value in the dataset.
    • Minimum value is the smallest value in the dataset.
  • Example:
    For data set {5, 12, 18, 23}, Range = 23 – 5 = 18
python
data = [5, 12, 18, 23]
range_value = max(data) - min(data)
print("Range:", range_value)  # Output: Range: 18
python
Range: 18

2. Variance and Standard Deviation:
For ungrouped data, variance is the average of the squared differences from the mean.

  • Variance (σ^2):
    $ \sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N} $

    • Where:
      • $X_i$ represents each individual data point.
      • $\bar{X}$ represents the mean of the data.
      • $N$ represents the total number of data points.
  • Standard Deviation (σ):
    $ \sigma = \sqrt{\sigma^2} $

    • Where:
      • $\sigma^2$ represents the variance.

  • Example:
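    For data set {2, 4, 4, 4, 5, 5, 7, 9}, the formula above (which divides by N) gives Variance = 4 and Standard Deviation = 2. The sample versions, which divide by N – 1 and are what statistics.variance and statistics.stdev compute, give Variance = 4.571 and Standard Deviation = 2.138.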
python
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
print("Variance:", variance)  # Output: Variance: 4.571428571428571
print("Standard Deviation:", std_dev)  # Output: Standard Deviation: 2.138089935299395
python
Variance: 4.571428571428571
Standard Deviation: 2.138089935299395
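
To make the two denominators concrete, the sketch below evaluates the variance formula by hand, dividing by N, and compares it with the standard library. In the statistics module, pvariance and pstdev divide by N (population), while variance and stdev divide by N - 1 (sample).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Population variance: divide the sum of squared deviations by N
squared_devs = [(x - mean) ** 2 for x in data]
pop_var_manual = sum(squared_devs) / n

print("Population variance (manual):", pop_var_manual)
print("Population variance (stdlib):", statistics.pvariance(data))
print("Population std dev (stdlib): ", statistics.pstdev(data))

# Sample variance: divide by N - 1 instead
sample_var_manual = sum(squared_devs) / (n - 1)
print("Sample variance (manual):", sample_var_manual)
print("Sample variance (stdlib):", statistics.variance(data))
```

For this data set the population variance is 4 (standard deviation 2), while the sample variance is 32/7, about 4.571.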

3. Quartile Deviation (or Semi-Interquartile Range):
It is half the difference between the first and the third quartile.
Formula:
$ QD = \frac{Q3 - Q1}{2} $

  • Where:
    • $Q1$ represents the first quartile.
    • $Q3$ represents the third quartile.
python
import numpy as np
data = [5, 7, 8, 9, 10, 12, 14, 16, 18]
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
QD = (Q3 - Q1) / 2
print("Quartile Deviation:", QD) 
python
Quartile Deviation: 3.0
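
Quartiles can be computed under more than one convention, so the quartile deviation may differ between tools. As a hedged illustration, the sketch below uses statistics.quantiles from the standard library. Its "inclusive" method reproduces the np.percentile result above for this data, while its default "exclusive" method gives different quartiles. The helper quartile_deviation is defined here for illustration only.

```python
import statistics

data = [5, 7, 8, 9, 10, 12, 14, 16, 18]

def quartile_deviation(values, method):
    """Half the interquartile range under the given quantile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return (q3 - q1) / 2

qd_inclusive = quartile_deviation(data, "inclusive")
qd_exclusive = quartile_deviation(data, "exclusive")

print("QD (inclusive):", qd_inclusive)
print("QD (exclusive):", qd_exclusive)
```

Neither value is wrong. When reporting quartile-based measures, it helps to state which convention was used.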

3.2. Relative Measure of Dispersion

1. Coefficient of Range:
It is the ratio of the range to the sum of the maximum and minimum values.
Formula:
$ CR = \frac{\text{Range}}{\text{Maximum} + \text{Minimum}} $

  • Where:
    • Range is the difference between the maximum and minimum values.
    • Maximum value is the largest value in the dataset.
    • Minimum value is the smallest value in the dataset.
python
data = [5, 12, 18, 23]
range_value = max(data) - min(data)
CR = range_value / (max(data) + min(data))
print("Coefficient of Range:", CR) 
python
Coefficient of Range: 0.6428571428571429

2. Coefficient of Variation (CV):
It is the ratio of the standard deviation to the mean expressed as a percentage.
Formula:
$ CV = \frac{\sigma}{\bar{X}} \times 100\% $

  • Where:
    • $\sigma$ represents the standard deviation.
    • $\bar{X}$ represents the mean of the data.

  • Example:
    If the standard deviation of a data set is 5 and the mean is 20, CV = (5 / 20) × 100% = 25%.
python
import statistics

data = [10, 20, 30, 40, 50]
mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # sample standard deviation (divides by N - 1)
CV = (std_dev / mean) * 100
print("Coefficient of Variation:", CV, "%")  # Output: Coefficient of Variation: 52.70462766947299 %
python
Coefficient of Variation: 52.70462766947299 %
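
Because the CV is unit-free, it can compare data sets measured on different scales. The sketch below uses two hypothetical price series (invented numbers) and a helper named coefficient_of_variation, defined here for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical daily prices for two stocks, invented for illustration
stock_a = [100, 102, 98, 101, 99]  # priced around 100
stock_b = [10, 12, 8, 11, 9]       # priced around 10

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(f"Stock A: std dev = {statistics.stdev(stock_a):.2f}, CV = {cv_a:.2f}%")
print(f"Stock B: std dev = {statistics.stdev(stock_b):.2f}, CV = {cv_b:.2f}%")
```

Both series have the same standard deviation, but relative to its mean the second one varies ten times as much.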

3. Coefficient of Quartile Deviation:
It is the ratio of the quartile deviation to the average of the first and third quartiles.
Formula:
$ CQD = \frac{QD}{\frac{Q1 + Q3}{2}} $

  • Where:
    • $QD$ represents the quartile deviation.
    • $Q1$ represents the first quartile.
    • $Q3$ represents the third quartile.
python
import numpy as np

data = [5, 7, 8, 9, 10, 12, 14, 16, 18]

Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
QD = (Q3 - Q1) / 2
CQD = QD / ((Q1 + Q3) / 2)

print(f"Coefficient of Quartile Deviation: {CQD:.2f}")
python
Coefficient of Quartile Deviation: 0.27

4. Different ways to visualize the measures of dispersion

Visualizing measures of dispersion can help in understanding the spread of data. Python, particularly with libraries like Matplotlib and Seaborn, provides an array of visualization options to display measures of dispersion.

Here are some common ways:

1. Box Plot (Box-and-Whisker Plot):

  • It shows the median, quartiles, and potential outliers in the dataset.
  • Dispersion is represented by the interquartile range (IQR) and the whiskers of the plot.
python
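import matplotlib.pyplot as plt
import seaborn as sns
data = [5, 7, 8, 8, 9, 10, 12, 12, 14, 15, 16, 18]
sns.boxplot(data=data)  # the box spans Q1 to Q3; the line inside marks the median
plt.show()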
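
The numbers a box plot draws can also be computed directly. The sketch below is an illustration under stated assumptions: it uses the common 1.5 × IQR rule for whisker limits (the default in Matplotlib and Seaborn box plots) and statistics.quantiles with the "inclusive" method for the quartiles.

```python
import statistics

data = [5, 7, 8, 8, 9, 10, 12, 12, 14, 15, 16, 18]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule (an assumed, commonly used default)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences would be drawn as individual outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}], outliers: {outliers}")
```

For this data no point falls outside the fences, so the whiskers would simply reach the minimum and maximum values.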

2. Histogram:

  • It showcases the distribution of data.
  • The width of each bar is the interval (bin) of values it covers, and its height is the number of observations in that bin. The horizontal extent of all the bars together shows the range of the data.
python
import matplotlib.pyplot as plt
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Generate 1000 random data points
data = np.random.randn(1000)

plt.hist(data, bins=10, edgecolor="k", alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Data')
plt.show()

3. Variance and Standard Deviation:

On a histogram (or a line or scatter plot), you can mark the mean and then draw lines one standard deviation on either side of it.

python
import matplotlib.pyplot as plt
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Generate 1000 random data points
data = np.random.randn(1000)

mean = np.mean(data)
std_dev = np.std(data)  # population standard deviation (divides by N)

plt.hist(data, bins=10, edgecolor="k", alpha=0.7)

# Mark the mean and one standard deviation on either side of it
plt.axvline(mean, color='red', linestyle='dashed', linewidth=1, label=f'Mean: {mean:.2f}')
plt.axvline(mean + std_dev, color='blue', linestyle='dashed', linewidth=1, label=f'Mean ± 1 SD (SD: {std_dev:.2f})')
plt.axvline(mean - std_dev, color='blue', linestyle='dashed', linewidth=1)

plt.legend(loc="upper right")
plt.show()
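
A numeric counterpart to this plot is to count how many points fall within one standard deviation of the mean. The sketch below uses only the standard library (random.gauss rather than np.random.randn, so the generated values differ from those plotted above). For roughly normal data, about 68% of points are expected to land in that band.

```python
import random
import statistics

random.seed(42)  # for reproducibility
data = [random.gauss(0, 1) for _ in range(1000)]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)  # population standard deviation

# Keep the points lying between mean - 1 SD and mean + 1 SD
within_one_sd = [x for x in data if mean - std_dev <= x <= mean + std_dev]
fraction = len(within_one_sd) / len(data)

print(f"Mean: {mean:.3f}, standard deviation: {std_dev:.3f}")
print(f"Share within one standard deviation: {fraction:.1%}")
```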

4. Violin Plot:

  • It’s a combination of a box plot and a kernel density estimation.
  • This plot showcases the probability density of the data at different values.
python
sns.violinplot(data=data)  # reuses the data array generated above
plt.show()

5. Standard Deviation Bars:

Overlay the standard deviation as error bars on a bar chart to show the variability of multiple datasets.

python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
values = [50, 60, 55]   # mean of each category
std_devs = [5, 8, 3]    # standard deviation of each category

plt.bar(categories, values, yerr=std_devs, capsize=10, color='lightblue', edgecolor='k')
plt.ylabel('Value')
plt.title('Bar Chart with Standard Deviation Bars')
plt.show()
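
The bar heights and error bars in a chart like this would normally come from raw observations. As a sketch, the code below derives them from hypothetical measurements (invented values; the name groups is illustrative). The resulting values and std_devs lists could then be passed to plt.bar as the heights and yerr.

```python
import statistics

# Hypothetical raw measurements per category, invented for illustration
groups = {
    "A": [45, 50, 55, 48, 52],
    "B": [52, 68, 60, 55, 65],
    "C": [53, 57, 55, 54, 56],
}

categories = list(groups)
values = [statistics.mean(obs) for obs in groups.values()]     # bar heights
std_devs = [statistics.stdev(obs) for obs in groups.values()]  # error-bar lengths

for cat, val, sd in zip(categories, values, std_devs):
    print(f"{cat}: mean = {val}, std dev = {sd:.2f}")
```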

5. Conclusion

Measures of dispersion not only provide a holistic understanding of datasets but, when leveraged correctly, can lead to more nuanced insights and better decision-making. With Python, these measures are just a few lines of code away, offering a powerful blend of theory and application for statisticians.
