Measures of Dispersion – Unlocking the Variability: Diving Deep into Measures of Dispersion

Dive deep into the world of statistics and measures of dispersion, from understanding its essence to its practical application using Python.

Written by Jagdeesh | 6 min read

In this blog post we will learn:

1. What is Dispersion in Statistics?
2. Advantages and Applications of Measures of Dispersion
3. Types of Measures of Dispersion
3.1. Absolute Measure of Dispersion
3.2. Relative Measure of Dispersion
4. Different ways to visualize the measures of dispersion
5. Conclusion

1. What is Dispersion in Statistics?

Dispersion in statistics refers to the extent to which a set of data is spread out. While central tendency (like mean, median, and mode) gives us a central value of the data set, dispersion gives us an idea of how spread out the data points are around this central value. A set of data can have the same mean or median but can vary significantly in their levels of dispersion.
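
To make that last point concrete, here is a minimal sketch using two made-up data sets (the values and the names `tight` and `wide` are illustrative assumptions). Both have a mean and median of 50, yet their spread differs widely.

```python
import statistics

# Two hypothetical data sets chosen so that both have mean 50 and median 50
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    print(
        name,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "population std dev:", round(statistics.pstdev(values), 2),
    )
```

The central values are identical, so only a dispersion measure such as the standard deviation separates the two sets.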

2. Advantages and Applications of Measures of Dispersion:

Better Understanding: Dispersion measures give a more comprehensive picture of the data. Knowing only the average doesn’t tell us about the variability or consistency of data.
Comparing Variability: By understanding dispersion, we can compare the variability of two or more sets of data.
Predictive Analysis: In fields like finance and stock markets, measures of dispersion such as variance and standard deviation help in assessing risks.
Quality Control: In manufacturing, understanding dispersion helps in ensuring the consistency of products.

3. Types of Measures of Dispersion

There are two main categories:

Absolute Measure of Dispersion: These measures give the dispersion in the same units as the original data (or, for variance, in squared units), so their values depend on the unit of measurement.
Absolute dispersion methods include:
- Range
- Variance and Standard Deviation
- Quartile Deviation

Relative Measure of Dispersion: These measures are dimensionless and are usually expressed in percentage form. They help in comparing the dispersion of two or more sets of data.
Relative dispersion methods include:
- Coefficient of Range
- Coefficient of Variation (CV)
- Coefficient of Quartile Deviation
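
As a rough illustration of why a dimensionless measure helps with comparison, the sketch below expresses the same hypothetical lengths in metres and in centimetres. The data and the helper `coefficient_of_variation` are assumptions made for this example. The standard deviation changes with the unit, while the coefficient of variation should not.

```python
import statistics

# Hypothetical lengths in metres, then the same lengths expressed in centimetres
lengths_m = [1.0, 2.0, 3.0, 4.0, 5.0]
lengths_cm = [x * 100 for x in lengths_m]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

print("Std dev (m): ", statistics.stdev(lengths_m))
print("Std dev (cm):", statistics.stdev(lengths_cm))
print("CV (m): ", coefficient_of_variation(lengths_m))
print("CV (cm):", coefficient_of_variation(lengths_cm))
```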

3.1. Absolute Measure of Dispersion

1. Range:
It is the simplest measure of dispersion.
– Formula:
Range = Maximum value – Minimum value

- **Where:**  
  - Maximum value is the largest value in the dataset.
  - Minimum value is the smallest value in the dataset.

Example:
For data set {5, 12, 18, 23}, Range = 23 – 5 = 18

python

data = [5, 12, 18, 23]
range_value = max(data) - min(data)
print("Range:", range_value)  # Output: Range: 18

Output:

Range: 18

2. Variance and Standard Deviation:
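For ungrouped data, variance is the average of the squared differences from the mean. Dividing by $N$, as in the formula here, gives the population variance; dividing by $N - 1$ instead gives the sample variance.

Variance (σ^2):
$ \sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N} $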
- Where:
  - $X_i$ represents each individual data point.
  - $\bar{X}$ represents the mean of the data.
  - $N$ represents the total number of data points.
Standard Deviation (σ):
$ \sigma = \sqrt{\sigma^2} $
- Where:
  - $\sigma^2$ represents the variance.

Example:
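For data set {2, 4, 4, 4, 5, 5, 7, 9}, the sample variance (dividing by N - 1) is 4.571 and the sample standard deviation is 2.138. Dividing by N instead gives a population variance of 4 and a population standard deviation of 2.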

python

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
# statistics.variance and statistics.stdev use the sample (N - 1) formulas
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
print("Variance:", variance)  # Output: Variance: 4.571428571428571
print("Standard Deviation:", std_dev)  # Output: Standard Deviation: 2.138089935299395

Output:

Variance: 4.571428571428571
Standard Deviation: 2.138089935299395
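
The variance formula in this section divides by N, whereas `statistics.variance` and `statistics.stdev` divide by N - 1. As a short sketch for comparison, the standard library also provides `statistics.pvariance` and `statistics.pstdev` for the population versions. The variable names below are chosen for this example only.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population formulas: divide the sum of squared deviations by N
pop_variance = statistics.pvariance(data)
pop_std_dev = statistics.pstdev(data)

# Sample formulas: divide by N - 1 instead
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print("Population variance:", pop_variance)
print("Population standard deviation:", pop_std_dev)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", sample_std_dev)
```

For this data set the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, roughly 4.571.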

3. Quartile Deviation (or Semi-Interquartile Range):
It is half the difference between the first and the third quartile.
– Formula:
$ QD = \frac{Q3 - Q1}{2} $

Where:
- $Q1$ represents the first quartile.
- $Q3$ represents the third quartile.

python

import numpy as np
data = [5, 7, 8, 9, 10, 12, 14, 16, 18]
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
QD = (Q3 - Q1) / 2
print("Quartile Deviation:", QD)

Output:

Quartile Deviation: 3.0
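
Quartiles can be computed under more than one convention, so the quartile deviation may vary slightly between tools. The sketch below is only an illustration, using the standard library's `statistics.quantiles` with its two `method` options. The helper `quartile_deviation` is a name assumed for this example.

```python
import statistics

data = [5, 7, 8, 9, 10, 12, 14, 16, 18]

def quartile_deviation(values, method):
    """Half the distance between Q1 and Q3 under the given quantile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return q1, q3, (q3 - q1) / 2

for method in ("inclusive", "exclusive"):
    q1, q3, qd = quartile_deviation(data, method)
    print(f"{method}: Q1={q1}, Q3={q3}, QD={qd}")
```

On this data the "inclusive" method gives Q1 = 8.0, Q3 = 14.0 and QD = 3.0, matching the `np.percentile` result, while "exclusive" gives Q1 = 7.5, Q3 = 15.0 and QD = 3.75. It is worth stating which convention was used when reporting a quartile deviation.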

3.2. Relative Measure of Dispersion

1. Coefficient of Range:
It is the ratio of the range to the sum of the maximum and minimum values.
– Formula:
$ CR = \frac{Range}{Maximum + Minimum} $

- **Where:**  
  - Range is the difference between the maximum and minimum values.
  - Maximum value is the largest value in the dataset.
  - Minimum value is the smallest value in the dataset.

python

data = [5, 12, 18, 23]
range_value = max(data) - min(data)
CR = range_value / (max(data) + min(data))
print("Coefficient of Range:", CR)

Output:

Coefficient of Range: 0.6428571428571429

2. Coefficient of Variation (CV):
It is the ratio of the standard deviation to the mean expressed as a percentage.
– Formula:
$ CV = \frac{\sigma}{\bar{X}} \times 100\% $

- **Where:**  
  - $\sigma$ represents the standard deviation.
  - $\bar{X}$ represents the mean of the data.

Example:
If the standard deviation of a data set is 5 and the mean is 20, CV = 25%.

python

import statistics

data = [10, 20, 30, 40, 50]
mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # sample standard deviation
CV = (std_dev / mean) * 100
print("Coefficient of Variation:", CV, "%")  # Output: Coefficient of Variation: 52.70462766947299 %

Output:

Coefficient of Variation: 52.70462766947299 %

3. Coefficient of Quartile Deviation:
It is the ratio of the quartile deviation to the average of the first and third quartiles.
– Formula:
$ CQD = \frac{QD}{\frac{Q1 + Q3}{2}} $

- **Where:**  
  - $QD$ represents the quartile deviation.
  - $Q1$ represents the first quartile.
  - $Q3$ represents the third quartile.

python

data = [5, 7, 8, 9, 10, 12, 14, 16, 18]

Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
QD = (Q3 - Q1) / 2
CQD = QD / ((Q1 + Q3) / 2)

print(f"Coefficient of Quartile Deviation: {CQD:.2f}")

Output:

Coefficient of Quartile Deviation: 0.27

4. Different ways to visualize the measures of dispersion

Visualizing measures of dispersion can help in understanding the spread of data. Python, particularly with libraries like Matplotlib and Seaborn, provides an array of visualization options to display measures of dispersion.

Here are some common ways:

1. Box Plot (Box-and-Whisker Plot):

It shows the median, quartiles, and potential outliers in the dataset.
Dispersion is represented by the interquartile range (IQR) and the whiskers of the plot.

python

import matplotlib.pyplot as plt
import seaborn as sns
data = [5, 7, 8, 8, 9, 10, 12, 12, 14, 15, 16, 18]
sns.boxplot(data=data)
plt.show()
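
The numbers behind such a plot can also be checked by hand. The following is a minimal sketch, assuming the common 1.5 × IQR whisker rule (the default in Matplotlib and Seaborn) and using `statistics.quantiles` with the "inclusive" method, which interpolates linearly between data points. The names `lower_fence` and `upper_fence` are chosen for this example.

```python
import statistics

data = [5, 7, 8, 8, 9, 10, 12, 12, 14, 15, 16, 18]

# "inclusive" quartiles use linear interpolation between data points
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed whisker rule: points beyond 1.5 * IQR from the quartiles are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

Here Q1 = 8.0, Q3 = 14.25 and the IQR is 6.25, so the fences sit at -1.375 and 23.625 and no point is flagged as an outlier.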

2. Histogram:

It showcases the distribution of data.
Each bar's width is the bin interval and its height is the number of observations in that bin; the horizontal extent of the bars shows the range of the data.

python

import matplotlib.pyplot as plt
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Generate 1000 random data points
data = np.random.randn(1000)

plt.hist(data, bins=10, edgecolor="k", alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Data')
plt.show()

3. Variance and Standard Deviation:

On a histogram (or a line or scatter plot), you can mark the mean and then draw lines one standard deviation on either side of it.

python

# Set seed for reproducibility
np.random.seed(42)

# Generate 1000 random data points
data = np.random.randn(1000)

mean = data.mean()
std_dev = data.std()  # population standard deviation (divides by N)

plt.axvline(mean, color='red', linestyle='dashed', linewidth=1, label=f'Mean: {mean:.2f}')
plt.axvline(mean+std_dev, color='blue', linestyle='dashed', linewidth=1, label=f'Mean ± 1 std dev ({std_dev:.2f})')
plt.axvline(mean-std_dev, color='blue', linestyle='dashed', linewidth=1)

plt.legend(loc="upper right")
plt.hist(data, bins=10, edgecolor="k", alpha=0.7)
plt.show()

4. Violin Plot:

It’s a combination of a box plot and a kernel density estimation.
This plot showcases the probability density of the data at different values.

python

sns.violinplot(data=data)

5. Standard Deviation Bars:

Overlaying standard deviation error bars on a bar chart shows the variability of several data sets side by side.

python

categories = ['A', 'B', 'C']
values = [50, 60, 55]
std_devs = [5, 8, 3]

plt.bar(categories, values, yerr=std_devs, capsize=10, color='lightblue', edgecolor='k')
plt.ylabel('Value')
plt.title('Bar Chart with Standard Deviation Bars')
plt.show()
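
In practice the bar heights and error bars would usually be computed from raw observations rather than typed in. The sketch below shows one way to do that; the `samples` dictionary holds hypothetical measurements, picked so that they reproduce the `values` and `std_devs` used in the chart.

```python
import statistics

# Hypothetical raw measurements per category (illustrative values only)
samples = {
    "A": [45, 50, 55],
    "B": [52, 60, 68],
    "C": [52, 55, 58],
}

categories = list(samples)
# Bar heights are the group means; error bars are the sample standard deviations
values = [statistics.mean(obs) for obs in samples.values()]
std_devs = [statistics.stdev(obs) for obs in samples.values()]

print(categories)
print(values)
print(std_devs)
```

The resulting `categories`, `values` and `std_devs` lists can be passed to `plt.bar(categories, values, yerr=std_devs)` in the same way as the hard-coded ones.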

5. Conclusion

Measures of dispersion not only provide a holistic understanding of datasets but, when leveraged correctly, can lead to more nuanced insights and better decision-making. With Python, these measures are just a few lines of code away, offering a powerful blend of theory and application for statisticians.
