Interpolation in Python – How to interpolate missing data, formula and approaches

Written by Selva Prabhakaran | 3 min read

Interpolation can be used to impute missing data. Let’s see the formula and how to implement it in Python.

But you need to be careful with this technique and check whether it is a valid choice for your data. Interpolation is usually applicable when the data forms an ordered sequence or series.

You should also know that multiple interpolation methods are available; the default is linear.

When to use interpolation for imputing missing data?

You can use interpolation when there is an order or a sequence and you want to estimate a missing value in the sequence. For example, say train tickets come in various classes: first class, second class, and so on. You would naturally expect a higher-class ticket to cost more than a lower-class one.

In that case, if the ticket price of an intermediate class is missing, you can use interpolation to estimate the missing value.

When not to use interpolation?

If there is no association between the order of the classes and the ticket fares, that is, if first class is not necessarily more expensive than second class, then interpolation might not be appropriate.

Let’s see this with an example.

python
import numpy as np
import pandas as pd
python
# class and ticket prices.
fare = {'first_class':100, 
        'second_class':np.nan, 
        'third_class':60, 
        'open_class':20}

Convert it to a pandas series object to make interpolation convenient.

python
# store as pandas series
ser = pd.Series(fare)
ser
python
first_class     100.0
second_class      NaN
third_class      60.0
open_class       20.0
dtype: float64

Now you can use ser.interpolate() to predict the missing value. By default, ser.interpolate() will do a linear interpolation.

Important caveat before you apply interpolation

Linear interpolation treats the position of each row (0, 1, 2, ...) as X and the column you want to interpolate as Y. With method='linear', pandas ignores the index labels and assumes the values are equally spaced. So you need to make sure the rows are sorted in the intended order for this to work.
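
A minimal sketch of why row order matters, reusing the fare values from above. The names `ordered` and `shuffled` are hypothetical and introduced only for this illustration. The missing fare is estimated from whichever rows happen to sit next to it, so the same data in a different order gives a different answer.

```python
import numpy as np
import pandas as pd

# Fares in their natural order, highest class first.
ordered = pd.Series({'first_class': 100, 'second_class': np.nan,
                     'third_class': 60, 'open_class': 20})

# Same data with the rows out of order (hypothetical example).
shuffled = ordered.reindex(['first_class', 'open_class',
                            'second_class', 'third_class'])

# Each estimate is the midpoint of the two neighbouring rows.
print(ordered.interpolate()['second_class'])   # between 100 and 60
print(shuffled.interpolate()['second_class'])  # between 20 and 60
```

Sorting the rows into the order that carries meaning (here, by class) before calling interpolate() avoids the second, misleading estimate.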

For linear interpolation between two known points $(x_1, y_1)$ and $(x_2, y_2)$, when ‘x’ is known, you can compute the value of ‘y’ using the following formula:

$$y = y_1 + (x - x_1)\,\frac{y_2 - y_1}{x_2 - x_1}$$
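
As a worked example of the linear interpolation formula, the sketch below estimates the missing second-class fare by hand. The helper `linear_interpolate` is a hypothetical name, not a pandas function. It assumes row positions 0, 1, 2 stand in for x, with the two known neighbours at positions 0 and 2.

```python
def linear_interpolate(x, x1, y1, x2, y2):
    """Estimate y at x on the straight line through (x1, y1) and (x2, y2)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# first_class sits at position 0 (fare 100), third_class at position 2 (fare 60).
second_class = linear_interpolate(1, 0, 100, 2, 60)
print(second_class)  # 80.0
```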

Interpolation is not limited to a straight line between two points: Lagrange’s interpolation polynomial fits a single higher-degree polynomial through several known points.
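
As an illustration of Lagrange’s interpolation polynomial, the sketch below passes one polynomial through the three known fares and evaluates it at the missing position. `lagrange_interpolate` is a hypothetical helper written for this example, and using row positions 0, 2 and 3 as x is an assumption.

```python
def lagrange_interpolate(x, xs, ys):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other known x.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# Known fares at positions 0 (first), 2 (third) and 3 (open class).
estimate = lagrange_interpolate(1, [0, 2, 3], [100, 60, 20])
print(round(estimate, 2))  # 86.67
```

With three known points the polynomial is a parabola, so the estimate differs from the straight-line value of 80.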

Implement linear interpolation

python
ser.interpolate(method='linear')
python
first_class     100.0
second_class     80.0
third_class      60.0
open_class       20.0
dtype: float64

Linear interpolation is suitable if you assume the relationship between x (index) and y (value) is linear. If not, you might want to try the spline and cubicspline methods as well.

Spline interpolation

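To use spline interpolation, the index needs to be numeric, because these methods use the index values as X. The simplest way is to reset the index so it runs 0, 1, 2, and so on: do a reset_index first, then interpolate. These methods also require SciPy to be installed.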

python
# spline of order 2 (quadratic)
ser.reset_index(drop=True).interpolate(method='spline', order=2)
python
0    100.000000
1     86.666667
2     60.000000
3     20.000000
dtype: float64

Cubic spline

python
# cubic spline
ser.reset_index(drop=True).interpolate(method='cubicspline')
python
0    100.000000
1     86.666667
2     60.000000
3     20.000000
dtype: float64
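
The order-2 spline and the cubic spline return the same 86.666667 here. A likely reason is that only three fares are known: exactly one parabola passes through three points, and with so few points SciPy's cubic spline (under its default boundary condition) also reduces to a parabola. The sketch below checks that value with a plain quadratic fit. `np.polyfit` stands in for the spline routines as an assumption for illustration and is not what pandas calls internally.

```python
import numpy as np

# Positions and fares of the three known values (second_class is missing).
x_known = np.array([0, 2, 3])
y_known = np.array([100.0, 60.0, 20.0])

# Fit the unique parabola through the three points.
coeffs = np.polyfit(x_known, y_known, deg=2)

# Evaluate it at position 1, where the fare is missing.
quadratic_estimate = np.polyval(coeffs, 1)
print(round(float(quadratic_estimate), 6))
```

With more known points the two spline methods would generally stop agreeing.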

[Next] Lesson 7: MICE imputation – How to predict missing values using machine learning in Python

Build Your First Machine Learning Project – ML Lessons Series
