Menu

SQL Window Functions – Made simple and intuitive

SQL window functions is one of the advanced concepts, understanding which will give you the ability to do complex data wrangling and transformations in SQL.

Written by Selva Prabhakaran | 14 min read

SQL window functions is one of the advanced concepts, understanding which will give you the ability to do complex data wrangling and transformations in SQL.

In this guide, we will intuitively understand how window functions work in a way you will never forget. Don’t memorize anything, just read through and you will clearly understand how the window functions works. Then you will know when and how to use it.

Once you complete this guide, head over to SQL Window Function Exercises and solve all the questions there.

Let’s get started.

In this on we will cover:

  1. Understanding Window Functions
  2. Type 1: Aggregation
    2.1. Aggregation functions used with Window functions
  3. Type 2: Ranking
    3.1. Ranking Functions
  4. Difference between rank() and dense_rank()?
  5. Type 3: Comparing boundaries with Offsets
    5.1. Offset Functions you will need
  6. Type 4: Rolling Aggregates with ROWS BETWEEN clause
    6.1. Understanding ROWS BETWEEN Clause

1. Understanding Window Functions

Window function performs a calculation across a ‘window of rows’ in relation to the current row. You get to define how these Windows, also called partitions, are created. You will see how to do that shortly.

This is comparable to ‘group by’, but instead of returning one row for a group, it will return result for each row based on the ‘window’ partitions that you define.

Initially, it will appears like the applications of window functions is vast and difficult to learn. But essentially, there are only four types of window functions operations.

By understanding these 4 types, you will understand nearly all applications of Window functions:

  1. Aggregation: You can create sums, averages, counts, max, min, etc. across related sets of data. For example, calculating a running total of sales.

  2. Ranking: You can rank items in your data set. For example, listing top salespeople, top 10 students, etc.

  3. Value comparison: You can compare values in a data set. For example, compare the current value with the previous value to find trends or patterns.

  4. Rolling Aggregates: Comparing specific ranks of values and aggregating if needed. You will learn the ROWS BETWEEN clause here.

Let’s use the students Table created below to illustrate the Window Function concept. You can create this table using the following schema.

You can practice the examples in this tutorial hands-on here.

sql
select * from students;
student_idsubjectscore
1Math85
1English92
1Science87
2Math91
2English88
2Science93
3Math78
3English85
3Science90
4Math92
4English86
4Science89
5Math90
5English88
5Science85
6Math93
6English82
6Science90

2. Type 1: Aggregation

This is similar to grouping using Group By, but instead of aggregating, we want to create a new column, without reducing the number of rows in the dataset.

Let’s understand with a solid example.

Task:

How to compute the average score for each student (and have the average shown as a separate column)?

Solution:

The usual way to do this is to do a ‘group by’ using student_id to get the average score for each student. Then, ‘left join’ it with the base table. This is what you typically do if you are not comfortable with Window functions.

You can achieve the same effect using Window function in one step. In the process let’s also see the syntax.

Notice the AVG(SCORE) OVER W part where W is the window you can define later.

sql
SELECT *,
Avg(score) over w as Average_Score
FROM STUDENTS
WINDOW w as (PARTITION BY student_id);
student_idsubjectscoreAverage_Score
1Math8588.0
1English9288.0
1Science8788.0
2Math9190.66666666666667
2English8890.66666666666667
2Science9390.66666666666667
3Math7884.33333333333333
3English8584.33333333333333
3Science9084.33333333333333
4Math9289.0
4English8689.0
4Science8989.0
5Math9087.66666666666667
5English8887.66666666666667
5Science8587.66666666666667
6Math9388.33333333333333
6English8288.33333333333333
6Science9088.33333333333333

The above command effectively creates partitions for each ‘student_id’. This is similar to groups, but without actually reducing the group to one row.

The AVG(SCORE) calculation is done on the entire partition (student_id) but is repeatedly done for every row iteratively, instead of grouping by student_id and returning one value per student_id.

Hope this is clear? If not, please read that one more time.

Now, The ‘window’ statement in the last line after ‘FROM’ is not accepted in certain databases, such as MS SQL or Oracle SQL. So, better take it to the select statement like below.

sql
SELECT *,
Avg(score) over (PARTITION BY student_id) as Average_Score
FROM STUDENTS
LIMIT 7;
student_idsubjectscoreAverage_Score
1Math8588.0
1English9288.0
1Science8788.0
2Math9190.66666666666667
2English8890.66666666666667
2Science9390.66666666666667
3Math7884.33333333333333

2.1. Aggregation functions used with Window functions

Here are the other Aggregation functions used with Window functions. You probably know how they work, else refer to the SQL

  1. min(value): returns the minimum value across all window rows
  2. max(value): returns the maximum value
  3. count(value): returns the count of non-null values
  4. avg(value): returns the average value
  5. sum(value): returns the sum total value
  6. group_concat(value, separator): returns a string combining values using separator (SQLite and MySQL only)

3. Type 2: Ranking

This is the second type of usecase for window functions. This is easy to understand with an example as well.

Task:

How to rank the students in descending order of their Total Score?. Where, Total score is the sum of scores in all three subjects.

Solution:

First compute the total score and put in a separate table.

sql
DROP TABLE IF EXISTS TEMP;

CREATE TABLE TEMP AS
SELECT student_id, 
       subject, 
       score,
       AVG(score) OVER (PARTITION BY student_id) as average_score,
       SUM(score) OVER (PARTITION BY student_id) as total_score
FROM students;

SELECT * FROM TEMP LIMIT 5;
student_idsubjectscoreaverage_scoretotal_score
1Math8588.0264
1English9288.0264
1Science8788.0264
2Math9190.66666666666667272
2English8890.66666666666667272

Notice what is happening above in SUM(score) OVER (PARTITION BY student_id) as total_score.

We sum(score) for each student while iterating through the rows. For each row, the window includes all the rows that belong to the current row’s student_id. This window is specified using the PARTITION BY student_id command.

Now, in a similar way, compute the rank in descending order of Total Score.

sql
SELECT *, 
RANK() OVER (order by TOTAL_SCORE DESC) as srank
FROM TEMP;
student_idsubjectscoreaverage_scoretotal_scoresrank
2Math9190.666666666666672721
2English8890.666666666666672721
2Science9390.666666666666672721
4Math9289.02674
4English8689.02674
4Science8989.02674
6Math9388.333333333333332657
6English8288.333333333333332657
6Science9088.333333333333332657
1Math8588.026410
1English9288.026410
1Science8788.026410
5Math9087.6666666666666726313
5English8887.6666666666666726313
5Science8587.6666666666666726313
3Math7884.3333333333333325316
3English8584.3333333333333325316
3Science9084.3333333333333325316

See how the rows containing the same student_id contains the same rank?

What if you want to increase the rank number for every row? So that the result becomes: 1,2,3,4 instead of 1,1,1,4?

To do this, we need to state another unique column besides the total score based on which RANK() can assign the rank.

sql
SELECT *, 
RANK() OVER (ORDER BY TOTAL_SCORE DESC, SCORE DESC) as iRANK
FROM TEMP;
student_idsubjectscoreaverage_scoretotal_scoreiRANK
2Science9390.666666666666672721
2Math9190.666666666666672722
2English8890.666666666666672723
4Math9289.02674
4Science8989.02675
4English8689.02676
6Math9388.333333333333332657
6Science9088.333333333333332658
6English8288.333333333333332659
1English9288.026410
1Science8788.026411
1Math8588.026412
5Math9087.6666666666666726313
5English8887.6666666666666726314
5Science8587.6666666666666726315
3Science9084.3333333333333325316
3English8584.3333333333333325317
3Math7884.3333333333333325318

3.1. Ranking Functions

  1. row_number(): returns the row ordinal number
  2. dense_rank(): returns row rank
  3. rank(): returns row rank with possible gaps (see below)
  4. ntile(n): splits all rows into n groups and returns the index of the group that the row belongs to

4. Difference between rank() and dense_rank()?

When items have same value, rank() will increment to the row_number() value when the next value occurs. Whereas, dense_rank() will increment by only 1.

Example: Trying to rank the following – 10, 10, 10, 17, 17, 29

  • rank() output: 1,1,1,4,4,6
  • dense_rank() output: 1,1,1,2,2,3

5. Type 3: Comparing boundaries with Offsets

Let’s suppose you want to compare the salaries of adjacent members, that is, you want to find what percentage the current value is larger than the previous value.

Let’t first create the salaries table.

sql
DROP TABLE IF EXISTS SALARIES;

CREATE TABLE salaries (
    id INT,
    name VARCHAR(20),
    city VARCHAR(20),
    dept VARCHAR(20),
    salary FLOAT
);

INSERT INTO SALARIES (id, name, city, dept, salary) VALUES
(21, 'Dhanya', 'Chennai', 'hr', 75),
(22, 'Bob', 'London', 'hr', 71),
(31, 'Akira', 'Chennai', 'it', 89),
(32, 'Grace', 'Berlin', 'it', 60),
(33, 'Steven', 'London', 'it', 103),
(34, 'Ramesh', 'Chennai', 'it', 103),
(35, 'Frank', 'Berlin', 'it', 120),
(41, 'Cindy', 'Berlin', 'sales', 95),
(42, 'Yetunde', 'London', 'sales', 95),
(43, 'Alice', 'Berlin', 'sales', 100);

Practice Hands-On Here. The table schema is created, you will need to write the queries on the right hand side panel and run.

Task:

How to find the percentage diff increase in salaries of neighbours?

Solution:

We need to have the salaries sorted (by salary ascending) and then look up the previous salary value. Then, use the current and previous value to compute the difference.

Let’s first get the previous value using lag function.

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        lag(salary) over (order by id asc) as prev_salary
from salaries;
idnamecitydeptsalaryprev_salary
21DhanyaChennaihr75.0None
22BobLondonhr71.075.0
31AkiraChennaiit89.071.0
32GraceBerlinit60.089.0
33StevenLondonit103.060.0
34RameshChennaiit103.0103.0
35FrankBerlinit120.0103.0
41CindyBerlinsales95.0120.0
42YetundeLondonsales95.095.0
43AliceBerlinsales100.095.0

The previous salary is created, we can now create the percentage difference.

sql
select *, 
        round((salary - prev_salary)/prev_salary, 4) * 100 as perc_diff 
from (select id, 
        name, 
        city, 
        dept, 
        salary,
        lag(salary) over (order by id asc) as prev_salary
from salaries);
idnamecitydeptsalaryprev_salaryperc_diff
21DhanyaChennaihr75.0NoneNone
22BobLondonhr71.075.0-5.33
31AkiraChennaiit89.071.025.35
32GraceBerlinit60.089.0-32.58
33StevenLondonit103.060.071.67
34RameshChennaiit103.0103.00.0
35FrankBerlinit120.0103.016.5
41CindyBerlinsales95.0120.0-20.830000000000002
42YetundeLondonsales95.095.00.0
43AliceBerlinsales100.095.05.26

This can be written using a with statement as well to improve the readability.

sql
with sal2 as (
select id, 
        name, 
        city, 
        dept, 
        salary,
        lag(salary) over (order by id asc) as prev_salary
from salaries)
select *, 
        round((salary - prev_salary)/prev_salary, 4) * 100 as perc_diff 
from sal2;
idnamecitydeptsalaryprev_salaryperc_diff
21DhanyaChennaihr75.0NoneNone
22BobLondonhr71.075.0-5.33
31AkiraChennaiit89.071.025.35
32GraceBerlinit60.089.0-32.58
33StevenLondonit103.060.071.67
34RameshChennaiit103.0103.00.0
35FrankBerlinit120.0103.016.5
41CindyBerlinsales95.0120.0-20.830000000000002
42YetundeLondonsales95.095.00.0
43AliceBerlinsales100.095.05.26

Alternately, we could’ve computed the lead and then computed the differences as well.

Let’s look at one more example.

Task 2:

How to compute the perc difference between the min and max values of salary in each department, while keeping the entries ordered by salary?

That is, you want to compute the percentage difference between the first (min) and last (max) values in each department.

Solution:

Let’s try using the min() and max() as window functions.

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        min(salary) over (partition by dept order by id asc) as min_salary,
        max(salary) over (partition by dept order by id asc) as max_salary
from salaries
idnamecitydeptsalarymin_salarymax_salary
21DhanyaChennaihr75.075.075.0
22BobLondonhr71.071.075.0
31AkiraChennaiit89.089.089.0
32GraceBerlinit60.060.089.0
33StevenLondonit103.060.0103.0
34RameshChennaiit103.060.0103.0
35FrankBerlinit120.060.0120.0
41CindyBerlinsales95.095.095.0
42YetundeLondonsales95.095.095.0
43AliceBerlinsales100.095.0100.0

There is a problem here.

The min values look fine but the max values aren’t.

Can we try using the first_value() and last_value() functions?

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        first_value(salary) over (partition by dept order by id asc) as min_salary,
        last_value(salary) over (partition by dept order by id asc) as max_salary
from salaries
idnamecitydeptsalarymin_salarymax_salary
21DhanyaChennaihr75.075.075.0
22BobLondonhr71.075.071.0
31AkiraChennaiit89.089.089.0
32GraceBerlinit60.089.060.0
33StevenLondonit103.089.0103.0
34RameshChennaiit103.089.0103.0
35FrankBerlinit120.089.0120.0
41CindyBerlinsales95.095.095.0
42YetundeLondonsales95.095.095.0
43AliceBerlinsales100.095.0100.0

This does not work as well. Can you figure what is happening here?

Because, eventhough we have defined the window parition by department, since we are using order by, the processing is valid ONLY for the ‘Window Frame’, where the ‘frame’ goes from the first row to the current iteration row.

This is the default behaviour of the Frame. And this can be controlled, which we will do shortly.

But, let’s first understand the ‘Frame’ within the ‘Window’ clearly.

Ok. Have a look at the above ‘Salaries’ table.

When the iteration starts, the current row is the first row, whichever it is.

As the iteration proceeds for a given window (department), the frame size expands cumulatively from the beginning row up to the current row.

That is, when current row = 1, frame = 1st row only. But when the current row is 4, the frame includes the first 4 rows.

Look, how it is progressing below:

Window 1: Dept = HR
– Current row = 1, Frame rows = 1
– Current row = 2, Frame rows = 1,2

Window 2: Dept = IT
– Current row = 1, Frame rows = 1
– Current row = 2, Frame rows = 1,2
– Current row = 3, Frame rows = 1,2,3
– Current row = 4, Frame rows = 1,2,3,4

And so on..

We can abosolutely control the size of the frame in each iteration. I cover this in the next section.

But, let’s start by solving this usecase: Computing the min and max salary for each department, while keeping the order of neighbours.

Here it is.

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        first_value(salary) over (partition by dept order by salary asc) as min_salary,
        last_value(salary) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as max_salary
from salaries;
idnamecitydeptsalarymin_salarymax_salary
22BobLondonhr71.071.075.0
21DhanyaChennaihr75.071.075.0
32GraceBerlinit60.060.0120.0
31AkiraChennaiit89.060.0120.0
33StevenLondonit103.060.0120.0
34RameshChennaiit103.060.0120.0
35FrankBerlinit120.060.0120.0
41CindyBerlinsales95.095.0100.0
42YetundeLondonsales95.095.0100.0
43AliceBerlinsales100.095.0100.0

The key part that makes the frame extend to the full width of the window is: rows between unbounded preceding and unbounded following. We will break this down in detail in the following section.

5.1. Offset Functions you will need

  • lag(value, offset): returns the value from the record that is offset rows behind the current one
  • lead(value, offset): returns the value from the record that is offset rows ahead of the current one
  • first_value(value): returns the value from the first row of the frame
  • last_value(value): returns the value from the last row of the frame
  • nth_value(value, n): returns the value from the n-th row of the frame

6. Type 4: Rolling Aggregates with ROWS BETWEEN clause

Let’s continue and take our understanding of window functions up a notch.

To get a complete understanding of window functions, you need to fully understand the rows between command.

Again, let’s understand using an example.

Task:

Compute difference between the minimum and the second last value.

Solution:

Previously we saw how to compute the difference between the maximum and minimum. Let’s use those ideas to get the diff between second max and the minimum value.

First, to get the second last value we need to know the number of values in the group. Once you know this, you can get whichever value you want, be it second last, third from last and so on.

So, let’s get the number of items in each group first.

We are extending on top of the previous query, so look at the count(*) statement in the below query.

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        first_value(salary) over (partition by dept order by salary asc) as min_salary,
        last_value(salary) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as max_salary,
        count(*) over (partition by dept
                       rows between unbounded preceding and unbounded following) as count
from salaries;
idnamecitydeptsalarymin_salarymax_salarycount
22BobLondonhr71.071.075.02
21DhanyaChennaihr75.071.075.02
32GraceBerlinit60.060.0120.05
31AkiraChennaiit89.060.0120.05
33StevenLondonit103.060.0120.05
34RameshChennaiit103.060.0120.05
35FrankBerlinit120.060.0120.05
41CindyBerlinsales95.095.0100.03
42YetundeLondonsales95.095.0100.03
43AliceBerlinsales100.095.0100.03

6.1. Understanding ROWS BETWEEN Clause

Note here the general structure of rows between is as follows.

The syntax is:

ROWS BETWEEN <lower_bound> AND <upper_bound>

The purpose of the ROWS clause is to specify the starting and the ending row of the frame in relation to the ‘current row’.

Starting and ending rows might be fixed or relative to the current row based on the following keywords:

  1. CURRENT ROW: the current row
  2. UNBOUNDED PRECEDING: all rows before the current row -> fixed
  3. UNBOUNDED FOLLOWING: all rows after the current row -> fixed
  4. ‘n’ PRECEDING: ‘n’ rows before the current row -> relative
  5. ‘n’ FOLLOWING: ‘n’ rows after the current row -> relative

Here are few examples of how you can use the ROWS BETWEEN clause:

Example 1: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Frame would include the entire window for each iteration.
Example 2: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: would include from 1st row in window to current row.
Example 3: ROWS BETWEEN UNBOUNDED PRECEDING AND 2 PRECEDING: would include from 1st row in window to 2 rows BEFORE the current row.
Example 4: ROWS BETWEEN UNBOUNDED PRECEDING AND 3 FOLLOWING: would include from 1st row in window to 3 rows AFTER the current row.

Questions:

What would it be for:

  1. Two rows before and two rows after the current row? (Easy)
  2. Two rows before to one before the last row? (Intermediate). Hint: find the count.

So, how to get the second last value?

–> Sort by salary first, then, use nth_value() and count column to get the n-1’th item.

sql
select * from salaries;
idnamecitydeptsalary
21DhanyaChennaihr75.0
22BobLondonhr71.0
31AkiraChennaiit89.0
32GraceBerlinit60.0
33StevenLondonit103.0
34RameshChennaiit103.0
35FrankBerlinit120.0
41CindyBerlinsales95.0
42YetundeLondonsales95.0
43AliceBerlinsales100.0

Count the number of rows in each partition. We will later use this along with nth_value function next.

sql
select id, 
        name, 
        city, 
        dept, 
        salary,
        first_value(salary) over (partition by dept order by salary asc) as min_salary,
        last_value(salary) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as max_salary,
        count(*) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as count
from salaries;

idnamecitydeptsalarymin_salarymax_salarycount
22BobLondonhr71.071.075.02
21DhanyaChennaihr75.071.075.02
32GraceBerlinit60.060.0120.05
31AkiraChennaiit89.060.0120.05
33StevenLondonit103.060.0120.05
34RameshChennaiit103.060.0120.05
35FrankBerlinit120.060.0120.05
41CindyBerlinsales95.095.0100.03
42YetundeLondonsales95.095.0100.03
43AliceBerlinsales100.095.0100.03

Extract the second last value

This can be done as a subquery using the nth_value(salary, count-1) call.

sql
DROP TABLE If EXISTS TMP2;
CREATE TABLE TMP2 AS
with tmp as (
select id, 
        name, 
        city, 
        dept, 
        salary,
        first_value(salary) over (partition by dept order by salary asc) as min_salary,
        last_value(salary) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as max_salary,
        count(*) over (partition by dept 
                                 order by salary 
                                 rows between unbounded preceding and unbounded following) as count
from salaries)
select id,name,city,dept,salary,
        min_salary, 
        nth_value(salary, count-1) over (partition by dept 
                                         order by salary
                                        rows between unbounded preceding and unbounded following) as last_minus_1_salary
from tmp;        

Finally, compute the difference between last_minus_1 and minimum salary.

sql
SELECT *, (last_minus_1_salary - min_salary) as DIFF 
FROM TMP2;
idnamecitydeptsalarymin_salarylast_minus_1_salaryDIFF
22BobLondonhr71.071.071.00.0
21DhanyaChennaihr75.071.071.00.0
32GraceBerlinit60.060.0103.043.0
31AkiraChennaiit89.060.0103.043.0
33StevenLondonit103.060.0103.043.0
34RameshChennaiit103.060.0103.043.0
35FrankBerlinit120.060.0103.043.0
41CindyBerlinsales95.095.095.00.0
42YetundeLondonsales95.095.095.00.0
43AliceBerlinsales100.095.095.00.0

Congratulations if you have made it till here! You should have a fair understanding of Window functions now.

Next step is to try and practice SQL Window functions. I also teach complete SQL for Data Science and more, I look forward to see you inside.

Free Course
Master Core Python — Your First Step into AI/ML

Build a strong Python foundation with hands-on exercises designed for aspiring Data Scientists and AI/ML Engineers.

Start Free Course
Trusted by 50,000+ learners
Related Course
Master SQL — Hands-On
Join 5,000+ students at edu.machinelearningplus.com
Explore Course
Free Callback - Limited Slots
Not Sure Which Course to Start With?
Talk to our AI Counsellors and Practitioners. We'll help you clear all your questions for your background and goals, bridging the gap between your current skills and a career in AI.
10-digit mobile number
📞
Thank You!
We'll Call You Soon!
Our learning advisor will reach out within 24 hours.
(Check your inbox too — we've sent a confirmation)
⚡ Before you go

Python.
SQL. NumPy.
All free.

Get the exact 10-course programming foundation that Data Science professionals use.

🐍
Core Python — from first line to expert level
📈
NumPy & Pandas — the #1 libraries every DS job needs
🗃️
SQL Levels I–III — basics to Window Functions
📄
Real industry data — Jupyter notebooks included
R A M S K
57,000+ students
★★★★★ Rated 4.9/5
⚡ Before you go
Python. SQL.
All Free.
R A M S K
57,000+ students  ★★★★★ 4.9/5
Get Free Access Now
10 courses. Real projects. Zero cost. No credit card.
New learners enrolling right now
🔒 100% free ☕ No spam, ever ✓ Instant access
🚀
You're in!
Check your inbox for your access link.
(Check Promotions or Spam if you don't see it)
Or start your first course right now:
Start Free Course →
Scroll to Top
Scroll to Top
Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science