
101 Polars Exercises for Data Analysis (with Solutions)

Master Polars with 101 hands-on exercises and solutions — covering DataFrames, groupby, joins, window functions, lazy eval, and more.

Written by Selva Prabhakaran | 52 min read

Practice Polars — the blazing-fast DataFrame library for Python — with these 101 exercises ranging from beginner to advanced.

This post has interactive code — click ‘Run’ or press Ctrl+Enter on any code block to execute it directly in your browser. The first run may take a few seconds to initialize.

Polars is a lightning-fast DataFrame library written in Rust with a Python API. It is designed for performance and ergonomics, offering lazy evaluation, expressive syntax, and first-class support for parallel execution. These 101 exercises will help you master Polars through hands-on practice.

The exercises are organized by increasing difficulty across topics like Series, DataFrames, filtering, groupby, joins, string operations, datetime handling, reshaping, and more.

Before you begin: Run the code block below to load Polars and confirm it is available. This only needs to be done once per session.

import polars as pl
print("Polars", pl.__version__, "ready!")

Difficulty Levels:

  • L1 — Beginner
  • L2 — Intermediate
  • L3 — Advanced

1. How to import polars and check the version?

Difficulty Level: L1

Import polars and print the version installed.

Solve:

# Task: Import polars and check the version

# Write your code below

Desired Output:

1.39.2
Show Solution
import polars as pl
print(pl.__version__)

2. How to create a Series from a list, numpy array, and dict?

Difficulty Level: L1

Create a polars Series from each of the following: a list, a numpy array, and a dictionary. Since a polars Series has no index, keep only the dictionary's values.

Solve:

import polars as pl
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# Write your code below

Desired Output:

shape: (10,)
Series: 'values' [i64]
[
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
]
Show Solution
import polars as pl
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# From list
ser1 = pl.Series("letters", mylist)

# From numpy array
ser2 = pl.Series("numbers", myarr.tolist())

# From dict values (a Series is one-dimensional, so the keys are dropped)
ser3 = pl.Series("values", list(mydict.values()))
print(ser1.head())
print(ser2.head())
print(ser3.head())

3. How to convert a Series into a DataFrame with the index as a column?

Difficulty Level: L1

Polars doesn’t have an index. Given a dictionary, create a two-column DataFrame with keys in one column and values in another.

Solve:

import polars as pl
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# Write your code below

Desired Output:

shape: (5, 2)
┌─────┬───────┐
│ key ┆ value │
│ --- ┆ ---   │
│ str ┆ i64   │
╞═════╪═══════╡
│ a   ┆ 0     │
│ b   ┆ 1     │
│ c   ┆ 2     │
│ e   ┆ 3     │
│ d   ┆ 4     │
└─────┴───────┘
Show Solution
import polars as pl
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

df = pl.DataFrame({"key": list(mydict.keys()), "value": list(mydict.values())})
print(df.head())

4. How to combine many Series to form a DataFrame?

Difficulty Level: L1

Combine ser1 and ser2 to form a DataFrame.

Solve:

import polars as pl
import numpy as np
ser1 = pl.Series("col1", list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pl.Series("col2", np.arange(26).tolist())

# Write your code below

Desired Output:

shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ a    ┆ 0    │
│ b    ┆ 1    │
│ c    ┆ 2    │
│ e    ┆ 3    │
│ d    ┆ 4    │
└──────┴──────┘
Show Solution
import polars as pl
import numpy as np
ser1 = pl.Series("col1", list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pl.Series("col2", np.arange(26).tolist())

df = pl.DataFrame([ser1, ser2])
print(df.head())

5. How to assign a name to a Series?

Difficulty Level: L1

Give a name 'alphabets' to the series ser.

Solve:

import polars as pl
ser = pl.Series(list('abcedfghijklmnopqrstuvwxyz'))

# Write your code below

Desired Output:

shape: (10,)
Series: 'alphabets' [str]
[
    "a"
    "b"
    "c"
    "e"
    "d"
    "f"
    "g"
    "h"
    "i"
    "j"
]
Show Solution
import polars as pl
ser = pl.Series(list('abcedfghijklmnopqrstuvwxyz'))

ser = ser.alias("alphabets")
print(ser.head())

6. How to get the items of Series A not present in Series B?

Difficulty Level: L2

From ser1, remove items present in ser2.

Solve:

import polars as pl
ser1 = pl.Series("a", [1, 2, 3, 4, 5])
ser2 = pl.Series("b", [4, 5, 6, 7, 8])

# Write your code below

Desired Output:

shape: (3,)
Series: 'a' [i64]
[
    1
    2
    3
]
Show Solution
import polars as pl
ser1 = pl.Series("a", [1, 2, 3, 4, 5])
ser2 = pl.Series("b", [4, 5, 6, 7, 8])

result = ser1.filter(~ser1.is_in(ser2))
print(result)

7. How to get the items not common to both Series A and Series B?

Difficulty Level: L2

Get all items of ser1 and ser2 not common to both.

Solve:

import polars as pl
ser1 = pl.Series("a", [1, 2, 3, 4, 5])
ser2 = pl.Series("b", [4, 5, 6, 7, 8])

# Write your code below

Desired Output:

shape: (6,)
Series: 'union' [i64]
[
    1
    2
    3
    6
    7
    8
]
Show Solution
import polars as pl
import numpy as np
ser1 = pl.Series("a", [1, 2, 3, 4, 5])
ser2 = pl.Series("b", [4, 5, 6, 7, 8])

union = pl.Series("union", np.union1d(ser1, ser2).tolist())
intersect = pl.Series("intersect", np.intersect1d(ser1, ser2).tolist())
result = union.filter(~union.is_in(intersect))
print(result)

8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric Series?

Difficulty Level: L1

Compute the min, 25th percentile, median, 75th percentile, and max of ser.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.normal(10, 5, 25).tolist())

# Write your code below

Desired Output:

[0.43, 7.19, 8.83, 12.48, 17.9]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.normal(10, 5, 25).tolist())

result = [
    ser.min(),
    ser.quantile(0.25),
    ser.median(),
    ser.quantile(0.75),
    ser.max(),
]
print([round(v, 2) for v in result])

9. How to get frequency counts of unique items of a Series?

Difficulty Level: L1

Calculate the frequency counts of each unique value in ser.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("chars", np.take(list('abcdefgh'), np.random.randint(8, size=30)))

# Write your code below

Desired Output:

shape: (7, 2)
┌───────┬───────┐
│ chars ┆ count │
│ ---   ┆ ---   │
│ str   ┆ u32   │
╞═══════╪═══════╡
│ h     ┆ 6     │
│ c     ┆ 5     │
│ e     ┆ 5     │
│ d     ┆ 4     │
│ g     ┆ 4     │
│ f     ┆ 3     │
│ b     ┆ 3     │
└───────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("chars", np.take(list('abcdefgh'), np.random.randint(8, size=30)))

print(ser.value_counts().sort("count", descending=True))

10. How to keep only the top 2 most frequent values and replace everything else as ‘Other’?

Difficulty Level: L2

In ser, keep the top 2 most frequent values as-is. Replace all other values with 'Other'.

Solve:

import polars as pl
import numpy as np
np.random.seed(100)
ser = pl.Series("data", np.random.randint(1, 5, [12]).tolist())

# Write your code below

Desired Output:

shape: (12, 1)
┌───────┐
│ data  │
│ ---   │
│ str   │
╞═══════╡
│ 1     │
│ 1     │
│ 4     │
│ 4     │
│ 4     │
│ …     │
│ Other │
│ Other │
│ 1     │
│ Other │
│ Other │
└───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(100)
ser = pl.Series("data", np.random.randint(1, 5, [12]).tolist())

counts = ser.value_counts().sort("count", descending=True)
top2 = counts.head(2)["data"].to_list()
result = ser.cast(pl.String).to_frame("data").with_columns(
    pl.when(pl.col("data").cast(pl.Int64).is_in(top2))
    .then(pl.col("data"))
    .otherwise(pl.lit("Other"))
    .alias("data")
)
print(result)

11. How to bin a numeric Series to 10 groups of equal size?

Difficulty Level: L2

Bin the series ser into 10 equal-sized decile groups and label them from 1st to 10th.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.random(20).tolist())

# Write your code below

Desired Output:

shape: (10,)
Series: 'data' [cat]
[
    "4th"
    "10th"
    "8th"
    "6th"
    "2nd"
    "2nd"
    "1st"
    "9th"
    "7th"
    "8th"
]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.random(20).tolist())

labels = ['1st','2nd','3rd','4th','5th','6th','7th','8th','9th','10th']
breakpoints = [i / 10 for i in range(11)]

# Equal-width bins at 0.1, 0.2, ..., 0.9
result = ser.cut(breakpoints[1:-1], labels=labels)
print(result.head(10))

12. How to convert a numpy array to a DataFrame of given shape?

Difficulty Level: L1

Reshape the series ser into a DataFrame with 7 rows and 5 columns.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 10, 35).tolist())

# Write your code below

Desired Output:

shape: (7, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╪══════╪══════╡
│ 7    ┆ 4    ┆ 8    ┆ 5    ┆ 7    │
│ 3    ┆ 7    ┆ 8    ┆ 5    ┆ 4    │
│ 8    ┆ 8    ┆ 3    ┆ 6    ┆ 5    │
│ 2    ┆ 8    ┆ 6    ┆ 2    ┆ 5    │
│ 1    ┆ 6    ┆ 9    ┆ 1    ┆ 3    │
│ 7    ┆ 4    ┆ 9    ┆ 3    ┆ 5    │
│ 3    ┆ 7    ┆ 5    ┆ 9    ┆ 7    │
└──────┴──────┴──────┴──────┴──────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 10, 35).tolist())

# Reshape the series values into 7 rows of 5
df = pl.DataFrame(ser.to_numpy().reshape(7, 5).tolist(), schema=[f"col{i}" for i in range(5)])
print(df)

13. How to find the positions of numbers that are multiples of 3 from a Series?

Difficulty Level: L2

Find the positions of numbers that are multiples of 3 from ser.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 10, 7).tolist())

# Write your code below

Desired Output:

shape: (7,)
Series: 'data' [i64]
[
    7
    4
    8
    5
    7
    3
    7
]
[5]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 10, 7).tolist())

print(ser)
positions = [i for i, v in enumerate(ser) if v % 3 == 0]
print(positions)

14. How to extract items at given positions from a Series?

Difficulty Level: L1

From ser, extract the items at positions 0, 4, 8, and 14.

Solve:

import polars as pl
ser = pl.Series("letters", list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14]

# Write your code below

Desired Output:

shape: (4,)
Series: 'letters' [str]
[
    "a"
    "e"
    "i"
    "o"
]
Show Solution
import polars as pl
ser = pl.Series("letters", list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14]

result = ser.gather(pos)
print(result)

15. How to stack two Series vertically and horizontally?

Difficulty Level: L1

Stack ser1 and ser2 vertically and horizontally (to form a DataFrame).

Solve:

import polars as pl
ser1 = pl.Series("col1", [0, 1, 2, 3, 4])
ser2 = pl.Series("col2", [5, 6, 7, 8, 9])

# Write your code below

Desired Output:

shape: (10,)
Series: 'col1' [i64]
[
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
]

shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 0    ┆ 5    │
│ 1    ┆ 6    │
│ 2    ┆ 7    │
│ 3    ┆ 8    │
│ 4    ┆ 9    │
└──────┴──────┘
Show Solution
import polars as pl
ser1 = pl.Series("col1", [0, 1, 2, 3, 4])
ser2 = pl.Series("col2", [5, 6, 7, 8, 9])

# Vertical
vertical = pl.concat([ser1, ser2])
print(vertical)

# Horizontal
horizontal = pl.DataFrame([ser1, ser2])
print(horizontal)

16. How to get the positions of items of Series A in Series B?

Difficulty Level: L2

Get the positions of items of ser2 in ser1 as a list.

Solve:

import polars as pl
ser1 = pl.Series("a", [10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pl.Series("b", [1, 3, 10, 13])

# Write your code below

Desired Output:

[5, 4, 0, 8]
Show Solution
import polars as pl
ser1 = pl.Series("a", [10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pl.Series("b", [1, 3, 10, 13])

positions = [ser1.to_list().index(val) for val in ser2.to_list()]
print(positions)

17. How to compute the mean squared error between a truth and predicted Series?

Difficulty Level: L1

Compute the mean squared error of truth and pred Series.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
truth = pl.Series("truth", np.arange(10).tolist())
pred = pl.Series("pred", (np.arange(10) + np.random.random(10)).tolist())

# Write your code below

Desired Output:

0.3603
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
truth = pl.Series("truth", np.arange(10).astype(float).tolist())
pred = pl.Series("pred", (np.arange(10) + np.random.random(10)).tolist())

mse = ((truth - pred) ** 2).mean()
print(round(mse, 4))

18. How to convert the first character of each element in a Series to uppercase?

Difficulty Level: L2

Capitalize the first character of each word in ser.

Solve:

import polars as pl
ser = pl.Series("words", ['how', 'to', 'kick', 'ass?'])

# Write your code below

Desired Output:

shape: (4, 1)
┌───────┐
│ words │
│ ---   │
│ str   │
╞═══════╡
│ How   │
│ To    │
│ Kick  │
│ Ass?  │
└───────┘
Show Solution
import polars as pl
ser = pl.Series("words", ['how', 'to', 'kick', 'ass?'])

result = ser.to_frame("words").with_columns(
    pl.col("words").str.to_titlecase().alias("words")
)
print(result)

19. How to calculate the number of characters in each word in a Series?

Difficulty Level: L1

Calculate the number of characters in each word in ser.

Solve:

import polars as pl
ser = pl.Series("words", ['how', 'to', 'kick', 'ass?'])

# Write your code below

Desired Output:

shape: (4,)
Series: 'words' [u32]
[
    3
    2
    4
    4
]
Show Solution
import polars as pl
ser = pl.Series("words", ['how', 'to', 'kick', 'ass?'])

result = ser.str.len_chars()
print(result)

20. How to compute the difference of differences between consecutive numbers of a Series?

Difficulty Level: L1

Compute the difference of differences between consecutive numbers of ser.

Solve:

import polars as pl
ser = pl.Series("data", [1, 3, 6, 10, 15, 21, 27, 35])

# Write your code below

Desired Output:

shape: (6,)
Series: 'data' [i64]
[
    1
    1
    1
    1
    0
    2
]
Show Solution
import polars as pl
ser = pl.Series("data", [1, 3, 6, 10, 15, 21, 27, 35])

diff1 = ser.diff()
diff2 = diff1.diff()
print(diff2.drop_nulls())

21. How to convert a Series of date strings to a datetime type?

Difficulty Level: L2

Convert the ser to a datetime Series.

Solve:

import polars as pl
ser = pl.Series("dates", ['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

# Write your code below

Desired Output:

shape: (6,)
Series: 'dates' [datetime[μs]]
[
    2010-01-01 00:00:00
    2011-02-02 00:00:00
    2012-03-03 00:00:00
    2013-04-04 00:00:00
    2014-05-05 00:00:00
    2015-06-06 12:20:00
]
Show Solution
import polars as pl
from dateutil.parser import parse

ser = pl.Series("dates", ['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

# Use dateutil to parse mixed formats
parsed = pl.Series("dates", [parse(d) for d in ser.to_list()])
print(parsed)

22. How to get the day of month, week number, day of year, and day of week from a datetime Series?

Difficulty Level: L2

Get the day of month, week number, day of year, and day of week from ser.

Solve:

import polars as pl
from dateutil.parser import parse
ser = pl.Series("dates", [parse(d) for d in ['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20']])

# Write your code below

Desired Output:

shape: (6, 5)
┌─────────────────────┬──────────────┬─────────────┬─────────────┬─────────────┐
│ dates               ┆ day_of_month ┆ week_number ┆ day_of_year ┆ day_of_week │
│ ---                 ┆ ---          ┆ ---         ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ i8           ┆ i8          ┆ i16         ┆ i8          │
╞═════════════════════╪══════════════╪═════════════╪═════════════╪═════════════╡
│ 2010-01-01 00:00:00 ┆ 1            ┆ 53          ┆ 1           ┆ 5           │
│ 2011-02-02 00:00:00 ┆ 2            ┆ 5           ┆ 33          ┆ 3           │
│ 2012-03-03 00:00:00 ┆ 3            ┆ 9           ┆ 63          ┆ 6           │
│ 2013-04-04 00:00:00 ┆ 4            ┆ 14          ┆ 94          ┆ 4           │
│ 2014-05-05 00:00:00 ┆ 5            ┆ 19          ┆ 125         ┆ 1           │
│ 2015-06-06 12:20:00 ┆ 6            ┆ 23          ┆ 157         ┆ 6           │
└─────────────────────┴──────────────┴─────────────┴─────────────┴─────────────┘
Show Solution
import polars as pl
from dateutil.parser import parse

dates = [parse(d) for d in ['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20']]
ser = pl.Series("dates", dates)

df = ser.to_frame("dates").with_columns(
    pl.col("dates").dt.day().alias("day_of_month"),
    pl.col("dates").dt.week().alias("week_number"),
    pl.col("dates").dt.ordinal_day().alias("day_of_year"),
    pl.col("dates").dt.weekday().alias("day_of_week"),
)
print(df)

23. How to convert year-month string to dates corresponding to the 4th day of the month?

Difficulty Level: L2

Change ser to dates that start with the 4th of the respective months.

Solve:

import polars as pl
ser = pl.Series("dates", ['Jan 2010', 'Feb 2011', 'Mar 2012'])

# Write your code below

Desired Output:

shape: (3,)
Series: 'dates' [datetime[μs]]
[
    2010-01-04 00:00:00
    2011-02-04 00:00:00
    2012-03-04 00:00:00
]
Show Solution
import polars as pl
from dateutil.parser import parse

ser = pl.Series("dates", ['Jan 2010', 'Feb 2011', 'Mar 2012'])

result = pl.Series("dates", [parse('04 ' + d) for d in ser.to_list()])
print(result)

24. How to filter words that contain at least 2 vowels from a Series?

Difficulty Level: L3

From ser, extract words that contain at least 2 vowels.

Solve:

import polars as pl
ser = pl.Series("words", ['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# Write your code below

Desired Output:

shape: (3, 1)
┌────────┐
│ words  │
│ ---    │
│ str    │
╞════════╡
│ Apple  │
│ Orange │
│ Money  │
└────────┘
Show Solution
import polars as pl
ser = pl.Series("words", ['Apple', 'Orange', 'Plan', 'Python', 'Money'])

result = ser.to_frame("words").filter(
    pl.col("words").str.count_matches(r"[aeiouAEIOU]") >= 2
)
print(result)

25. How to filter valid emails from a Series?

Difficulty Level: L3

Extract valid email addresses from ser.

Solve:

import polars as pl
emails = pl.Series("emails", ['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])

# Write your code below

Desired Output:

shape: (3, 1)
┌───────────────────┐
│ emails            │
│ ---               │
│ str               │
╞═══════════════════╡
│ rameses@egypt.com │
│ matt@t.co         │
│ narendra@modi.com │
└───────────────────┘
Show Solution
import polars as pl
emails = pl.Series("emails", ['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])

pattern = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$'
result = emails.to_frame("emails").filter(
    pl.col("emails").str.contains(pattern)
)
print(result)

26. How to get the mean of a Series grouped by another Series?

Difficulty Level: L2

Compute the mean of weights grouped by fruit.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
fruit = pl.Series("fruit", np.random.choice(['apple', 'banana', 'carrot'], 10).tolist())
weights = pl.Series("weights", np.linspace(1, 10, 10).tolist())

# Write your code below

Desired Output:

shape: (3, 2)
┌────────┬──────────┐
│ fruit  ┆ weights  │
│ ---    ┆ ---      │
│ str    ┆ f64      │
╞════════╪══════════╡
│ apple  ┆ 4.333333 │
│ banana ┆ 8.0      │
│ carrot ┆ 5.666667 │
└────────┴──────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
fruit = pl.Series("fruit", np.random.choice(['apple', 'banana', 'carrot'], 10).tolist())
weights = pl.Series("weights", np.linspace(1, 10, 10).tolist())

df = pl.DataFrame([fruit, weights])
print(df.group_by("fruit").agg(pl.col("weights").mean()))

27. How to compute the euclidean distance between two Series?

Difficulty Level: L1

Compute the euclidean distance between p and q.

Solve:

import polars as pl
p = pl.Series("p", list(range(1, 11)))
q = pl.Series("q", list(range(10, 0, -1)))

# Write your code below

Desired Output:

18.17
Show Solution
import polars as pl
import numpy as np
p = pl.Series("p", list(range(1, 11)))
q = pl.Series("q", list(range(10, 0, -1)))

dist = ((p - q) ** 2).sum() ** 0.5
print(round(dist, 2))

# Or using numpy
print(round(float(np.linalg.norm(p.to_numpy() - q.to_numpy())), 2))

28. How to find all the local maxima (peaks) in a numeric Series?

Difficulty Level: L3

Get the positions of peaks (values surrounded by smaller values on both sides) in ser.

Solve:

import polars as pl
ser = pl.Series("data", [2, 10, 3, 4, 9, 10, 2, 7, 3])

# Write your code below

Desired Output:

[1, 5, 7]
Show Solution
import polars as pl
import numpy as np
ser = pl.Series("data", [2, 10, 3, 4, 9, 10, 2, 7, 3])

arr = ser.to_numpy()
# A peak: value greater than both neighbors
peaks = np.where((arr[1:-1] > arr[:-2]) & (arr[1:-1] > arr[2:]))[0] + 1
print(peaks.tolist())

29. How to replace the spaces in a string with the least frequent character?

Difficulty Level: L2

Replace the spaces in my_str with whichever character is the least frequent, excluding spaces.

Solve:

my_str = 'dbc deb abed gade'

# Write your code below

Desired Output:

Least frequent char: g
dbcgdebgabedggade
Show Solution
import polars as pl
my_str = 'dbc deb abed gade'

ser = pl.Series("chars", list(my_str))
counts = ser.filter(ser != ' ').value_counts().sort("count")
least_freq = counts[0, 0]  # note: 'c' and 'g' are tied at 1 here; value_counts order can vary
print("Least frequent char:", least_freq)
print(my_str.replace(' ', least_freq))

30. How to create a TimeSeries starting ‘2000-01-01’ and 10 weekends (Saturdays)?

Difficulty Level: L2

Create a Polars DataFrame with 10 Saturday dates starting from 2000-01-01 and random integer values.

Solve:

import polars as pl
import numpy as np

# Write your code below

Desired Output:

shape: (10, 2)
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2000-01-01 ┆ 7     │
│ 2000-01-08 ┆ 4     │
│ 2000-01-15 ┆ 8     │
│ 2000-01-22 ┆ 5     │
│ 2000-01-29 ┆ 7     │
│ 2000-02-05 ┆ 3     │
│ 2000-02-12 ┆ 7     │
│ 2000-02-19 ┆ 8     │
│ 2000-02-26 ┆ 5     │
│ 2000-03-04 ┆ 4     │
└────────────┴───────┘
Show Solution
import polars as pl
import numpy as np
from datetime import date, timedelta

# Find first Saturday on or after 2000-01-01
start = date(2000, 1, 1)
while start.weekday() != 5:  # 5 = Saturday
    start += timedelta(days=1)

saturdays = [start + timedelta(weeks=i) for i in range(10)]
np.random.seed(42)  # seed so the random values are reproducible
df = pl.DataFrame({
    "date": saturdays,
    "value": np.random.randint(1, 10, 10).tolist()
})
print(df)

31. How to fill missing dates and forward-fill values?

Difficulty Level: L2

ser has missing dates. Fill in the missing dates and forward-fill the corresponding values.

Solve:

import polars as pl
from datetime import date
df = pl.DataFrame({
    "date": [date(2000,1,1), date(2000,1,3), date(2000,1,6), date(2000,1,8)],
    "value": [1, 10, 3, None]
})

# Write your code below

Desired Output:

shape: (8, 2)
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2000-01-01 ┆ 1     │
│ 2000-01-02 ┆ 1     │
│ 2000-01-03 ┆ 10    │
│ 2000-01-04 ┆ 10    │
│ 2000-01-05 ┆ 10    │
│ 2000-01-06 ┆ 3     │
│ 2000-01-07 ┆ 3     │
│ 2000-01-08 ┆ 3     │
└────────────┴───────┘
Show Solution
import polars as pl
from datetime import date

df = pl.DataFrame({
    "date": [date(2000,1,1), date(2000,1,3), date(2000,1,6), date(2000,1,8)],
    "value": [1, 10, 3, None]
})

all_dates = pl.DataFrame({
    "date": pl.date_range(date(2000,1,1), date(2000,1,8), eager=True)
})
result = all_dates.join(df, on="date", how="left").with_columns(
    pl.col("value").forward_fill()
)
print(result)

32. How to find the autocorrelation of a numeric Series?

Difficulty Level: L3

Compute autocorrelation for lags 1 through 10 of ser, and find the lag with the highest correlation.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", (np.arange(20) + np.random.normal(1, 10, 20)).tolist())

# Write your code below

Desired Output:

[-0.04, -0.36, 0.24, -0.23, -0.06, 0.1, -0.59, -0.13, 0.33, -0.03]
Lag with highest correlation: 7
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", (np.arange(20) + np.random.normal(1, 10, 20)).tolist())

arr = ser.to_numpy()
autocorrs = [np.corrcoef(arr[:-i], arr[i:])[0, 1] for i in range(1, 11)]
print([round(a, 2) for a in autocorrs])
print('Lag with highest correlation:', np.argmax(np.abs(autocorrs)) + 1)

33. How to import only specified columns from a CSV file?

Difficulty Level: L2

Import ‘crim’ and ‘medv’ columns from the BostonHousing dataset CSV.

Solve:

import polars as pl
url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'

# Write your code below

Desired Output:

shape: (5, 2)
┌─────────┬──────┐
│ crim    ┆ medv │
│ ---     ┆ ---  │
│ f64     ┆ f64  │
╞═════════╪══════╡
│ 0.00632 ┆ 24.0 │
│ 0.02731 ┆ 21.6 │
│ 0.02729 ┆ 34.7 │
│ 0.03237 ┆ 33.4 │
│ 0.06905 ┆ 36.2 │
└─────────┴──────┘
Show Solution
import polars as pl

df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', columns=["crim", "medv"])
print(df.head())

34. How to get the nrows, ncolumns, datatype, summary stats of each column of a DataFrame?

Difficulty Level: L2

Get the number of rows, columns, datatypes, and summary stats of the Cars93 DataFrame.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

Shape: (93, 27)

Column dtypes:
  Manufacturer: String
  Model: String
  Type: String
  Min.Price: Float64
  Price: Float64
  ...
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

print("Shape:", df.shape)
print("\nColumn dtypes:")
for name, dtype in df.schema.items():
    print(f"  {name}: {dtype}")
print("\nSummary stats:\n", df.describe())

35. How to extract the row and column number of a particular cell with given criterion?

Difficulty Level: L1

Which manufacturer, model, and type has the highest Price? What is the row and column number of the cell with the highest Price value?

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

Row: 58
Column: 4
Mercedes-Benz 300E Midsize Price: 61.9
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Row with the highest Price
row_idx = df["Price"].arg_max()
print("Row:", row_idx)
print("Column:", df.columns.index("Price"))
row = df.row(row_idx, named=True)
print(row["Manufacturer"], row["Model"], row["Type"], "Price:", row["Price"])

36. How to rename a specific column in a DataFrame?

Difficulty Level: L2

Rename the column Type to CarType in df.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

['Manufacturer', 'Model', 'CarType', 'Min.Price', 'Price']
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

df = df.rename({"Type": "CarType"})
print(df.columns[:5])

37. How to check if a DataFrame has any missing values?

Difficulty Level: L1

Check if df has any missing values.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

True
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

has_nulls = df.null_count().sum_horizontal()[0] > 0
print(has_nulls)

38. How to count the number of missing values in each column?

Difficulty Level: L1

Count the number of missing values in each column of df.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

Manufacturer: 4
Price: 2
Type: 3
Min.Price: 7
Max.Price: 5
...
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

null_counts = df.null_count().row(0)
for name, n in zip(df.columns, null_counts):
    print(f"{name}: {n}")

39. How to replace missing values of multiple numeric columns with the mean?

Difficulty Level: L2

Replace NaNs/nulls with the column mean for all numeric columns in df.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

Numeric nulls before: 99, after: 0
Show Solution
import polars as pl
import polars.selectors as cs
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

before = df.select(cs.numeric()).null_count().sum_horizontal()[0]
df = df.with_columns(cs.numeric().fill_null(cs.numeric().mean()))
after = df.select(cs.numeric()).null_count().sum_horizontal()[0]
print(f"Numeric nulls before: {before}, after: {after}")

40. How to use apply function on existing columns with global variables as additional arguments?

Difficulty Level: L2

In df, use polars expressions to compute a new column 'avg' that is the row-mean of columns 'a', 'b', and 'c', then add a column 'avg_mf' = avg × d (where d is an external variable).

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 10, 15).reshape(5, 3).tolist(), schema=['a', 'b', 'c'])
d = 5

# Write your code below

Desired Output:

shape: (5, 5)
┌─────┬─────┬─────┬──────────┬───────────┐
│ a   ┆ b   ┆ c   ┆ avg      ┆ avg_mf    │
│ --- ┆ --- ┆ --- ┆ ---      ┆ ---       │
│ i64 ┆ i64 ┆ i64 ┆ f64      ┆ f64       │
╞═════╪═════╪═════╪══════════╪═══════════╡
│ 7   ┆ 4   ┆ 8   ┆ 6.333333 ┆ 31.666667 │
│ 5   ┆ 7   ┆ 3   ┆ 5.0      ┆ 25.0      │
│ 7   ┆ 8   ┆ 5   ┆ 6.666667 ┆ 33.333333 │
│ 4   ┆ 8   ┆ 8   ┆ 6.666667 ┆ 33.333333 │
│ 3   ┆ 6   ┆ 5   ┆ 4.666667 ┆ 23.333333 │
└─────┴─────┴─────┴──────────┴───────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 10, 15).reshape(5, 3).tolist(), schema=['a', 'b', 'c'])
d = 5

df = df.with_columns(
    ((pl.col("a") + pl.col("b") + pl.col("c")) / 3).alias("avg")
).with_columns(
    (pl.col("avg") * d).alias("avg_mf")
)
print(df)

41. How to swap two columns in a DataFrame?

Difficulty Level: L2

In df, swap columns 'a' and 'c'.

Solve:

import polars as pl
import numpy as np
df = pl.DataFrame(np.arange(20).reshape(-1, 5).tolist(), schema=list('abcde'))

# Write your code below

Desired Output:

shape: (4, 5)
┌─────┬─────┬─────┬─────┬─────┐
│ c   ┆ b   ┆ a   ┆ d   ┆ e   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╡
│ 2   ┆ 1   ┆ 0   ┆ 3   ┆ 4   │
│ 7   ┆ 6   ┆ 5   ┆ 8   ┆ 9   │
│ 12  ┆ 11  ┆ 10  ┆ 13  ┆ 14  │
│ 17  ┆ 16  ┆ 15  ┆ 18  ┆ 19  │
└─────┴─────┴─────┴─────┴─────┘
Show Solution
import polars as pl
import numpy as np
df = pl.DataFrame(np.arange(20).reshape(-1, 5).tolist(), schema=list('abcde'))

# Swap 'a' and 'c'
cols = df.columns
a_idx, c_idx = cols.index('a'), cols.index('c')
cols[a_idx], cols[c_idx] = cols[c_idx], cols[a_idx]
df = df.select(cols)
print(df)

42. How to sort columns in reverse alphabetical order?

Difficulty Level: L2

Sort the columns of df in reverse alphabetical order.

Solve:

import polars as pl
import numpy as np
df = pl.DataFrame(np.arange(20).reshape(-1, 5).tolist(), schema=list('abcde'))

# Write your code below

Desired Output:

python
shape: (4, 5)
┌─────┬─────┬─────┬─────┬─────┐
│ e   ┆ d   ┆ c   ┆ b   ┆ a   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╪═════╪═════╡
│ 4   ┆ 3   ┆ 2   ┆ 1   ┆ 0   │
│ 9   ┆ 8   ┆ 7   ┆ 6   ┆ 5   │
│ 14  ┆ 13  ┆ 12  ┆ 11  ┆ 10  │
│ 19  ┆ 18  ┆ 17  ┆ 16  ┆ 15  │
└─────┴─────┴─────┴─────┴─────┘
Show Solution
import polars as pl
import numpy as np
df = pl.DataFrame(np.arange(20).reshape(-1, 5).tolist(), schema=list('abcde'))

df = df.select(sorted(df.columns, reverse=True))
print(df)

43. How to format or suppress scientific notations in a Polars DataFrame?

Difficulty Level: L2

When displaying a DataFrame with very small numbers, format them as fixed-point with 4 decimal places.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame((np.random.random([5, 3]) / 1e3).tolist(), schema=['a', 'b', 'c'])

# Write your code below

Desired Output:

python
shape: (5, 3)
┌────────┬────────┬────────┐
│ a      ┆ b      ┆ c      │
│ ---    ┆ ---    ┆ ---    │
│ f64    ┆ f64    ┆ f64    │
╞════════╪════════╪════════╡
│ 0.0004 ┆ 0.0010 ┆ 0.0007 │
│ 0.0006 ┆ 0.0002 ┆ 0.0002 │
│ 0.0001 ┆ 0.0009 ┆ 0.0006 │
│ 0.0007 ┆ 0.0000 ┆ 0.0010 │
│ 0.0008 ┆ 0.0002 ┆ 0.0002 │
└────────┴────────┴────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame((np.random.random([5, 3]) / 1e3).tolist(), schema=['a', 'b', 'c'])

# Polars uses Config for display settings
with pl.Config(float_precision=4):
    print(df)

44. How to format all values in a DataFrame to show only 4 decimal places?

Difficulty Level: L2

Show all float values in df rounded to 4 decimal places.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.random([5, 3]).tolist(), schema=['a', 'b', 'c'])

# Write your code below

Desired Output:

python
shape: (5, 3)
┌────────┬────────┬────────┐
│ a      ┆ b      ┆ c      │
│ ---    ┆ ---    ┆ ---    │
│ f64    ┆ f64    ┆ f64    │
╞════════╪════════╪════════╡
│ 0.3745 ┆ 0.9507 ┆ 0.732  │
│ 0.5987 ┆ 0.156  ┆ 0.156  │
│ 0.0581 ┆ 0.8662 ┆ 0.6011 │
│ 0.7081 ┆ 0.0206 ┆ 0.9699 │
│ 0.8324 ┆ 0.2123 ┆ 0.1818 │
└────────┴────────┴────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.random([5, 3]).tolist(), schema=['a', 'b', 'c'])

df = df.with_columns(pl.all().round(4))
print(df)

45. How to filter rows of a DataFrame by row number?

Difficulty Level: L1

Select every 20th row starting from the 1st row (row 0).

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (5, 3)
┌──────────────┬─────────┬─────────┐
│ Manufacturer ┆ Model   ┆ Type    │
│ ---          ┆ ---     ┆ ---     │
│ str          ┆ str     ┆ str     │
╞══════════════╪═════════╪═════════╡
│ Acura        ┆ Integra ┆ Small   │
│ Chrysler     ┆ LeBaron ┆ Compact │
│ Honda        ┆ Prelude ┆ Sporty  │
│ Mercury      ┆ Cougar  ┆ Midsize │
│ Subaru       ┆ Loyale  ┆ Small   │
└──────────────┴─────────┴─────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

result = df.gather_every(20)
print(result.select(["Manufacturer", "Model", "Type"]))

46. How to create a primary key index by combining relevant columns?

Difficulty Level: L2

In df, replace nulls with 'missing' in columns 'Manufacturer', 'Model', and 'Type', then create a new column 'primary_key' as a combination of these three columns. Check if it is unique.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
True
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

df = df.with_columns(
    pl.col("Manufacturer").fill_null("missing"),
    pl.col("Model").fill_null("missing"),
    pl.col("Type").fill_null("missing"),
).with_columns(
    (pl.col("Manufacturer") + "_" + pl.col("Model") + "_" + pl.col("Type")).alias("primary_key")
)
print(df["primary_key"].is_unique().all())

47. How to get the row number of the n-th largest value in a column?

Difficulty Level: L2

Find the row position of the 5th largest value of column 'a' in df.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 30, 30).reshape(10, -1).tolist(), schema=list('abc'))

# Write your code below

Desired Output:

python
DataFrame:
shape: (10, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 7   ┆ 20  ┆ 29  │
│ 15  ┆ 11  ┆ 8   │
│ 29  ┆ 21  ┆ 7   │
│ 26  ┆ 19  ┆ 23  │
│ 11  ┆ 11  ┆ 24  │
│ 21  ┆ 4   ┆ 8   │
│ 24  ┆ 3   ┆ 22  │
│ 21  ┆ 2   ┆ 24  │
│ 12  ┆ 6   ┆ 2   │
│ 28  ┆ 21  ┆ 1   │
└─────┴─────┴─────┘

Row index of 5th largest value in 'a': 5
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 30, 30).reshape(10, -1).tolist(), schema=list('abc'))

n = 5
row_idx = df.with_row_index("idx").sort("a", descending=True)[n - 1, "idx"]
print("DataFrame:")
print(df)
print(f"\nRow index of 5th largest value in 'a': {row_idx}")

48. How to find the position of the n-th largest value greater than the mean?

Difficulty Level: L2

Find the positions of values in ser that are greater than the mean. Report the 2nd position.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 100, 15).tolist())

# Write your code below

Desired Output:

python
Series: [52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3, 22, 53]
Mean: 55
2nd position where value > mean: 3
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
ser = pl.Series("data", np.random.randint(1, 100, 15).tolist())

mean_val = ser.mean()
positions = [i for i, v in enumerate(ser) if v > mean_val]
print("Series:", ser.to_list())
print("Mean:", round(mean_val))
print("2nd position where value > mean:", positions[1])

49. How to get the last two rows of a DataFrame whose row sum > 100?

Difficulty Level: L2

Get the last two rows of df where the sum of the row values exceeds 100.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4).tolist(), schema=[f"c{i}" for i in range(4)])

# Write your code below

Desired Output:

python
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ c0  ┆ c1  ┆ c2  ┆ c3  │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╪═════╡
│ 24  ┆ 39  ┆ 39  ┆ 24  │
│ 39  ┆ 28  ┆ 21  ┆ 32  │
└─────┴─────┴─────┴─────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4).tolist(), schema=[f"c{i}" for i in range(4)])

result = df.filter(
    pl.sum_horizontal(pl.all()) > 100
).tail(2)
print(result)

50. How to find and cap outliers from a Series or DataFrame column?

Difficulty Level: L2

Replace all values in ser that are above the 95th percentile or below the 5th percentile with the respective percentile value.

Solve:

import polars as pl
import numpy as np
np.random.seed(100)
ser = pl.Series("data", np.random.normal(0, 1, 50).tolist())

# Write your code below

Desired Output:

python
Low: -1.6906, High: 1.4707
shape: (5, 1)
┌───────────┐
│ data      │
│ ---       │
│ f64       │
╞═══════════╡
│ -1.690617 │
│ 0.34268   │
│ 1.153036  │
│ -0.252436 │
│ 0.981321  │
└───────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(100)
ser = pl.Series("data", np.random.normal(0, 1, 50).tolist())

low = ser.quantile(0.05)
high = ser.quantile(0.95)
print(f"Low: {low:.4f}, High: {high:.4f}")

result = ser.to_frame("data").with_columns(
    pl.col("data").clip(low, high)
)
print(result.head(5))

51. How to reshape a DataFrame from long to wide format?

Difficulty Level: L3

Pivot df so each unique 'car' becomes a row and the columns are the cities with corresponding 'price' values.

Solve:

import polars as pl
df = pl.DataFrame({
    "car": ["Audi", "Audi", "BMW", "BMW"],
    "city": ["SF", "NYC", "SF", "NYC"],
    "price": [45000, 42000, 55000, 52000]
})

# Write your code below

Desired Output:

python
shape: (2, 3)
┌──────┬───────┬───────┐
│ car  ┆ SF    ┆ NYC   │
│ ---  ┆ ---   ┆ ---   │
│ str  ┆ i64   ┆ i64   │
╞══════╪═══════╪═══════╡
│ Audi ┆ 45000 ┆ 42000 │
│ BMW  ┆ 55000 ┆ 52000 │
└──────┴───────┴───────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    "car": ["Audi", "Audi", "BMW", "BMW"],
    "city": ["SF", "NYC", "SF", "NYC"],
    "price": [45000, 42000, 55000, 52000]
})

result = df.pivot(on="city", index="car", values="price")
print(result)

52. How to reshape a DataFrame from wide to long format?

Difficulty Level: L2

Melt df so each car-city pair becomes a row.

Solve:

import polars as pl
df = pl.DataFrame({
    "car": ["Audi", "BMW"],
    "SF": [45000, 55000],
    "NYC": [42000, 52000]
})

# Write your code below

Desired Output:

python
shape: (4, 3)
┌──────┬──────┬───────┐
│ car  ┆ city ┆ price │
│ ---  ┆ ---  ┆ ---   │
│ str  ┆ str  ┆ i64   │
╞══════╪══════╪═══════╡
│ Audi ┆ SF   ┆ 45000 │
│ BMW  ┆ SF   ┆ 55000 │
│ Audi ┆ NYC  ┆ 42000 │
│ BMW  ┆ NYC  ┆ 52000 │
└──────┴──────┴───────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    "car": ["Audi", "BMW"],
    "SF": [45000, 55000],
    "NYC": [42000, 52000]
})

result = df.unpivot(on=["SF", "NYC"], index="car", variable_name="city", value_name="price")
print(result)

53. How to create a DataFrame with rows as stacked columns?

Difficulty Level: L3

Create a DataFrame where each row is a column name – column value pair for the first row of df.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (10, 2)
┌──────────────┬─────────┐
│ column       ┆ value   │
│ ---          ┆ ---     │
│ str          ┆ str     │
╞══════════════╪═════════╡
│ Manufacturer ┆ Acura   │
│ Model        ┆ Integra │
│ Type         ┆ Small   │
│ Min.Price    ┆ 12.9    │
│ Price        ┆ 15.9    │
│ Max.Price    ┆ 18.8    │
│ MPG.city     ┆ 25      │
│ MPG.highway  ┆ 31      │
│ AirBags      ┆ None    │
│ DriveTrain   ┆ Front   │
└──────────────┴─────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

first_row = df.head(1)
result = first_row.unpivot(variable_name="column", value_name="value")
print(result.head(10))

54. How to check if a DataFrame has any missing values?

Difficulty Level: L1

Check which columns in df have any null values.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
['Manufacturer', 'Model', 'Type', 'Min.Price', 'Price', 'Max.Price', 'MPG.city', 'MPG.highway', 'AirBags', 'DriveTrain', 'Cylinders', 'EngineSize', 'Horsepower', 'RPM', 'Rev.per.mile', 'Man.trans.avail', 'Fuel.tank.capacity', 'Passengers', 'Length', 'Wheelbase', 'Width', 'Turn.circle', 'Rear.seat.room', 'Luggage.room', 'Weight', 'Origin', 'Make']
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

null_counts = df.null_count()
cols_with_nulls = [col for col in df.columns if null_counts[col][0] > 0]
print(cols_with_nulls)

55. How to get the minimum value in each column grouped by another column?

Difficulty Level: L2

In df, for each 'Type', get the minimum 'Price'.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (7, 2)
┌─────────┬───────┐
│ Type    ┆ Price │
│ ---     ┆ ---   │
│ str     ┆ f64   │
╞═════════╪═══════╡
│ null    ┆ 8.6   │
│ Compact ┆ 11.1  │
│ Large   ┆ 18.4  │
│ Midsize ┆ 13.9  │
│ Small   ┆ 7.4   │
│ Sporty  ┆ 12.5  │
│ Van     ┆ 16.3  │
└─────────┴───────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

print(df.group_by("Type").agg(pl.col("Price").min()).sort("Type"))

56. How to get the top n rows of each group in a DataFrame?

Difficulty Level: L2

For each 'Type', get the top 2 rows with the highest 'Price'.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (10, 4)
┌─────────┬───────────────┬──────────┬───────┐
│ Type    ┆ Manufacturer  ┆ Model    ┆ Price │
│ ---     ┆ ---           ┆ ---      ┆ ---   │
│ str     ┆ str           ┆ str      ┆ f64   │
╞═════════╪═══════════════╪══════════╪═══════╡
│ null    ┆ Pontiac       ┆ Firebird ┆ 17.7  │
│ null    ┆ Hyundai       ┆ Scoupe   ┆ 10.0  │
│ Compact ┆ Mercedes-Benz ┆ 190E     ┆ 31.9  │
│ Compact ┆ Audi          ┆ 90       ┆ 29.1  │
│ Large   ┆ Lincoln       ┆ Town_Car ┆ 36.1  │
│ Large   ┆ Cadillac      ┆ DeVille  ┆ 34.7  │
│ Midsize ┆ Toyota        ┆ Camry    ┆ null  │
│ Midsize ┆ Mercedes-Benz ┆ 300E     ┆ 61.9  │
│ Small   ┆ Saturn        ┆ SL       ┆ null  │
│ Small   ┆ Acura         ┆ Integra  ┆ 15.9  │
└─────────┴───────────────┴──────────┴───────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

result = (
    df.sort("Price", descending=True)
    .group_by("Type", maintain_order=True)
    .head(2)
    .sort("Type", maintain_order=True)
)
print(result.select(["Type", "Manufacturer", "Model", "Price"]))

57. How to replace missing values with the mode of a column?

Difficulty Level: L2

Replace the missing values in 'DriveTrain' column with its mode (most frequent value).

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
Mode: Front
Nulls after fill: 0
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

mode_val = df["DriveTrain"].drop_nulls().value_counts().sort("count", descending=True)[0, 0]
print("Mode:", mode_val)
df = df.with_columns(pl.col("DriveTrain").fill_null(mode_val))
print("Nulls after fill:", df["DriveTrain"].null_count())

58. How to create a new column from existing columns using a condition?

Difficulty Level: L2

Create a new column 'price_category' that says 'high' if Price > 30 else 'low'.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (10, 3)
┌──────────────┬───────┬────────────────┐
│ Manufacturer ┆ Price ┆ price_category │
│ ---          ┆ ---   ┆ ---            │
│ str          ┆ f64   ┆ str            │
╞══════════════╪═══════╪════════════════╡
│ Acura        ┆ 15.9  ┆ low            │
│ null         ┆ 33.9  ┆ high           │
│ Audi         ┆ 29.1  ┆ low            │
│ Audi         ┆ 37.7  ┆ high           │
│ BMW          ┆ 30.0  ┆ low            │
│ Buick        ┆ 15.7  ┆ low            │
│ Buick        ┆ 20.8  ┆ low            │
│ Buick        ┆ 23.7  ┆ low            │
│ Buick        ┆ 26.3  ┆ low            │
│ Cadillac     ┆ 34.7  ┆ high           │
└──────────────┴───────┴────────────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

df = df.with_columns(
    pl.when(pl.col("Price") > 30)
    .then(pl.lit("high"))
    .otherwise(pl.lit("low"))
    .alias("price_category")
)
print(df.select(["Manufacturer", "Price", "price_category"]).head(10))

59. How to get the column-wise maximum of two DataFrames?

Difficulty Level: L2

Get the element-wise maximum of two DataFrames df1 and df2.

Solve:

import polars as pl
import numpy as np
np.random.seed(100)
df1 = pl.DataFrame(np.random.randint(1, 25, [5, 3]), schema=list('abc'))
df2 = pl.DataFrame(np.random.randint(1, 25, [5, 3]), schema=list('abc'))

# Write your code below

Desired Output:

python
shape: (5, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 17  ┆ 16  ┆ 8   │
│ 24  ┆ 17  ┆ 17  │
│ 23  ┆ 21  ┆ 13  │
│ 22  ┆ 3   ┆ 14  │
│ 22  ┆ 20  ┆ 18  │
└─────┴─────┴─────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(100)
df1 = pl.DataFrame(np.random.randint(1, 25, [5, 3]), schema=list('abc'))
df2 = pl.DataFrame(np.random.randint(1, 25, [5, 3]), schema=list('abc'))

# Rename df2 columns to avoid collision, then use max_horizontal
df2_renamed = df2.rename({c: f"{c}_2" for c in df2.columns})
combined = pl.concat([df1, df2_renamed], how="horizontal")
result = combined.select(
    [pl.max_horizontal(pl.col(c), pl.col(f"{c}_2")).alias(c) for c in df1.columns]
)
print(result)

60. How to get the correlation between two columns of a DataFrame?

Difficulty Level: L2

Compute the correlation between all numeric columns in df and find the two columns with the highest absolute correlation.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
Highest correlation: (c1, c8) = 0.8447
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Compute pairwise correlations using numpy
arr = df.to_numpy().astype(float)
corr_matrix = np.corrcoef(arr.T)
np.fill_diagonal(corr_matrix, 0)
max_idx = np.unravel_index(np.argmax(np.abs(corr_matrix)), corr_matrix.shape)
print(f"Highest correlation: ({df.columns[max_idx[0]]}, {df.columns[max_idx[1]]}) = {corr_matrix[max_idx]:.4f}")

61. How to create a column containing the minimum-by-maximum of each row?

Difficulty Level: L2

Compute the minimum / maximum for every row of df.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
shape: (8,)
Series: 'min_by_max' [f64]
[
    0.16129
    0.022727
    0.230769
    0.163043
    0.041096
    0.021739
    0.098765
    0.021505
]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

result = df.with_columns(
    (pl.min_horizontal(pl.all()) / pl.max_horizontal(pl.all())).alias("min_by_max")
)
print(result["min_by_max"])

62. How to create a column that contains the penultimate (second largest) value in each row?

Difficulty Level: L2

Create a new column 'penultimate' which has the second largest value of each row.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
shape: (8, 1)
┌─────────────┐
│ penultimate │
│ ---         │
│ i32         │
╞═════════════╡
│ 87          │
│ 88          │
│ 89          │
│ 80          │
│ 64          │
│ 90          │
│ 78          │
│ 90          │
└─────────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Use numpy on each row
arr = df.to_numpy()
penultimate = [sorted(row)[-2] for row in arr]
df = df.with_columns(pl.Series("penultimate", penultimate))
print(df.select(["penultimate"]))

63. How to normalize all columns in a DataFrame?

Difficulty Level: L2

Normalize all columns of df so that the values in each column range from 0 to 1. (min-max scaling)

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
shape: (5, 10)
┌──────────┬──────────┬──────────┬──────────┬───┬──────────┬──────────┬──────────┬──────────┐
│ c0       ┆ c1       ┆ c2       ┆ c3       ┆ … ┆ c6       ┆ c7       ┆ c8       ┆ c9       │
│ ---      ┆ ---      ┆ ---      ┆ ---      ┆   ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ f64      ┆ f64      ┆ f64      ┆ f64      ┆   ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞══════════╪══════════╪══════════╪══════════╪═══╪══════════╪══════════╪══════════╪══════════╡
│ 0.571429 ┆ 1.0      ┆ 0.134831 ┆ 1.0      ┆ … ┆ 0.861111 ┆ 0.977011 ┆ 0.863636 ┆ 0.811111 │
│ 1.0      ┆ 0.241758 ┆ 0.0      ┆ 0.275362 ┆ … ┆ 0.930556 ┆ 0.321839 ┆ 0.30303  ┆ 0.0      │
│ 0.714286 ┆ 0.637363 ┆ 0.202247 ┆ 0.434783 ┆ … ┆ 0.013889 ┆ 1.0      ┆ 0.469697 ┆ 0.988889 │
│ 0.654762 ┆ 0.43956  ┆ 1.0      ┆ 0.826087 ┆ … ┆ 0.569444 ┆ 0.689655 ┆ 0.439394 ┆ 0.666667 │
│ 0.559524 ┆ 0.582418 ┆ 0.685393 ┆ 0.0      ┆ … ┆ 0.0      ┆ 0.816092 ┆ 0.318182 ┆ 0.177778 │
└──────────┴──────────┴──────────┴──────────┴───┴──────────┴──────────┴──────────┴──────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

result = df.with_columns(
    [(pl.col(c) - pl.col(c).min()) / (pl.col(c).max() - pl.col(c).min()) for c in df.columns]
)
print(result.head(5))

64. How to compute the row-wise softmax of a DataFrame?

Difficulty Level: L3

Compute the softmax of each row: e^x_i / sum(e^x) for each row.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
shape: (3, 10)
┌────────┬────────┬────────┬────────┬───┬────────┬────────┬────────┬────────┐
│ c0     ┆ c1     ┆ c2     ┆ c3     ┆ … ┆ c6     ┆ c7     ┆ c8     ┆ c9     │
│ ---    ┆ ---    ┆ ---    ┆ ---    ┆   ┆ ---    ┆ ---    ┆ ---    ┆ ---    │
│ f64    ┆ f64    ┆ f64    ┆ f64    ┆   ┆ f64    ┆ f64    ┆ f64    ┆ f64    │
╞════════╪════════╪════════╪════════╪═══╪════════╪════════╪════════╪════════╡
│ 0.0000 ┆ 0.9975 ┆ 0.0000 ┆ 0.0000 ┆ … ┆ 0.0000 ┆ 0.0025 ┆ 0.0000 ┆ 0.0000 │
│ 0.5000 ┆ 0.0000 ┆ 0.0000 ┆ 0.0000 ┆ … ┆ 0.5000 ┆ 0.0000 ┆ 0.0000 ┆ 0.0000 │
│ 0.0000 ┆ 0.0000 ┆ 0.0000 ┆ 0.0000 ┆ … ┆ 0.0000 ┆ 0.1192 ┆ 0.0000 ┆ 0.8808 │
└────────┴────────┴────────┴────────┴───┴────────┴────────┴────────┴────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

arr = df.to_numpy().astype(float)
exp_arr = np.exp(arr - arr.max(axis=1, keepdims=True))  # for numerical stability
softmax = exp_arr / exp_arr.sum(axis=1, keepdims=True)
result = pl.DataFrame(softmax.tolist(), schema=df.columns)
with pl.Config(float_precision=4):
    print(result.head(3))

65. How to find the maximum range (max – min) column in a DataFrame?

Difficulty Level: L2

Find the column with the maximum range (max – min) in df.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
Ranges: {'c1': 91, 'c9': 90, 'c2': 89}
Column with max range: c1
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1).tolist(), schema=[f'c{i}' for i in range(10)])

ranges = {c: df[c].max() - df[c].min() for c in df.columns}
top3 = dict(sorted(ranges.items(), key=lambda kv: kv[1], reverse=True)[:3])
print("Ranges:", top3)
print("Column with max range:", max(ranges, key=ranges.get))

66. How to replace both diagonals of a DataFrame with 0?

Difficulty Level: L3

Replace both the main and anti-diagonal of df with 0.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 100).reshape(10, -1).tolist(), schema=[f'c{i}' for i in range(10)])

# Write your code below

Desired Output:

python
shape: (5, 10)
┌─────┬─────┬─────┬─────┬───┬─────┬─────┬─────┬─────┐
│ c0  ┆ c1  ┆ c2  ┆ c3  ┆ … ┆ c6  ┆ c7  ┆ c8  ┆ c9  │
│ --- ┆ --- ┆ --- ┆ --- ┆   ┆ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 ┆ i32 ┆   ┆ i32 ┆ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╪═════╪═══╪═════╪═════╪═════╪═════╡
│ 0   ┆ 93  ┆ 15  ┆ 72  ┆ … ┆ 83  ┆ 87  ┆ 75  ┆ 0   │
│ 88  ┆ 0   ┆ 3   ┆ 22  ┆ … ┆ 88  ┆ 30  ┆ 0   ┆ 2   │
│ 64  ┆ 60  ┆ 0   ┆ 33  ┆ … ┆ 22  ┆ 0   ┆ 49  ┆ 91  │
│ 59  ┆ 42  ┆ 92  ┆ 0   ┆ … ┆ 0   ┆ 62  ┆ 47  ┆ 62  │
│ 51  ┆ 55  ┆ 64  ┆ 3   ┆ … ┆ 21  ┆ 73  ┆ 39  ┆ 18  │
└─────┴─────┴─────┴─────┴───┴─────┴─────┴─────┴─────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 100, 100).reshape(10, -1).tolist(), schema=[f'c{i}' for i in range(10)])

arr = df.to_numpy().copy()
np.fill_diagonal(arr, 0)
np.fill_diagonal(np.fliplr(arr), 0)
result = pl.DataFrame(arr.tolist(), schema=df.columns)
print(result.head(5))

67. How to get a particular group of a group_by DataFrame by key?

Difficulty Level: L2

From df grouped by 'col1', get the group belonging to 'apple' as a DataFrame.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'col1': ['apple', 'banana', 'orange'] * 3,
    'col2': np.random.rand(9).tolist(),
    'col3': np.random.randint(0, 15, 9).tolist()
})

# Write your code below

Desired Output:

python
shape: (3, 3)
┌───────┬──────────┬──────┐
│ col1  ┆ col2     ┆ col3 │
│ ---   ┆ ---      ┆ ---  │
│ str   ┆ f64      ┆ i32  │
╞═══════╪══════════╪══════╡
│ apple ┆ 0.37454  ┆ 7    │
│ apple ┆ 0.598658 ┆ 4    │
│ apple ┆ 0.058084 ┆ 11   │
└───────┴──────────┴──────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'col1': ['apple', 'banana', 'orange'] * 3,
    'col2': np.random.rand(9).tolist(),
    'col3': np.random.randint(0, 15, 9).tolist()
})

result = df.filter(pl.col("col1") == "apple")
print(result)

68. How to get the n-th largest value of a column when grouped by another column?

Difficulty Level: L2

In df, find the second largest value of 'taste' for 'banana'.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange', 'apple', 'banana', 'orange', 'apple', 'banana', 'orange'],
    'taste': np.random.rand(9).tolist(),
    'price': np.random.randint(1, 15, 9).tolist()
})

# Write your code below

Desired Output:

python
2nd largest taste for banana: 0.8662
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange', 'apple', 'banana', 'orange', 'apple', 'banana', 'orange'],
    'taste': np.random.rand(9).tolist(),
    'price': np.random.randint(1, 15, 9).tolist()
})

result = (
    df.filter(pl.col("fruit") == "banana")
    .sort("taste", descending=True)
    [1, "taste"]  # 2nd largest (index 1)
)
print(f"2nd largest taste for banana: {result:.4f}")

69. How to compute grouped mean and keep the grouped column as another column (not index)?

Difficulty Level: L1

Compute the grouped mean of 'price' by 'fruit' and keep 'fruit' as a regular column.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'] * 3,
    'taste': np.random.rand(9).tolist(),
    'price': np.random.randint(1, 15, 9).tolist()
})

# Write your code below

Desired Output:

python
shape: (3, 2)
┌────────┬──────────┐
│ fruit  ┆ price    │
│ ---    ┆ ---      │
│ str    ┆ f64      │
╞════════╪══════════╡
│ apple  ┆ 8.333333 │
│ banana ┆ 6.333333 │
│ orange ┆ 6.666667 │
└────────┴──────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'] * 3,
    'taste': np.random.rand(9).tolist(),
    'price': np.random.randint(1, 15, 9).tolist()
})

# In Polars, group_by always keeps the grouped column as a column (no index)
result = df.group_by("fruit").agg(pl.col("price").mean())
print(result)

70. How to join two DataFrames by 2 columns so they have only the common rows?

Difficulty Level: L2

Join df1 and df2 on 'fruit' and 'weight' so only matching rows remain.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df1 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'],
    'weight': ['high', 'medium', 'low'],
    'price': np.random.randint(0, 15, 3).tolist()
})
df2 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'melon'],
    'weight': ['high', 'medium', 'high'],
    'taste': np.random.randint(0, 15, 3).tolist()
})

# Write your code below

Desired Output:

python
shape: (2, 4)
┌────────┬────────┬───────┬───────┐
│ fruit  ┆ weight ┆ price ┆ taste │
│ ---    ┆ ---    ┆ ---   ┆ ---   │
│ str    ┆ str    ┆ i32   ┆ i32   │
╞════════╪════════╪═══════╪═══════╡
│ apple  ┆ high   ┆ 6     ┆ 14    │
│ banana ┆ medium ┆ 3     ┆ 10    │
└────────┴────────┴───────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df1 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'],
    'weight': ['high', 'medium', 'low'],
    'price': np.random.randint(0, 15, 3).tolist()
})
df2 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'melon'],
    'weight': ['high', 'medium', 'high'],
    'taste': np.random.randint(0, 15, 3).tolist()
})

result = df1.join(df2, on=["fruit", "weight"], how="inner")
print(result)

71. How to remove rows from a DataFrame that are present in another DataFrame?

Difficulty Level: L3

Remove rows from df1 that are present in df2, based on the 'fruit' column.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df1 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'],
    'weight': ['high', 'medium', 'low'],
    'price': np.random.randint(0, 15, 3).tolist()
})
df2 = pl.DataFrame({
    'fruit': ['apple', 'melon', 'banana'],
    'weight': ['high', 'high', 'low'],
    'taste': np.random.randint(0, 15, 3).tolist()
})

# Write your code below

Desired Output:

python
shape: (1, 3)
┌────────┬────────┬───────┐
│ fruit  ┆ weight ┆ price │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ str    ┆ i32   │
╞════════╪════════╪═══════╡
│ orange ┆ low    ┆ 12    │
└────────┴────────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df1 = pl.DataFrame({
    'fruit': ['apple', 'banana', 'orange'],
    'weight': ['high', 'medium', 'low'],
    'price': np.random.randint(0, 15, 3).tolist()
})
df2 = pl.DataFrame({
    'fruit': ['apple', 'melon', 'banana'],
    'weight': ['high', 'high', 'low'],
    'taste': np.random.randint(0, 15, 3).tolist()
})

result = df1.filter(~pl.col("fruit").is_in(df2["fruit"]))
print(result)

72. How to get the positions where values of two columns match?

Difficulty Level: L1

Get the row positions where the values of columns 'a' and 'b' are equal.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'a': np.random.choice([1, 2, 3, 4], 10).tolist(),
    'b': np.random.choice([1, 2, 3, 4], 10).tolist()
})

# Write your code below

Desired Output:

python
shape: (10, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 3   ┆ 3   │
│ 4   ┆ 3   │
│ 1   ┆ 3   │
│ 3   ┆ 3   │
│ 3   ┆ 4   │
│ 4   ┆ 1   │
│ 1   ┆ 4   │
│ 1   ┆ 4   │
│ 3   ┆ 4   │
│ 2   ┆ 3   │
└─────┴─────┘
Positions where a == b: [0, 3]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'a': np.random.choice([1, 2, 3, 4], 10).tolist(),
    'b': np.random.choice([1, 2, 3, 4], 10).tolist()
})

print(df)
positions = df.with_row_index("idx").filter(pl.col("a") == pl.col("b"))["idx"].to_list()
print("Positions where a == b:", positions)

73. How to create lags and leads of a column in a DataFrame?

Difficulty Level: L2

Create columns for lag1 (shifted down by 1) and lead1 (shifted up by 1) of column 'a'.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'a': np.arange(1, 11).tolist(),
    'b': np.random.randint(10, 30, 10).tolist()
})

# Write your code below

Desired Output:

python
shape: (10, 4)
┌─────┬─────┬──────┬───────┐
│ a   ┆ b   ┆ lag1 ┆ lead1 │
│ --- ┆ --- ┆ ---  ┆ ---   │
│ i32 ┆ i32 ┆ i32  ┆ i32   │
╞═════╪═════╪══════╪═══════╡
│ 1   ┆ 16  ┆ null ┆ 2     │
│ 2   ┆ 29  ┆ 1    ┆ 3     │
│ 3   ┆ 24  ┆ 2    ┆ 4     │
│ 4   ┆ 20  ┆ 3    ┆ 5     │
│ 5   ┆ 17  ┆ 4    ┆ 6     │
│ 6   ┆ 16  ┆ 5    ┆ 7     │
│ 7   ┆ 28  ┆ 6    ┆ 8     │
│ 8   ┆ 20  ┆ 7    ┆ 9     │
│ 9   ┆ 20  ┆ 8    ┆ 10    │
│ 10  ┆ 13  ┆ 9    ┆ null  │
└─────┴─────┴──────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'a': np.arange(1, 11).tolist(),
    'b': np.random.randint(10, 30, 10).tolist()
})

df = df.with_columns(
    pl.col("a").shift(1).alias("lag1"),
    pl.col("a").shift(-1).alias("lead1"),
)
print(df)

74. How to get the frequency of unique values in the entire DataFrame?

Difficulty Level: L2

Get the frequency of unique values across the entire DataFrame df.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 10, 20).reshape(4, 5).tolist(), schema=list('abcde'))

# Write your code below

Desired Output:

python
shape: (7, 2)
┌─────┬───────┐
│ a   ┆ count │
│ --- ┆ ---   │
│ i64 ┆ u32   │
╞═════╪═══════╡
│ 8   ┆ 5     │
│ 5   ┆ 4     │
│ 7   ┆ 3     │
│ 4   ┆ 2     │
│ 6   ┆ 2     │
│ 2   ┆ 2     │
│ 3   ┆ 2     │
└─────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(1, 10, 20).reshape(4, 5).tolist(), schema=list('abcde'))

# Flatten all values into one Series, then count
all_values = pl.concat([df[c].cast(pl.Int64) for c in df.columns])
# Note: the order among values with equal counts may vary between runs
print(all_values.value_counts().sort("count", descending=True))

75. How to split a text column into two separate columns?

Difficulty Level: L2

Split the 'row' column into three columns, using the embedded first row ('STD', 'City', 'State') as the header.

Solve:

import polars as pl
df = pl.DataFrame({
    "row": [
        "STD, City\tState",
        "33, Kolkata\tWest Bengal",
        "44, Chennai\tTamil Nadu",
        "40, Hyderabad\tTelengana",
        "80, Bangalore\tKarnataka"
    ]
})

# Write your code below

Desired Output:

python
shape: (4, 3)
┌─────┬───────────┬─────────────┐
│ STD ┆ City      ┆ State       │
│ --- ┆ ---       ┆ ---         │
│ str ┆ str       ┆ str         │
╞═════╪═══════════╪═════════════╡
│ 33  ┆ Kolkata   ┆ West Bengal │
│ 44  ┆ Chennai   ┆ Tamil Nadu  │
│ 40  ┆ Hyderabad ┆ Telengana   │
│ 80  ┆ Bangalore ┆ Karnataka   │
└─────┴───────────┴─────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    "row": [
        "STD, City\tState",
        "33, Kolkata\tWest Bengal",
        "44, Chennai\tTamil Nadu",
        "40, Hyderabad\tTelengana",
        "80, Bangalore\tKarnataka"
    ]
})

# Split "STD, City\tState" into the STD part and "City\tState", then split on the tab
parts = df["row"].str.split(", ")
city_state = parts.list.get(1).str.split("\t")

result = pl.DataFrame({
    "STD": parts.list.get(0),
    "City": city_state.list.get(0),
    "State": city_state.list.get(1),
}).slice(1)  # drop the embedded header row
print(result)

76. How to rank items within each group?

Difficulty Level: L2

For each store, rank the months by revenue (highest = rank 1). Use a window function.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'store': ['A','A','A','B','B','B','C','C','C'],
    'month': ['Jan','Feb','Mar','Jan','Feb','Mar','Jan','Feb','Mar'],
    'revenue': np.random.randint(100, 500, 9).tolist()
})

# Write your code below

Desired Output:

python
shape: (9, 4)
┌───────┬───────┬─────────┬───────────────┐
│ store ┆ month ┆ revenue ┆ rank_in_store │
│ ---   ┆ ---   ┆ ---     ┆ ---           │
│ str   ┆ str   ┆ i32     ┆ i32           │
╞═══════╪═══════╪═════════╪═══════════════╡
│ A     ┆ Jan   ┆ 202     ┆ 3             │
│ A     ┆ Feb   ┆ 448     ┆ 1             │
│ A     ┆ Mar   ┆ 370     ┆ 2             │
│ B     ┆ Jan   ┆ 206     ┆ 2             │
│ B     ┆ Feb   ┆ 171     ┆ 3             │
│ B     ┆ Mar   ┆ 288     ┆ 1             │
│ C     ┆ Jan   ┆ 120     ┆ 3             │
│ C     ┆ Feb   ┆ 202     ┆ 2             │
│ C     ┆ Mar   ┆ 221     ┆ 1             │
└───────┴───────┴─────────┴───────────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'store': ['A','A','A','B','B','B','C','C','C'],
    'month': ['Jan','Feb','Mar','Jan','Feb','Mar','Jan','Feb','Mar'],
    'revenue': np.random.randint(100, 500, 9).tolist()
})

result = df.with_columns(
    pl.col("revenue").rank(descending=True).over("store").cast(pl.Int32).alias("rank_in_store")
)
print(result)

77. How to compute the running difference within groups?

Difficulty Level: L2

For each user, compute the day-over-day change in logins using diff() within groups.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'user': ['A','A','A','A','B','B','B','B'],
    'day': [1,2,3,4,1,2,3,4],
    'logins': np.random.randint(1, 20, 8).tolist()
})

# Write your code below

Desired Output:

python
shape: (8, 4)
┌──────┬─────┬────────┬──────────────┐
│ user ┆ day ┆ logins ┆ daily_change │
│ ---  ┆ --- ┆ ---    ┆ ---          │
│ str  ┆ i64 ┆ i32    ┆ i32          │
╞══════╪═════╪════════╪══════════════╡
│ A    ┆ 1   ┆ 7      ┆ null         │
│ A    ┆ 2   ┆ 15     ┆ 8            │
│ A    ┆ 3   ┆ 11     ┆ -4           │
│ A    ┆ 4   ┆ 8      ┆ -3           │
│ B    ┆ 1   ┆ 7      ┆ null         │
│ B    ┆ 2   ┆ 19     ┆ 12           │
│ B    ┆ 3   ┆ 11     ┆ -8           │
│ B    ┆ 4   ┆ 11     ┆ 0            │
└──────┴─────┴────────┴──────────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'user': ['A','A','A','A','B','B','B','B'],
    'day': [1,2,3,4,1,2,3,4],
    'logins': np.random.randint(1, 20, 8).tolist()
})

result = df.with_columns(
    pl.col("logins").diff().over("user").alias("daily_change")
)
print(result)

78. How to compute each employee’s salary as a percentage of their department total?

Difficulty Level: L2

Add a column showing what percentage of the department salary each employee represents.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'dept': ['Sales','Sales','Sales','Eng','Eng','Eng'],
    'employee': ['Alice','Bob','Carol','Dave','Eve','Frank'],
    'salary': (np.random.randint(50, 150, 6) * 1000).tolist()
})

# Write your code below

Desired Output:

python
shape: (6, 4)
┌───────┬──────────┬────────┬─────────────┐
│ dept  ┆ employee ┆ salary ┆ pct_of_dept │
│ ---   ┆ ---      ┆ ---    ┆ ---         │
│ str   ┆ str      ┆ i32    ┆ f64         │
╞═══════╪══════════╪════════╪═════════════╡
│ Sales ┆ Alice    ┆ 101000 ┆ 32.9        │
│ Sales ┆ Bob      ┆ 142000 ┆ 46.3        │
│ Sales ┆ Carol    ┆ 64000  ┆ 20.8        │
│ Eng   ┆ Dave     ┆ 121000 ┆ 40.2        │
│ Eng   ┆ Eve      ┆ 110000 ┆ 36.5        │
│ Eng   ┆ Frank    ┆ 70000  ┆ 23.3        │
└───────┴──────────┴────────┴─────────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'dept': ['Sales','Sales','Sales','Eng','Eng','Eng'],
    'employee': ['Alice','Bob','Carol','Dave','Eve','Frank'],
    'salary': (np.random.randint(50, 150, 6) * 1000).tolist()
})

result = df.with_columns(
    (pl.col("salary") / pl.col("salary").sum().over("dept") * 100).round(1).alias("pct_of_dept")
)
print(result)

79. How to detect the start of a new streak in a sequence?

Difficulty Level: L3

Given a Series of status values, flag each row where a new streak begins (i.e., the value differs from the previous row). The first row has no predecessor, so its flag is null.

Solve:

import polars as pl
ser = pl.Series("status", ['ok','ok','fail','fail','fail','ok','fail','ok','ok'])

# Write your code below

Desired Output:

python
shape: (9, 3)
┌─────┬────────┬───────────────┐
│ idx ┆ status ┆ is_new_streak │
│ --- ┆ ---    ┆ ---           │
│ u32 ┆ str    ┆ bool          │
╞═════╪════════╪═══════════════╡
│ 0   ┆ ok     ┆ null          │
│ 1   ┆ ok     ┆ false         │
│ 2   ┆ fail   ┆ true          │
│ 3   ┆ fail   ┆ false         │
│ 4   ┆ fail   ┆ false         │
│ 5   ┆ ok     ┆ true          │
│ 6   ┆ fail   ┆ true          │
│ 7   ┆ ok     ┆ true          │
│ 8   ┆ ok     ┆ false         │
└─────┴────────┴───────────────┘
Show Solution
import polars as pl
ser = pl.Series("status", ['ok','ok','fail','fail','fail','ok','fail','ok','ok'])

df = ser.to_frame("status").with_row_index("idx")
result = df.with_columns(
    (pl.col("status") != pl.col("status").shift(1)).alias("is_new_streak")
)
print(result)

80. How to compute the row-wise coefficient of variation?

Difficulty Level: L3

Compute the coefficient of variation (std / mean) across the columns for each row.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(10, 100, 30).reshape(6, 5).tolist(), schema=[f"s{i}" for i in range(5)])

# Write your code below

Desired Output:

python
shape: (6, 1)
┌────────┐
│ cv     │
│ ---    │
│ f64    │
╞════════╡
│ 0.4706 │
│ 0.0696 │
│ 0.6956 │
│ 0.6162 │
│ 0.3786 │
│ 0.4025 │
└────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame(np.random.randint(10, 100, 30).reshape(6, 5).tolist(), schema=[f"s{i}" for i in range(5)])

# Row-wise mean via a horizontal aggregation
means = pl.mean_horizontal(pl.all())
# Row-wise std: pack each row into a list, aggregate inside the list, unwrap
stds = pl.concat_list(pl.all()).list.eval(pl.element().std()).list.first()
result = df.with_columns(
    (stds / means).round(4).alias("cv")
)
print(result.select("cv"))

81. How to build a pivot table with multiple aggregations?

Difficulty Level: L2

Group by region and product, then compute total sales, average sales, and total quantity.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'region': ['East','East','West','West','East','West'] * 2,
    'product': ['A','B','A','B','A','B'] * 2,
    'sales': np.random.randint(100, 1000, 12).tolist(),
    'qty': np.random.randint(1, 50, 12).tolist()
})

# Write your code below

Desired Output:

python
shape: (4, 5)
┌────────┬─────────┬─────────────┬───────────┬───────────┐
│ region ┆ product ┆ total_sales ┆ avg_sales ┆ total_qty │
│ ---    ┆ ---     ┆ ---         ┆ ---       ┆ ---       │
│ str    ┆ str     ┆ i32         ┆ f64       ┆ i32       │
╞════════╪═════════╪═════════════╪═══════════╪═══════════╡
│ East   ┆ A       ┆ 1774        ┆ 444.0     ┆ 98        │
│ East   ┆ B       ┆ 655         ┆ 328.0     ┆ 33        │
│ West   ┆ A       ┆ 1674        ┆ 837.0     ┆ 26        │
│ West   ┆ B       ┆ 1076        ┆ 269.0     ┆ 114       │
└────────┴─────────┴─────────────┴───────────┴───────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'region': ['East','East','West','West','East','West'] * 2,
    'product': ['A','B','A','B','A','B'] * 2,
    'sales': np.random.randint(100, 1000, 12).tolist(),
    'qty': np.random.randint(1, 50, 12).tolist()
})

result = df.group_by(["region", "product"]).agg(
    pl.col("sales").sum().alias("total_sales"),
    pl.col("sales").mean().round(0).alias("avg_sales"),
    pl.col("qty").sum().alias("total_qty")
).sort(["region", "product"])
print(result)

82. How to create a rolling mean column?

Difficulty Level: L2

Create a 5-period rolling mean of column 'medv'.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Write your code below

Desired Output:

python
shape: (7, 2)
┌──────┬──────────────┐
│ medv ┆ rolling_medv │
│ ---  ┆ ---          │
│ f64  ┆ f64          │
╞══════╪══════════════╡
│ 24.0 ┆ null         │
│ 21.6 ┆ null         │
│ 34.7 ┆ null         │
│ 33.4 ┆ null         │
│ 36.2 ┆ 29.98        │
│ 28.7 ┆ 30.92        │
│ 22.9 ┆ 31.18        │
└──────┴──────────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

df = df.with_columns(
    pl.col("medv").rolling_mean(window_size=5).alias("rolling_medv")
)
print(df.select(["medv", "rolling_medv"]).head(7))

83. How to find the first occurrence of each unique value?

Difficulty Level: L2

For each unique category, find the row index and value of its first appearance.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'category': ['B','A','C','A','B','C','A','B'],
    'value': np.random.randint(10, 99, 8).tolist()
})

# Write your code below

Desired Output:

python
shape: (3, 3)
┌──────────┬───────────────┬─────────────┐
│ category ┆ first_seen_at ┆ first_value │
│ ---      ┆ ---           ┆ ---         │
│ str      ┆ u32           ┆ i32         │
╞══════════╪═══════════════╪═════════════╡
│ B        ┆ 0             ┆ 61          │
│ A        ┆ 1             ┆ 24          │
│ C        ┆ 2             ┆ 81          │
└──────────┴───────────────┴─────────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'category': ['B','A','C','A','B','C','A','B'],
    'value': np.random.randint(10, 99, 8).tolist()
})

result = df.with_row_index("idx").group_by("category").agg(
    pl.col("idx").first().alias("first_seen_at"),
    pl.col("value").first().alias("first_value")
).sort("first_seen_at")
print(result)

84. How to find duplicate rows in a DataFrame?

Difficulty Level: L1

Find duplicate rows based on all columns.

Solve:

import polars as pl
df = pl.DataFrame({
    'a': [1, 2, 2, 3, 3],
    'b': ['x', 'y', 'y', 'z', 'z'],
})

# Write your code below

Desired Output:

python
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 2   ┆ y   │
│ 2   ┆ y   │
│ 3   ┆ z   │
│ 3   ┆ z   │
└─────┴─────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'a': [1, 2, 2, 3, 3],
    'b': ['x', 'y', 'y', 'z', 'z'],
})

result = df.filter(pl.struct(pl.all()).is_duplicated())
print(result)

85. How to identify the top performer in each group?

Difficulty Level: L2

From df, select the player with the highest score in each team — using a window function, not group_by.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'team': ['Red','Red','Red','Blue','Blue','Blue','Green','Green','Green'],
    'player': ['A','B','C','D','E','F','G','H','I'],
    'score': np.random.randint(50, 100, 9).tolist()
})

# Write your code below

Desired Output:

python
shape: (3, 3)
┌───────┬────────┬───────┐
│ team  ┆ player ┆ score │
│ ---   ┆ ---    ┆ ---   │
│ str   ┆ str    ┆ i32   │
╞═══════╪════════╪═══════╡
│ Red   ┆ A      ┆ 88    │
│ Blue  ┆ D      ┆ 92    │
│ Green ┆ G      ┆ 88    │
└───────┴────────┴───────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'team': ['Red','Red','Red','Blue','Blue','Blue','Green','Green','Green'],
    'player': ['A','B','C','D','E','F','G','H','I'],
    'score': np.random.randint(50, 100, 9).tolist()
})

result = df.filter(
    pl.col("score") == pl.col("score").max().over("team")
)
print(result)

86. How to compute z-scores per group?

Difficulty Level: L2

Compute the z-score of value within each group using window functions.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'group': ['A','A','A','A','B','B','B','B'],
    'value': np.random.normal(50, 10, 8).round(1).tolist()
})

# Write your code below

Desired Output:

python
shape: (8, 3)
┌───────┬───────┬─────────┐
│ group ┆ value ┆ z_score │
│ ---   ┆ ---   ┆ ---     │
│ str   ┆ f64   ┆ f64     │
╞═══════╪═══════╪═════════╡
│ A     ┆ 55.0  ┆ -0.19   │
│ A     ┆ 48.6  ┆ -1.13   │
│ A     ┆ 56.5  ┆ 0.03    │
│ A     ┆ 65.2  ┆ 1.3     │
│ B     ┆ 47.7  ┆ -0.8    │
│ B     ┆ 47.7  ┆ -0.8    │
│ B     ┆ 65.8  ┆ 1.26    │
│ B     ┆ 57.7  ┆ 0.34    │
└───────┴───────┴─────────┘
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
df = pl.DataFrame({
    'group': ['A','A','A','A','B','B','B','B'],
    'value': np.random.normal(50, 10, 8).round(1).tolist()
})

result = df.with_columns(
    ((pl.col("value") - pl.col("value").mean().over("group")) /
     pl.col("value").std().over("group")).round(2).alias("z_score")
)
print(result)

87. How to compute expanding (cumulative) window aggregations?

Difficulty Level: L2

Add columns for cumulative sum, running max, and running min of sales.

Solve:

import polars as pl
df = pl.DataFrame({
    'day': list(range(1, 8)),
    'sales': [100, 150, 130, 200, 180, 220, 210]
})

# Write your code below

Desired Output:

python
shape: (7, 5)
┌─────┬───────┬───────────┬─────────────┬─────────────┐
│ day ┆ sales ┆ cum_sales ┆ running_max ┆ running_min │
│ --- ┆ ---   ┆ ---       ┆ ---         ┆ ---         │
│ i64 ┆ i64   ┆ i64       ┆ i64         ┆ i64         │
╞═════╪═══════╪═══════════╪═════════════╪═════════════╡
│ 1   ┆ 100   ┆ 100       ┆ 100         ┆ 100         │
│ 2   ┆ 150   ┆ 250       ┆ 150         ┆ 100         │
│ 3   ┆ 130   ┆ 380       ┆ 150         ┆ 100         │
│ 4   ┆ 200   ┆ 580       ┆ 200         ┆ 100         │
│ 5   ┆ 180   ┆ 760       ┆ 200         ┆ 100         │
│ 6   ┆ 220   ┆ 980       ┆ 220         ┆ 100         │
│ 7   ┆ 210   ┆ 1190      ┆ 220         ┆ 100         │
└─────┴───────┴───────────┴─────────────┴─────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'day': list(range(1, 8)),
    'sales': [100, 150, 130, 200, 180, 220, 210]
})

result = df.with_columns(
    pl.col("sales").cum_sum().alias("cum_sales"),
    pl.col("sales").cum_max().alias("running_max"),
    pl.col("sales").cum_min().alias("running_min"),
)
print(result)

88. How to compute a conditional cumulative sum?

Difficulty Level: L3

Compute a running total of amount, but only accumulate rows where event == 'purchase'.

Solve:

import polars as pl
df = pl.DataFrame({
    'event': ['login','purchase','login','purchase','login','purchase','login','purchase'],
    'amount': [0, 50, 0, 30, 0, 80, 0, 20]
})

# Write your code below

Desired Output:

python
shape: (8, 3)
┌──────────┬────────┬────────────────────────┐
│ event    ┆ amount ┆ running_purchase_total │
│ ---      ┆ ---    ┆ ---                    │
│ str      ┆ i64    ┆ i64                    │
╞══════════╪════════╪════════════════════════╡
│ login    ┆ 0      ┆ 0                      │
│ purchase ┆ 50     ┆ 50                     │
│ login    ┆ 0      ┆ 50                     │
│ purchase ┆ 30     ┆ 80                     │
│ login    ┆ 0      ┆ 80                     │
│ purchase ┆ 80     ┆ 160                    │
│ login    ┆ 0      ┆ 160                    │
│ purchase ┆ 20     ┆ 180                    │
└──────────┴────────┴────────────────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'event': ['login','purchase','login','purchase','login','purchase','login','purchase'],
    'amount': [0, 50, 0, 30, 0, 80, 0, 20]
})

result = df.with_columns(
    pl.when(pl.col("event") == "purchase")
    .then(pl.col("amount"))
    .otherwise(0)
    .cum_sum()
    .alias("running_purchase_total")
)
print(result)

89. How to compute quarter-over-quarter growth rate within groups?

Difficulty Level: L2

For each company, compute the percentage growth in revenue from the previous quarter.

Solve:

import polars as pl
df = pl.DataFrame({
    'company': ['AAPL','AAPL','AAPL','GOOG','GOOG','GOOG'],
    'quarter': ['Q1','Q2','Q3','Q1','Q2','Q3'],
    'revenue': [100, 120, 115, 200, 230, 250]
})

# Write your code below

Desired Output:

python
shape: (6, 5)
┌─────────┬─────────┬─────────┬──────────────┬────────────┐
│ company ┆ quarter ┆ revenue ┆ prev_revenue ┆ growth_pct │
│ ---     ┆ ---     ┆ ---     ┆ ---          ┆ ---        │
│ str     ┆ str     ┆ i64     ┆ i64          ┆ f64        │
╞═════════╪═════════╪═════════╪══════════════╪════════════╡
│ AAPL    ┆ Q1      ┆ 100     ┆ null         ┆ null       │
│ AAPL    ┆ Q2      ┆ 120     ┆ 100          ┆ 20.0       │
│ AAPL    ┆ Q3      ┆ 115     ┆ 120          ┆ -4.2       │
│ GOOG    ┆ Q1      ┆ 200     ┆ null         ┆ null       │
│ GOOG    ┆ Q2      ┆ 230     ┆ 200          ┆ 15.0       │
│ GOOG    ┆ Q3      ┆ 250     ┆ 230          ┆ 8.7        │
└─────────┴─────────┴─────────┴──────────────┴────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'company': ['AAPL','AAPL','AAPL','GOOG','GOOG','GOOG'],
    'quarter': ['Q1','Q2','Q3','Q1','Q2','Q3'],
    'revenue': [100, 120, 115, 200, 230, 250]
})

result = df.with_columns(
    pl.col("revenue").shift(1).over("company").alias("prev_revenue")
).with_columns(
    ((pl.col("revenue") - pl.col("prev_revenue")) / pl.col("prev_revenue") * 100)
    .round(1).alias("growth_pct")
)
print(result)

90. How to detect outliers using the IQR method?

Difficulty Level: L2

Find values in ser that fall outside 1.5 × IQR from the quartiles.

Solve:

import polars as pl
import numpy as np
np.random.seed(42)
data = list(np.random.normal(50, 10, 20).round(1)) + [150.0, -30.0]  # inject outliers
ser = pl.Series("data", data)

# Write your code below

Desired Output:

python
Q1=40.9, Q3=55.4, IQR=14.5
Bounds: [19.1, 77.2]
Outliers: [150.0, -30.0]
Show Solution
import polars as pl
import numpy as np
np.random.seed(42)
data = list(np.random.normal(50, 10, 20).round(1)) + [150.0, -30.0]
ser = pl.Series("data", data)

q1 = ser.quantile(0.25)
q3 = ser.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ser.filter((ser < lower) | (ser > upper))
print(f"Q1={q1:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")
print(f"Bounds: [{lower:.1f}, {upper:.1f}]")
print("Outliers:", outliers.to_list())

91. How to use an anti-join to find missing records?

Difficulty Level: L2

Given a list of expected IDs and a DataFrame of received records, find which IDs are missing.

Solve:

import polars as pl
expected = pl.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
received = pl.DataFrame({"id": [1, 2, 4, 5, 7, 9], "value": [10, 20, 40, 50, 70, 90]})

# Write your code below

Desired Output:

python
shape: (4, 1)
┌─────┐
│ id  │
│ --- │
│ i64 │
╞═════╡
│ 3   │
│ 6   │
│ 8   │
│ 10  │
└─────┘
Show Solution
import polars as pl
expected = pl.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
received = pl.DataFrame({"id": [1, 2, 4, 5, 7, 9], "value": [10, 20, 40, 50, 70, 90]})

missing = expected.join(received, on="id", how="anti")
print(missing)

92. How to select columns by dtype?

Difficulty Level: L2

Select only the float columns from df.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
['Min.Price', 'Price', 'Max.Price', 'EngineSize', 'Fuel.tank.capacity', 'Rear.seat.room']
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

result = df.select(pl.col(pl.Float64))
print(result.columns)

93. How to categorize a numeric column using when/then/otherwise?

Difficulty Level: L2

Categorize 'medv' into 'low' (< 20), 'medium' (20-35), and 'high' (> 35).

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Write your code below

Desired Output:

python
shape: (10, 2)
┌──────┬──────────┐
│ medv ┆ category │
│ ---  ┆ ---      │
│ f64  ┆ str      │
╞══════╪══════════╡
│ 24.0 ┆ medium   │
│ 21.6 ┆ medium   │
│ 34.7 ┆ medium   │
│ 33.4 ┆ medium   │
│ 36.2 ┆ high     │
│ 28.7 ┆ medium   │
│ 22.9 ┆ medium   │
│ 27.1 ┆ medium   │
│ 16.5 ┆ low      │
│ 18.9 ┆ low      │
└──────┴──────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

df = df.with_columns(
    pl.when(pl.col("medv") < 20).then(pl.lit("low"))
    .when(pl.col("medv") <= 35).then(pl.lit("medium"))
    .otherwise(pl.lit("high"))
    .alias("category")
)
print(df.select(["medv", "category"]).head(10))

94. How to compute the mode of each column in a DataFrame?

Difficulty Level: L2

Find the most frequent value in each column of df.

Solve:

import polars as pl
df = pl.DataFrame({
    'color': ['red','blue','red','green','blue','red','blue','green'],
    'size': ['S','M','L','M','M','S','L','M'],
    'rating': [5, 3, 5, 4, 3, 5, 3, 4]
})

# Write your code below

Desired Output:

python
Mode of 'color': red
Mode of 'size': M
Mode of 'rating': 5
Show Solution
import polars as pl
df = pl.DataFrame({
    'color': ['red','blue','red','green','blue','red','blue','green'],
    'size': ['S','M','L','M','M','S','L','M'],
    'rating': [5, 3, 5, 4, 3, 5, 3, 4]
})

for col in df.columns:
    # Count values and break ties by first appearance so the result is deterministic
    counts = (
        df.with_row_index("idx")
        .group_by(col)
        .agg(pl.len().alias("count"), pl.col("idx").min().alias("first_idx"))
        .sort(["count", "first_idx"], descending=[True, False])
    )
    print(f"Mode of '{col}': {counts[0, col]}")

95. How to use lazy evaluation in Polars?

Difficulty Level: L2

Use lazy evaluation to filter rows where 'Price' > 30 and select 'Manufacturer', 'Model', and 'Price'.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (12, 3)
┌───────────────┬─────────────┬───────┐
│ Manufacturer  ┆ Model       ┆ Price │
│ ---           ┆ ---         ┆ ---   │
│ str           ┆ str         ┆ f64   │
╞═══════════════╪═════════════╪═══════╡
│ null          ┆ Legend      ┆ 33.9  │
│ Audi          ┆ 100         ┆ 37.7  │
│ Cadillac      ┆ DeVille     ┆ 34.7  │
│ Cadillac      ┆ Seville     ┆ 40.1  │
│ Chevrolet     ┆ Corvette    ┆ 38.0  │
│ …             ┆ …           ┆ …     │
│ Lincoln       ┆ Continental ┆ 34.3  │
│ Lincoln       ┆ Town_Car    ┆ 36.1  │
│ Mazda         ┆ RX-7        ┆ 32.5  │
│ Mercedes-Benz ┆ 190E        ┆ 31.9  │
│ Mercedes-Benz ┆ 300E        ┆ 61.9  │
└───────────────┴─────────────┴───────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

result = (
    df.lazy()
    .filter(pl.col("Price") > 30)
    .select(["Manufacturer", "Model", "Price"])
    .collect()
)
print(result)

96. How to use window functions to compute group-level statistics alongside row-level data?

Difficulty Level: L2

Add a column showing the mean 'Price' per 'Type' alongside every row, without collapsing the DataFrame.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (10, 4)
┌──────────────┬─────────┬───────┬────────────────────┐
│ Manufacturer ┆ Type    ┆ Price ┆ mean_price_by_type │
│ ---          ┆ ---     ┆ ---   ┆ ---                │
│ str          ┆ str     ┆ f64   ┆ f64                │
╞══════════════╪═════════╪═══════╪════════════════════╡
│ Acura        ┆ Small   ┆ 15.90 ┆ 10.20              │
│ null         ┆ Midsize ┆ 33.90 ┆ 27.65              │
│ Audi         ┆ Compact ┆ 29.10 ┆ 18.21              │
│ Audi         ┆ Midsize ┆ 37.70 ┆ 27.65              │
│ BMW          ┆ Midsize ┆ 30.00 ┆ 27.65              │
│ Buick        ┆ Midsize ┆ 15.70 ┆ 27.65              │
│ Buick        ┆ Large   ┆ 20.80 ┆ 24.30              │
│ Buick        ┆ Large   ┆ 23.70 ┆ 24.30              │
│ Buick        ┆ Midsize ┆ 26.30 ┆ 27.65              │
│ Cadillac     ┆ Large   ┆ 34.70 ┆ 24.30              │
└──────────────┴─────────┴───────┴────────────────────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

df = df.with_columns(
    pl.col("Price").mean().over("Type").alias("mean_price_by_type")
)
with pl.Config(float_precision=2):
    print(df.select(["Manufacturer", "Type", "Price", "mean_price_by_type"]).head(10))

97. How to understand the difference between rank(method='min') and rank(method='dense')?

Difficulty Level: L2

Rank students by score using both min and dense methods and observe the difference when there are ties.

Solve:

import polars as pl
df = pl.DataFrame({
    'student': ['Alice','Bob','Carol','Dave','Eve'],
    'score': [88, 92, 88, 95, 92]
})

# Write your code below

Desired Output:

python
shape: (5, 4)
┌─────────┬───────┬──────────┬────────────┐
│ student ┆ score ┆ rank_min ┆ rank_dense │
│ ---     ┆ ---   ┆ ---      ┆ ---        │
│ str     ┆ i64   ┆ i32      ┆ i32        │
╞═════════╪═══════╪══════════╪════════════╡
│ Alice   ┆ 88    ┆ 4        ┆ 3          │
│ Bob     ┆ 92    ┆ 2        ┆ 2          │
│ Carol   ┆ 88    ┆ 4        ┆ 3          │
│ Dave    ┆ 95    ┆ 1        ┆ 1          │
│ Eve     ┆ 92    ┆ 2        ┆ 2          │
└─────────┴───────┴──────────┴────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'student': ['Alice','Bob','Carol','Dave','Eve'],
    'score': [88, 92, 88, 95, 92]
})

result = df.with_columns(
    pl.col("score").rank(method="min", descending=True).cast(pl.Int32).alias("rank_min"),
    pl.col("score").rank(method="dense", descending=True).cast(pl.Int32).alias("rank_dense"),
)
print(result)

98. How to clean and standardize messy string columns?

Difficulty Level: L2

Clean first_name and last_name (strip whitespace, title-case), combine into full_name, and normalize email_raw to lowercase.

Solve:

import polars as pl
df = pl.DataFrame({
    'first_name': ['  John ', 'ALICE', 'bob  ', ' Carol'],
    'last_name': ['DOE  ', '  Smith', 'JONES', ' Lee  '],
    'email_raw': ['John.Doe@GMAIL.COM', 'alice@Yahoo.com', 'BOB@hotmail.COM', 'carol@outlook.COM']
})

# Write your code below

Desired Output:

python
shape: (4, 2)
┌─────────────┬────────────────────┐
│ full_name   ┆ email_clean        │
│ ---         ┆ ---                │
│ str         ┆ str                │
╞═════════════╪════════════════════╡
│ John Doe    ┆ john.doe@gmail.com │
│ Alice Smith ┆ alice@yahoo.com    │
│ Bob Jones   ┆ bob@hotmail.com    │
│ Carol Lee   ┆ carol@outlook.com  │
└─────────────┴────────────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'first_name': ['  John ', 'ALICE', 'bob  ', ' Carol'],
    'last_name': ['DOE  ', '  Smith', 'JONES', ' Lee  '],
    'email_raw': ['John.Doe@GMAIL.COM', 'alice@Yahoo.com', 'BOB@hotmail.COM', 'carol@outlook.COM']
})

result = df.with_columns(
    (pl.col("first_name").str.strip_chars().str.to_titlecase() + " " +
     pl.col("last_name").str.strip_chars().str.to_titlecase()).alias("full_name"),
    pl.col("email_raw").str.to_lowercase().alias("email_clean")
)
print(result.select(["full_name", "email_clean"]))

99. How to extract date features for machine learning?

Difficulty Level: L2

From a date column, extract month, weekday, quarter, and create a boolean is_holiday_season column (Nov, Dec, Jan).

Solve:

import polars as pl
from datetime import date
df = pl.DataFrame({
    'order_date': pl.date_range(date(2024, 1, 1), date(2024, 12, 31), eager=True)
}).sample(8, seed=42).sort("order_date")

# Write your code below

Desired Output:

python
shape: (8, 5)
┌────────────┬───────┬─────────┬─────────┬───────────────────┐
│ order_date ┆ month ┆ weekday ┆ quarter ┆ is_holiday_season │
│ ---        ┆ ---   ┆ ---     ┆ ---     ┆ ---               │
│ date       ┆ i8    ┆ i8      ┆ i8      ┆ bool              │
╞════════════╪═══════╪═════════╪═════════╪═══════════════════╡
│ 2024-02-15 ┆ 2     ┆ 4       ┆ 1       ┆ false             │
│ 2024-04-24 ┆ 4     ┆ 3       ┆ 2       ┆ false             │
│ 2024-08-02 ┆ 8     ┆ 5       ┆ 3       ┆ false             │
│ 2024-08-09 ┆ 8     ┆ 5       ┆ 3       ┆ false             │
│ 2024-09-10 ┆ 9     ┆ 2       ┆ 3       ┆ false             │
│ 2024-10-15 ┆ 10    ┆ 2       ┆ 4       ┆ false             │
│ 2024-10-19 ┆ 10    ┆ 6       ┆ 4       ┆ false             │
│ 2024-12-21 ┆ 12    ┆ 6       ┆ 4       ┆ true              │
└────────────┴───────┴─────────┴─────────┴───────────────────┘
Show Solution
import polars as pl
from datetime import date
df = pl.DataFrame({
    'order_date': pl.date_range(date(2024, 1, 1), date(2024, 12, 31), eager=True)
}).sample(8, seed=42).sort("order_date")

result = df.with_columns(
    pl.col("order_date").dt.month().alias("month"),
    pl.col("order_date").dt.weekday().alias("weekday"),
    pl.col("order_date").dt.quarter().alias("quarter"),
    (pl.col("order_date").dt.month().is_in([11, 12, 1])).alias("is_holiday_season"),
)
print(result)

100. How to explode a list column and compute aggregations?

Difficulty Level: L3

Each user has a list of tags. Explode the tags, then count how many users have each tag and list who they are.

Solve:

import polars as pl
df = pl.DataFrame({
    'user': ['Alice', 'Bob', 'Carol'],
    'tags': [['python', 'polars', 'ML'], ['python', 'rust'], ['polars', 'ML', 'DL', 'python']]
})

# Write your code below

Desired Output:

python
shape: (5, 3)
┌────────┬───────────┬───────────────────────────┐
│ tags   ┆ num_users ┆ users                     │
│ ---    ┆ ---       ┆ ---                       │
│ str    ┆ u32       ┆ list[str]                 │
╞════════╪═══════════╪═══════════════════════════╡
│ python ┆ 3         ┆ ["Alice", "Bob", "Carol"] │
│ ML     ┆ 2         ┆ ["Alice", "Carol"]        │
│ polars ┆ 2         ┆ ["Alice", "Carol"]        │
│ DL     ┆ 1         ┆ ["Carol"]                 │
│ rust   ┆ 1         ┆ ["Bob"]                   │
└────────┴───────────┴───────────────────────────┘
Show Solution
import polars as pl
df = pl.DataFrame({
    'user': ['Alice', 'Bob', 'Carol'],
    'tags': [['python', 'polars', 'ML'], ['python', 'rust'], ['polars', 'ML', 'DL', 'python']]
})

result = df.explode("tags").group_by("tags").agg(
    pl.col("user").count().alias("num_users"),
    pl.col("user").alias("users")
).sort(["num_users", "tags"], descending=[True, False])
print(result)

101. How to use struct and unnest to work with nested data?

Difficulty Level: L3

Create a struct column from 'Manufacturer' and 'Model', then unnest it back.

Solve:

import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Write your code below

Desired Output:

python
shape: (3, 2)
┌─────────────────────┬───────┐
│ car_info            ┆ Price │
│ ---                 ┆ ---   │
│ struct[2]           ┆ f64   │
╞═════════════════════╪═══════╡
│ {"Acura","Integra"} ┆ 15.9  │
│ {null,"Legend"}     ┆ 33.9  │
│ {"Audi","90"}       ┆ 29.1  │
└─────────────────────┴───────┘

shape: (3, 3)
┌──────────────┬─────────┬───────┐
│ Manufacturer ┆ Model   ┆ Price │
│ ---          ┆ ---     ┆ ---   │
│ str          ┆ str     ┆ f64   │
╞══════════════╪═════════╪═══════╡
│ Acura        ┆ Integra ┆ 15.9  │
│ null         ┆ Legend  ┆ 33.9  │
│ Audi         ┆ 90      ┆ 29.1  │
└──────────────┴─────────┴───────┘
Show Solution
import polars as pl
df = pl.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', null_values="NA")

# Create a struct column
df_struct = df.select(
    pl.struct(["Manufacturer", "Model"]).alias("car_info"),
    "Price"
)
print(df_struct.head())

# Unnest back
df_unnested = df_struct.unnest("car_info")
print(df_unnested.head())