
Pandas
Use the pandas.DataFrame.sample() method from the pandas library to randomly select rows from a DataFrame.
Randomly selecting rows can be useful for inspecting the values of a DataFrame.
In this article, you will learn the different configurations of this method for randomly selecting rows from a DataFrame, followed by a few practical tips for using it.
# Make a DataFrame
import pandas as pd
# Create the data of the DataFrame as a dictionary
data_df = {'Name': ['OpenCV', 'TensorFlow', 'MATLAB', 'CUDA', 'Theano', 'Keras', 'GPUImage', 'YOLO', 'BoofCV'],
           'Created By': ['Gary Bradski', 'Google Brain', 'Cleve Moler', 'Ian Buck', 'MILA',
                          'Francois Chollet', 'Brad Larson', 'Joseph Redmon', 'Peter Abeles'],
           'Written in': ['C++', 'Python', 'C++', 'C++', 'Python', 'Python', 'C', 'C', 'Java']}
# Create the DataFrame from the dictionary
df = pd.DataFrame(data_df)
df

You can call the DataFrame.sample() method without passing any arguments. In that case every parameter takes its default value, and a single randomly selected row of the DataFrame is returned.
# Use the DataFrame.sample() method to return a single randomly selected row
df.sample()

The default configuration of the DataFrame.sample() method returns only a single row. To return multiple rows, you can use the n parameter to specify the number of rows to be returned.
# Return three randomly selected rows from the DataFrame
df.sample(n=3)
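
As a quick check of the two calls above, the following sketch compares the shapes of the results. The small `libs` DataFrame is a stand-in made up for this example, not the article's full table.

```python
import pandas as pd

# Hypothetical stand-in data for illustration
libs = pd.DataFrame({
    "Name": ["OpenCV", "Keras", "YOLO", "BoofCV", "CUDA"],
    "Written in": ["C++", "Python", "C", "Java", "C++"],
})

one_row = libs.sample()        # no arguments: one row
three_rows = libs.sample(n=3)  # n=3: three rows

print(one_row.shape)     # (1, 2)
print(three_rows.shape)  # (3, 2)

# Sampled rows keep their original index labels, and without
# replacement (the default) no label appears twice.
print(three_rows.index.is_unique)  # True
```

Because the original index labels are preserved, you can trace each sampled row back to its position in the source DataFrame.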

Using the frac parameter, you can specify the number of rows to be returned as a fraction of the total number of rows present in the DataFrame.
# Return 30% of the total number of rows from the DataFrame
df.sample(frac=0.3)
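
A fraction rarely maps to a whole number of rows, so it helps to see what comes back. The sketch below uses a made-up nine-row DataFrame named `numbers` (an assumption for this example) and also shows that, as far as I am aware, pandas rejects a call that passes both n and frac.

```python
import pandas as pd

# Hypothetical nine-row DataFrame, the same length as the article's df
numbers = pd.DataFrame({"value": range(9)})

subset = numbers.sample(frac=0.3)
# 0.3 * 9 = 2.7, which is rounded to a whole number of rows
print(len(subset))  # 3

# n and frac are alternatives; passing both raises a ValueError
both_rejected = False
try:
    numbers.sample(n=2, frac=0.3)
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)
```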

With the replace parameter, you can allow the same row to be returned more than once. Its default value is False, which means that a row cannot be selected more than once. Set it to True to sample with replacement, so that duplicate rows may be returned.
# Sample three rows with replacement, so the same row may appear more than once
df.sample(n=3, replace=True, random_state=2)
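
To make the effect of replace more visible, this sketch draws as many rows as the DataFrame has, once without and once with replacement. The `numbers` DataFrame is a hypothetical stand-in.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(9)})  # hypothetical example data

# Without replacement, drawing all nine rows is simply a shuffle:
# every row appears exactly once.
shuffled = numbers.sample(n=9)
print(shuffled.index.is_unique)  # True

# With replacement, each draw is independent, so some rows will usually
# repeat while others are left out (this is likely, not guaranteed).
resampled = numbers.sample(n=9, replace=True)
print(resampled.index.duplicated().any())
```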

The DataFrame.sample() method generally returns different rows each time it is called, with every row equally likely to be selected. However, if you want certain rows to have a higher chance of being returned, you can use the weights parameter to give those rows a higher selection probability.
# Assign larger weights to the rows that should be returned more often than the others
bias = [15, 10, 0.5, 0.55, 0.4, 0.2, 0.1, 0.6, 8]
df.sample(n=2, weights=bias)

As you can see, the first, second, and last rows have been assigned higher weights than the other rows. This means that those rows have a higher chance of being returned each time this method is called.
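
One way to see the weights at work is to draw many rows with replacement and look at how often each row shows up. The sketch below does that with a made-up four-row DataFrame and made-up weights; the names `df_w` and `row_weights` are assumptions for this example.

```python
import pandas as pd

df_w = pd.DataFrame({"Name": ["A", "B", "C", "D"]})  # hypothetical data
row_weights = [6, 3, 1, 0]                           # hypothetical weights

# Draw many rows with replacement so the frequencies settle near the weights
draws = df_w.sample(n=10_000, replace=True, weights=row_weights, random_state=0)
share = draws["Name"].value_counts(normalize=True)
print(share)
# "A" should come out near 0.6, "B" near 0.3, "C" near 0.1.
# "D" has weight 0, so it is never drawn and does not appear at all.
```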
You can use the random_state parameter to ensure that the same rows are returned each time the method is called.
# Ensure that the same three rows are returned each time the method is called
df.sample(n=3, random_state=0)
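
The reproducibility claim is easy to verify: two calls with the same seed should give identical results. A minimal sketch, using a hypothetical `numbers` DataFrame:

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(9)})  # hypothetical example data

first = numbers.sample(n=3, random_state=0)
second = numbers.sample(n=3, random_state=0)

# Same seed, same data, same arguments: the same rows in the same order
print(first.equals(second))  # True
```

Which rows a given seed selects can depend on the installed pandas and NumPy versions, so treat the seed as reproducible within one environment.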

# To return more rows than the DataFrame contains, sample with replacement
print('Rows and columns present in the DataFrame:', df.shape)
# Prints: Rows and columns present in the DataFrame: (9, 3)
df.sample(n=15, replace=True)
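
The replace=True argument matters here. The sketch below, on a hypothetical nine-row `numbers` DataFrame, shows what happens when it is left out and 15 rows are requested.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(9)})  # hypothetical nine-row data

# Asking for more rows than exist fails without replacement
oversample_failed = False
try:
    numbers.sample(n=15)
except ValueError as err:
    oversample_failed = True
    print("ValueError:", err)

# With replacement the request succeeds, and rows necessarily repeat
oversampled = numbers.sample(n=15, replace=True)
print(oversampled.shape)  # (15, 1)
```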

When using the weights parameter, you can assign weights greater than 1 to the rows; pandas normalizes the weights so that they sum to 1.
The random_state parameter is useful if you want to share your code with someone else and ensure that the outputs are reproducible.
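
The tip about weights can be checked directly: since only the ratios between weights matter, doubling every weight should not change which rows are drawn for a fixed seed. This sketch reuses the bias values from the article on a hypothetical `numbers` DataFrame; the variable names are assumptions for this example.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(9)})  # hypothetical nine-row data
bias = [15, 10, 0.5, 0.55, 0.4, 0.2, 0.1, 0.6, 8]
doubled = [w * 2 for w in bias]              # same ratios, different scale

a = numbers.sample(n=5, replace=True, weights=bias, random_state=1)
b = numbers.sample(n=5, replace=True, weights=doubled, random_state=1)
print(a.equals(b))  # True

# The selection probabilities implied by the weights
probs = pd.Series(bias) / sum(bias)
print(probs.round(3).tolist())
```

Combining weights with random_state this way gives a biased but repeatable sample, which is handy when sharing results.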
Q1: The frac parameter is used to return a fraction of the total rows of a DataFrame after randomly selecting them. True or False?
Q2: What is the difference between the function of the weights parameter and the random_state parameter?
Q3: Write the code to return any three randomly selected rows from the DataFrame df. Ensure that the same rows are returned each time the method is called.
Q4: You have a DataFrame df that has 5 rows and 4 columns. Write the code to randomly return 10 rows from the DataFrame. The returned rows do not have to be unique.
Q5: Write the code to return 47% of all the rows in the DataFrame df.
The article was contributed by Shreyansh B and Shri Varsheni