Understand pandas.DataFrame.sample(): Randomize DataFrame By Row – Python Pandas Tutorial

By | April 15, 2020

Python pandas usually stores data in a dataframe object, and we often need to pick some rows or columns from it at random. In this tutorial, we will discuss how to randomize a dataframe object.

We can use pandas.DataFrame.sample() to randomize a dataframe object.

DataFrame.sample(self: ~FrameOrSeries, n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

This function returns a random sample of items from one axis of a dataframe object.

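Important parameters explained

n: int, the number of items to return from the axis. If both n and frac are None, one item is returned.

replace: boolean, whether to sample with replacement. If it is True, the same item can be returned more than once.

weights: the weight of each item in the dataframe to be sampled. By default, all items have equal probability.

axis: the axis to sample, 0 for rows or 1 for columns. The default for a dataframe is 0.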
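The signature also lists frac and random_state, which are not described above. As a minimal sketch, assuming a small toy dataframe that is not part of this tutorial's data, frac asks for a fraction of the items instead of a fixed count, and random_state makes the sample repeatable:

```python
import pandas as pd

# Hypothetical toy dataframe, used only for this illustration
df = pd.DataFrame({"No": [1, 2, 3, 4], "Name": ["Tom", "Kate", "Alexa", "John"]})

# frac=0.5 asks for half of the rows (2 of 4) instead of a fixed count
half = df.sample(frac=0.5, random_state=0)

# Reusing the same random_state reproduces the same sample
half_again = df.sample(frac=0.5, random_state=0)

print(half)
print(half.equals(half_again))  # True
```

Note that n and frac cannot be passed together; use one or the other.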

We will use some examples to illustrate how to use this function.

Preliminary

We should create a dataframe object. We will read a csv file to get this object.

import pandas as pd

# test_member.csv is tab-separated, so we pass sep = '\t'
df = pd.read_csv("test_member.csv", sep = '\t')
print(df)

The df is:

   No     Name  Age
0   1      Tom   24
1   2     Kate   22
2   3    Alexa   34
3   4     Kate   23
4   5     John   45
5   6     Lily   41
6   7    Bruce   23
7   8      Lin   33
8   9    Brown   31
9  10  Alibama   20
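If you do not have test_member.csv at hand, the same dataframe can probably be rebuilt in memory. This is only a sketch that copies the values printed above; it is not how the original file was created:

```python
import pandas as pd

# Values copied from the printed dataframe above
df = pd.DataFrame({
    "No": list(range(1, 11)),
    "Name": ["Tom", "Kate", "Alexa", "Kate", "John",
             "Lily", "Bruce", "Lin", "Brown", "Alibama"],
    "Age": [24, 22, 34, 23, 45, 41, 23, 33, 31, 20],
})

print(df.shape)  # (10, 3)
```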

To learn more about reading csv files with python pandas, you can read this tutorial:

A Beginner Guide to Python Pandas Read CSV

Here are some examples to show how to randomize a dataframe object.

1.Get a random element from dataframe

d = df.sample(n=1)
print(type(d))
print(d)

We can set n=1 to get a random row from a dataframe.

The random result is:

<class 'pandas.core.frame.DataFrame'>
   No   Name  Age
8   9  Brown   31

2.Randomize all rows in dataframe

d = df.sample(n = len(df))
print(type(d))
print(d)

We can set n = len(df) to randomize a dataframe.

The new random dataframe is:

<class 'pandas.core.frame.DataFrame'>
   No     Name  Age
5   6     Lily   41
2   3    Alexa   34
7   8      Lin   33
4   5     John   45
6   7    Bruce   23
0   1      Tom   24
9  10  Alibama   20
3   4     Kate   23
1   2     Kate   22
8   9    Brown   31

All rows in the new dataframe are unique, because sample() draws without replacement by default (replace = False).
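To check this in code, here is a small sketch. It assumes a one-column stand-in for df and uses frac = 1, which should have the same effect as n = len(df):

```python
import pandas as pd

# Hypothetical one-column stand-in for df
df = pd.DataFrame({"No": range(1, 11)})

# frac=1 returns every row once, in random order
shuffled = df.sample(frac=1)

# No index label appears twice when sampling without replacement
print(shuffled.index.is_unique)  # True

# The shuffled dataframe holds exactly the original rows
print(sorted(shuffled["No"]) == sorted(df["No"]))  # True
```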

3.Get random dataframe with duplicated rows

d = df.sample(len(df), replace = True)
print(type(d))
print(d)

If you set replace = True, each row can be drawn more than once, so the new dataframe may contain duplicated rows.

[Image: a random dataframe with duplicated rows]
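One way to see how many rows were repeated is to count duplicated index labels. The sketch below assumes a one-column stand-in for df and a fixed random_state, neither of which comes from the example above:

```python
import pandas as pd

# Hypothetical one-column stand-in for df
df = pd.DataFrame({"No": range(1, 11)})

d = df.sample(n=len(df), replace=True, random_state=1)

# Index.duplicated() marks the second and later occurrences of a label
dup_count = int(d.index.duplicated().sum())
distinct_count = d.index.nunique()
print(dup_count, distinct_count)

# With replacement, n may even exceed the number of rows
bigger = df.sample(n=15, replace=True, random_state=1)
print(len(bigger))  # 15
```

Without replace = True, asking for more rows than the dataframe contains raises a ValueError.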

4.Get random dataframe with weights

d = df.sample(n = 4, weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(type(d))
print(d)

This code extracts 4 (n = 4) random rows from df, and each row is drawn according to the weights [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. The weights do not sum to 1, so pandas normalizes them to sum to 1. With these weights, rows near the end of df are more likely to be sampled than rows near the start.

The result is:

<class 'pandas.core.frame.DataFrame'>
   No     Name  Age
9  10  Alibama   20
5   6     Lily   41
6   7    Bruce   23
8   9    Brown   31
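The normalization can be sketched by hand: each weight is divided by the sum of all weights (55 here), which gives the probability of each row for a single draw. The large sample with replacement below is only an assumption made to show the effect; it is not part of the example above:

```python
import pandas as pd

weights = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Dividing by the sum (55) gives probabilities that add up to 1
probs = weights / weights.sum()
print(round(probs.iloc[0], 3), round(probs.iloc[-1], 3))  # 0.018 0.182

# Hypothetical one-column stand-in for df
df = pd.DataFrame({"No": range(1, 11)})

# Draw many rows with replacement and compare observed frequencies
draws = df.sample(n=10000, replace=True, weights=weights.tolist(), random_state=0)
freq = draws["No"].value_counts(normalize=True).sort_index()
print(freq)
```

The observed frequency of the last row should be roughly ten times that of the first row.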

5.How to get a row from the new random dataframe

Take the random dataframe from the previous example:

<class 'pandas.core.frame.DataFrame'>
   No     Name  Age
9  10  Alibama   20
5   6     Lily   41
6   7    Bruce   23
8   9    Brown   31

We can see that the row index labels are also shuffled (9, 5, 6, 8). However, we can still get a row by its position, from 0 to n-1, with iloc.

For example:

print(d.iloc[1])

The result is:

No           6
Name      Lily
Age         41
Name: 5, dtype: object
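The difference between position and label can be sketched as follows. The dataframe d is rebuilt by hand here as a stand-in for the sampled result, and reset_index(drop = True) is one possible way to get fresh 0 to n-1 labels:

```python
import pandas as pd

# Hand-built stand-in for the sampled dataframe d shown above
d = pd.DataFrame(
    {
        "No": [10, 6, 7, 9],
        "Name": ["Alibama", "Lily", "Bruce", "Brown"],
        "Age": [20, 41, 23, 31],
    },
    index=[9, 5, 6, 8],
)

# iloc selects by position: the second row
print(d.iloc[1]["Name"])  # Lily

# loc selects by label: the row whose index label is 5
print(d.loc[5, "Name"])  # Lily

# Drop the old labels and renumber the rows from 0
d2 = d.reset_index(drop=True)
print(list(d2.index))  # [0, 1, 2, 3]
```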

6.Get random dataframe based on axis

In this tutorial, the dataframe df has shape (10, 3). axis = 0 means we sample the dataframe by row, and axis = 1 means we sample it by column. Because df has only 3 columns, sampling 4 columns requires replace = True.

Here is an example:

d = df.sample(4, axis = 1, replace = True)
print(type(d))
print(d)

The result is:

<class 'pandas.core.frame.DataFrame'>
      Name  Age  Age  No
0      Tom   24   24   1
1     Kate   22   22   2
2    Alexa   34   34   3
3     Kate   23   23   4
4     John   45   45   5
5     Lily   41   41   6
6    Bruce   23   23   7
7      Lin   33   33   8
8    Brown   31   31   9
9  Alibama   20   20  10
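Column sampling also works without replacement, as long as n is not larger than the number of columns. A minimal sketch, assuming a two-row stand-in for df:

```python
import pandas as pd

# Hypothetical two-row stand-in for df
df = pd.DataFrame({"No": [1, 2], "Name": ["Tom", "Kate"], "Age": [24, 22]})

# Pick 2 distinct columns at random
two_cols = df.sample(n=2, axis=1)
print(two_cols)

# Keep every column, but in random order
shuffled_cols = df.sample(frac=1, axis=1)
print(list(shuffled_cols.columns))
```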
