PyTorch Load Multiple DataLoader for Training – A Step Guide – PyTorch Tutorial

By | September 14, 2022

Sometimes we need several dataloaders to train a multi-task model in PyTorch. How do we iterate over them together? In this tutorial, we will show you how.

Here we will use an example to explain.

Create a dataset

In order to create a dataloader, we should create a dataset first. We can read this tutorial to learn how to do it:

Create a Custom Dataset for Loading Data in PyTorch – PyTorch Tutorial

Here we will create one as follows:

from torch.utils.data import Dataset
from itertools import cycle

class CustomDataset(Dataset):
    def __init__(self, num):
        super(CustomDataset, self).__init__()
        # load all data for training or test: the 20 integers num-20 .. num-1
        self.all_data = [i for i in range(num - 20, num)]
    def __getitem__(self, index):
        # each sample is a pair (x, 2*x)
        return self.all_data[index], 2 * self.all_data[index]
    def __len__(self):
        return len(self.all_data)
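For example, we can sanity-check this dataset directly (the class is repeated here so the snippet runs on its own):

```python
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    # same class as above, repeated so this snippet is self-contained
    def __init__(self, num):
        super(CustomDataset, self).__init__()
        self.all_data = [i for i in range(num - 20, num)]
    def __getitem__(self, index):
        return self.all_data[index], 2 * self.all_data[index]
    def __len__(self):
        return len(self.all_data)

ds = CustomDataset(10)
print(len(ds))   # 20 samples: the integers -10 .. 9
print(ds[0])     # (-10, -20): each sample is (x, 2*x)
```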

Create multiple dataloader

After creating a dataset, we can create some DataLoader instances. For example:

from torch.utils.data import DataLoader

train_dataset = CustomDataset(10)
train_dataset2 = CustomDataset(40)

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=3,
    shuffle=True
)

train_loader2 = DataLoader(
    dataset=train_dataset2,
    batch_size=4,
    shuffle=True
)

Here, we have created two different DataLoader instances: train_loader and train_loader2.
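As a side note, both datasets hold 20 samples, and DataLoader keeps the last partial batch by default (drop_last=False), so the two loaders yield different numbers of batches. A quick sanity check with plain arithmetic:

```python
import math

num_samples = 20  # both CustomDataset(10) and CustomDataset(40) hold 20 items

batches_loader1 = math.ceil(num_samples / 3)  # batch_size=3 -> 7 batches
batches_loader2 = math.ceil(num_samples / 4)  # batch_size=4 -> 5 batches
print(batches_loader1, batches_loader2)  # 7 5
```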

Load multiple dataloader to train

In PyTorch, we can use Python's enumerate() and zip() functions, together with itertools.cycle(), to iterate over multiple dataloader instances at once.

For example:

for i_batch, batch_data in enumerate(zip(cycle(train_loader), train_loader2)):
    # batch_data is a tuple: (batch from train_loader, batch from train_loader2)
    print("1=", batch_data[0])
    print("2=", batch_data[1])
    print("batch end")

Here we wrap train_loader in itertools.cycle(), which repeats a dataloader endlessly. This matters because zip() stops as soon as any of its iterables is exhausted, and the loaders may yield different numbers of batches. Note that in this example train_loader yields 7 batches while train_loader2 yields only 5, so iteration stops after 5 batches whether or not cycle() is used; to iterate over all 7 batches of train_loader, wrap the shorter train_loader2 in cycle() instead.
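The effect of cycle() on zip() can be sketched with plain lists (no PyTorch needed); the list names here are just placeholders:

```python
from itertools import cycle

a = ["a1", "a2", "a3"]  # stands in for the loader with more batches
b = ["b1", "b2"]        # stands in for the loader with fewer batches

# Plain zip() stops at the shorter iterable: only 2 pairs.
plain = list(zip(a, b))

# Wrapping the shorter one in cycle() lets zip() run for all of a:
# 3 pairs, with b repeating from its start.
cycled = list(zip(a, cycle(b)))

print(plain)   # [('a1', 'b1'), ('a2', 'b2')]
print(cycled)  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b1')]
```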

Running this code, we will see output like the following (the exact order varies between runs because shuffle=True):

1= [tensor([-10,   7,   2]), tensor([-20,  14,   4])]
2= [tensor([24, 30, 27, 34]), tensor([48, 60, 54, 68])]
batch end
1= [tensor([1, 5, 3]), tensor([ 2, 10,  6])]
2= [tensor([25, 32, 28, 29]), tensor([50, 64, 56, 58])]
batch end
1= [tensor([-4,  0,  8]), tensor([-8,  0, 16])]
2= [tensor([26, 21, 35, 31]), tensor([52, 42, 70, 62])]
batch end
1= [tensor([-2, -9,  9]), tensor([ -4, -18,  18])]
2= [tensor([36, 39, 33, 37]), tensor([72, 78, 66, 74])]
batch end
1= [tensor([ 4, -1, -8]), tensor([  8,  -2, -16])]
2= [tensor([38, 22, 23, 20]), tensor([76, 44, 46, 40])]
batch end

We can see that:

batch_data[0] is the batch from train_loader

batch_data[1] is the batch from train_loader2

With this technique, we can easily build a multi-task model.
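To make this concrete, here is a minimal sketch of a multi-task training loop built on the two loaders above. The shared nn.Linear model, the MSE losses, and the optimizer settings are illustrative assumptions, not part of the original example:

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from itertools import cycle

class CustomDataset(Dataset):
    # same dataset as in the tutorial, repeated so this snippet is self-contained
    def __init__(self, num):
        super(CustomDataset, self).__init__()
        self.all_data = [i for i in range(num - 20, num)]
    def __getitem__(self, index):
        return self.all_data[index], 2 * self.all_data[index]
    def __len__(self):
        return len(self.all_data)

torch.manual_seed(0)
train_loader = DataLoader(CustomDataset(10), batch_size=3, shuffle=True)
train_loader2 = DataLoader(CustomDataset(40), batch_size=4, shuffle=True)

model = nn.Linear(1, 1)  # a toy shared model for both tasks (assumption)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for i_batch, (batch1, batch2) in enumerate(zip(cycle(train_loader), train_loader2)):
    optimizer.zero_grad()
    losses = []
    for x, y in (batch1, batch2):
        # each batch is [x_tensor, y_tensor]; reshape to (batch_size, 1) floats
        pred = model(x.float().unsqueeze(1))
        losses.append(criterion(pred, y.float().unsqueeze(1)))
    loss = sum(losses)  # combine the two task losses (simple unweighted sum)
    loss.backward()
    optimizer.step()

print("batches:", i_batch + 1, "final loss:", loss.item())
```

One common design choice here is how to combine the task losses; a plain sum is shown, but a weighted sum is often used when the tasks have different scales.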