Sometimes we may need to load multiple dataloaders for a multi-task model in PyTorch. How can we load these dataloaders together? In this tutorial, we will show you how, using a simple example.
Create a dataset
In order to create a dataloader, we should create a dataset first. You can read this tutorial to learn how:
Create a Custom Dataset for Loading Data in PyTorch – PyTorch Tutorial
Here we will create one as follows:
from torch.utils.data import Dataset
from itertools import cycle

class CustomDataset(Dataset):
    def __init__(self, num):
        super(CustomDataset, self).__init__()
        # load all data for training or test
        self.all_data = [i for i in range(num - 20, num)]

    def __getitem__(self, index):
        # return a (sample, label) pair; here the label is simply 2 * sample
        return self.all_data[index], 2 * self.all_data[index]

    def __len__(self):
        return len(self.all_data)
Create multiple dataloaders
After creating a dataset, we can create some dataloader instances. For example:
from torch.utils.data import DataLoader

train_dataset = CustomDataset(10)
train_dataset2 = CustomDataset(40)

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=3,
    shuffle=True
)
train_loader2 = DataLoader(
    dataset=train_dataset2,
    batch_size=4,
    shuffle=True
)
Here, we have created two different DataLoader instances: train_loader and train_loader2.
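As a quick sanity check, we can compute how many batches each loader will yield. With the default drop_last=False, a DataLoader produces ceil(dataset_size / batch_size) batches; the small helper below is a sketch (not part of the tutorial code) that checks this for the two loaders above, each of which holds 20 samples:

```python
import math

def num_batches(dataset_size, batch_size):
    # With drop_last=False (the DataLoader default), the last
    # incomplete batch is kept, so we round up.
    return math.ceil(dataset_size / batch_size)

# Both CustomDataset(10) and CustomDataset(40) contain 20 samples.
print(num_batches(20, 3))  # train_loader  -> 7 batches
print(num_batches(20, 4))  # train_loader2 -> 5 batches
```

Because the batch counts differ (7 vs. 5), we cannot simply zip the two loaders without losing batches, which is what the next section addresses.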
Load multiple dataloaders to train
In PyTorch, we can use the python enumerate() and zip() functions to iterate over multiple dataloader instances together.
For example:
for i_batch, batch_data in enumerate(zip(cycle(train_loader), train_loader2)):
    # print(i_batch, batch_data, type(batch_data))
    print("1=", batch_data[0])
    print("2=", batch_data[1])
    print("batch end")
Here we also use the itertools cycle() function to repeat the smaller dataloader instance, because the number of batches in each dataloader may differ: zip() alone stops as soon as the shorter iterable is exhausted.
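To see the difference cycle() makes, here is a minimal sketch that uses plain lists in place of the two dataloaders (so it runs without torch installed):

```python
from itertools import cycle

# Plain lists stand in for the two dataloaders in this sketch.
short_loader = ["a1", "a2"]             # e.g. 2 batches
long_loader = ["b1", "b2", "b3", "b4"]  # e.g. 4 batches

# zip() alone stops once the shorter iterable runs out:
print(list(zip(short_loader, long_loader)))
# -> [('a1', 'b1'), ('a2', 'b2')]

# cycle() repeats the shorter loader, so every batch of the
# longer loader gets a partner:
print(list(zip(cycle(short_loader), long_loader)))
# -> [('a1', 'b1'), ('a2', 'b2'), ('a1', 'b3'), ('a2', 'b4')]
```

Note that with cycle(), batches from the smaller loader are seen more than once per epoch, which is usually acceptable for multi-task training.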
Run this code, we will see output like the following (the exact values vary because shuffle=True):
1= [tensor([-10,   7,   2]), tensor([-20,  14,   4])]
2= [tensor([24, 30, 27, 34]), tensor([48, 60, 54, 68])]
batch end
1= [tensor([1, 5, 3]), tensor([ 2, 10,  6])]
2= [tensor([25, 32, 28, 29]), tensor([50, 64, 56, 58])]
batch end
1= [tensor([-4,  0,  8]), tensor([-8,  0, 16])]
2= [tensor([26, 21, 35, 31]), tensor([52, 42, 70, 62])]
batch end
1= [tensor([-2, -9,  9]), tensor([ -4, -18,  18])]
2= [tensor([36, 39, 33, 37]), tensor([72, 78, 66, 74])]
batch end
1= [tensor([ 4, -1, -8]), tensor([ 8, -2, -16])]
2= [tensor([38, 22, 23, 20]), tensor([76, 44, 46, 40])]
batch end
We can find:
batch_data[0] is a batch from train_loader
batch_data[1] is a batch from train_loader2
Understanding this feature, we can build a multi-task model easily.
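As a final sketch, here is one way such a training loop might look. Everything in this block is an illustrative assumption, not part of the tutorial above: the MultiTaskModel class, the TensorDataset toy data, and all hyperparameters are hypothetical, and only the zip(cycle(...), ...) pattern comes from the tutorial.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from itertools import cycle

# Hypothetical two-head model: one shared layer, one head per task.
class MultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(1, 8)
        self.head1 = nn.Linear(8, 1)
        self.head2 = nn.Linear(8, 1)

    def forward(self, x, task):
        h = torch.relu(self.shared(x))
        return self.head1(h) if task == 1 else self.head2(h)

# Two toy regression tasks; TensorDataset stands in for CustomDataset.
x1 = torch.randn(20, 1); y1 = 2 * x1
x2 = torch.randn(20, 1); y2 = -3 * x2
loader1 = DataLoader(TensorDataset(x1, y1), batch_size=3, shuffle=True)
loader2 = DataLoader(TensorDataset(x2, y2), batch_size=4, shuffle=True)

model = MultiTaskModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# One epoch: pair a batch from each task, sum the two task losses.
for (x_a, y_a), (x_b, y_b) in zip(cycle(loader1), loader2):
    optimizer.zero_grad()
    loss = loss_fn(model(x_a, task=1), y_a) + loss_fn(model(x_b, task=2), y_b)
    loss.backward()
    optimizer.step()

print(float(loss))
```

Summing the two losses before backward() lets the shared layer receive gradients from both tasks in a single optimizer step.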