Batch support for Transform? #157

Closed
arunmallya opened this issue Apr 27, 2017 · 14 comments

Comments

@arunmallya

Any plans for updating Transform to support batch inputs instead of just single images?
This is useful for applying transforms outside of a DataLoader (which does it on one image at a time).

@fmassa
Member

fmassa commented Apr 30, 2017

I don't think there are any plans on extending transforms to work on batched images.
Indeed, I think transforms are supposed to be applied only in Datasets, so only single instances are required.
Another point is that implementing batched transforms efficiently would require dedicated implementations, and it would also raise the question of whether or not it would be interesting to have them on GPUs as well.
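
For a sense of what such a dedicated implementation could look like, a batched random horizontal flip can be written directly against tensors and would run on either CPU or GPU (a rough sketch; the function name and signature are hypothetical, not part of torchvision):

import torch

def batch_random_hflip(images, p=0.5):
    # images: (N, C, H, W) tensor on CPU or GPU.
    # Draw one Bernoulli sample per image and flip the chosen images
    # along the width axis in a single vectorized operation.
    flip = torch.rand(images.size(0), device=images.device) < p
    flipped = torch.flip(images, dims=[-1])
    return torch.where(flip[:, None, None, None], flipped, images)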

@alykhantejani
Contributor

Closing this for now as there currently are no plans to extend transforms to work on batched images.

@Coolnesss
Contributor

Coolnesss commented Dec 22, 2017

Just to follow up on this: right now, to apply a transformation after getting a batch from the DataLoader, I have to iterate over the batch, transform each tensor back to a PIL image, apply any additional transformations, and then convert it back to a tensor again. It's doable, but it's fairly slow (unless I'm doing something wrong).
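
For reference, the per-image round trip looks roughly like this (a sketch; the exact transform pipeline is just an example):

import torch
from torchvision import transforms

to_pil = transforms.ToPILImage()
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # example transform
    transforms.ToTensor(),
])

def transform_batch(batch):
    # batch: (N, C, H, W) tensor; convert each image back to PIL,
    # apply the transforms, then re-stack the results into a tensor.
    return torch.stack([augment(to_pil(img)) for img in batch])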

If you're open to a PR on this, I'd be happy to help if you can give me some pointers.

@alykhantejani
Contributor

@Coolnesss usually you do the transformations at the Dataset level. The DataLoader has multiple worker processes that read from the Dataset, which effectively applies your transformations in parallel.

Perhaps you can share some details of what your goal is and we can see if it falls outside of the current paradigm
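
For context, the usual pattern is roughly this (a minimal sketch):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Transforms are attached to the Dataset, so each DataLoader worker
# applies them to its own samples in parallel.
dataset = datasets.MNIST('./data', train=True, download=True,
                         transform=transforms.Compose([
                             transforms.RandomHorizontalFlip(),
                             transforms.ToTensor(),
                         ]))
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)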

@Coolnesss
Contributor

Thank you for your reply @alykhantejani !

I'm trying to create derivative datasets of e.g. MNIST by applying some category of random transformations to each set. Currently, I'm doing something like

d_transforms = [
    transforms.RandomHorizontalFlip(),
    # Some other transforms...
]
loaders = []
for i in range(len(d_transforms)):
    dataset = datasets.MNIST('./data',
            train=train,
            download=True,
            transform=d_transforms[i])
    loaders.append(
        DataLoader(dataset,
            shuffle=True,
            pin_memory=True,
            num_workers=1)
        )

Here, I get the desired outcome of having multiple DataLoaders that each provide samples from the transformed datasets. However, this is really slow, presumably because each worker tries to access the same files stored in ./data, and they can only be accessed by one worker at a time (?). After profiling my program, I found that nearly all of the time is spent on calls like

x, y = next(iter(train_loaders[i]))

I can think of two ways to solve this

  1. Apply transformations after getting the batch from the loader - but this requires batched transformations, otherwise it's slow
  2. Make n copies of MNIST on disk and let the workers each have their own copy, e.g. dataset = datasets.MNIST('./data1', ...) etc.

Sorry for the lengthy post, and thanks for your help.

@alykhantejani
Contributor

@Coolnesss would this work for you:

from torch.utils.data import Dataset

class MultiTransformDataset(Dataset):
    def __init__(self, dataset, transforms):
        self.dataset = dataset
        self.transforms = transforms

    def __getitem__(self, idx):
        input, target = self.dataset[idx]
        return tuple(t(input) for t in self.transforms) + (target, )

    def __len__(self):
        return len(self.dataset)
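
Usage could then look roughly like this (a sketch assuming the class above; each pipeline ends in ToTensor so the default collate can stack the outputs):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

base = datasets.MNIST('./data', train=True, download=True)  # yields PIL images
pipelines = [
    transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()]),
    transforms.Compose([transforms.RandomRotation(10), transforms.ToTensor()]),
]
dataset = MultiTransformDataset(base, pipelines)
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)

for flipped, rotated, target in loader:
    ...  # one pass yields every transformed view of the same images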

@Coolnesss
Contributor

Thanks for the workaround @alykhantejani

It's a much nicer solution, and somewhat faster too. Unfortunately, it's still not as fast as I had hoped; perhaps the transforms themselves just take too much time. In any case, thanks for your help.

@alykhantejani
Contributor

@Coolnesss np. Let me know if you have any other questions

@bermanmaxim

bermanmaxim commented Oct 19, 2018

Note that you can also design a custom collate function that does the necessary transformations on your batch after collating it, e.g.

import torch.utils.data.dataloader

def get_collate(batch_transform=None):
    def mycollate(batch):
        # Collate the samples as usual, then apply the transform to the whole batch.
        collated = torch.utils.data.dataloader.default_collate(batch)
        if batch_transform is not None:
            collated = batch_transform(collated)
        return collated
    return mycollate

I find this strategy useful for adding information to the batch (such as batch statistics, or complementary images from the dataset) and for making the workers do the necessary computation.
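
For example (a sketch; the batch-level normalization and the dataset variable are only illustrative):

from torch.utils.data import DataLoader

def normalize_batch(batch):
    # The batch_transform receives the already-collated batch.
    images, targets = batch
    images = (images - images.mean()) / (images.std() + 1e-7)
    return images, targets

loader = DataLoader(dataset, batch_size=64, num_workers=4,
                    collate_fn=get_collate(normalize_batch))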

@hukkai

hukkai commented Dec 28, 2019

Hello, I am working on video tasks where each video is 32 frames of images. I currently resize and crop the 32 images in a loop. A batch operation might be helpful (and faster?).
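
If the 32 frames are already stacked as a (T, C, H, W) tensor, for example, the resize can be done in one call by treating time as the batch dimension, and the crop with plain slicing (a sketch using torch.nn.functional.interpolate):

import torch
import torch.nn.functional as F

frames = torch.rand(32, 3, 360, 640)  # 32 frames as a (T, C, H, W) tensor

# Resize every frame at once by treating the time dimension as the batch.
resized = F.interpolate(frames, size=(256, 256), mode='bilinear',
                        align_corners=False)

# Center-crop all frames to 224x224 with plain tensor slicing.
top = (256 - 224) // 2
cropped = resized[:, :, top:top + 224, top:top + 224]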

@GabPrato

If this can help anyone, I implemented a few batch Transforms:
https://github.com/pratogab/batch-transforms

@shivam13juna

Come on, let's have an official implementation of batch transforms, it's 2023 already!!

@AnthonyArmour

This would be great for online batch inference. I'm currently looking for a solution for my use case.

@chenzhike110

torchvision.transforms.Lambda may help
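
Since Lambda just wraps an arbitrary callable, a function that operates on a whole batch tensor can be composed like any other transform (a minimal sketch; the batched flip is illustrative):

import torch
from torchvision import transforms

# A transform whose callable operates on an entire (N, C, H, W) batch.
batch_hflip = transforms.Lambda(lambda batch: torch.flip(batch, dims=[-1]))

batch = torch.rand(8, 3, 32, 32)
flipped = batch_hflip(batch)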

rajveerb pushed a commit to rajveerb/vision that referenced this issue Nov 30, 2023