Process image data

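This guide shows specific methods for processing image datasets: using map() to transform an entire image dataset, and applying data augmentations on the fly with set_transform().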

For a guide on how to process any type of dataset, take a look at the general process guide.

Map

The map() function can apply transforms over an entire dataset.

For example, create a basic Resize function:

>>> def transforms(examples):
...     examples["pixel_values"] = [image.convert("RGB").resize((100,100)) for image in examples["image"]]
...     return examples

Now use the map() function to resize the entire dataset, and set batched=True to speed up the process by accepting batches of examples. The transform returns pixel_values as a cacheable PIL.Image object:

>>> dataset = dataset.map(transforms, remove_columns=["image"], batched=True)
>>> dataset[0]
{'label': 6,
 'pixel_values': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=100x100 at 0x7F058237BB10>}

The cache file saves time because you don't have to execute the same transform twice. The map() function is best suited to operations you run only once per training run, such as resizing an image, rather than operations executed every epoch, such as data augmentations.

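map() takes up some memory, but you can reduce its memory requirements with the following parameters: batch_size, the number of examples passed to the transform function in one call, and writer_batch_size, the number of processed examples held in memory before they are written to the cache file.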

Both parameter values default to 1000, which can be expensive if you are storing images. Lower these values to use less memory when you use map().
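
As a rough sketch of lowering these values, assuming the two parameters are map()'s batch_size and writer_batch_size: the helper name resize_with_small_batches and the value 100 are illustrative choices, not recommendations from this guide.

```python
def transforms(examples):
    # Resize every image in the batch to 100x100 RGB.
    examples["pixel_values"] = [
        image.convert("RGB").resize((100, 100)) for image in examples["image"]
    ]
    return examples


def resize_with_small_batches(dataset, batch_size=100, writer_batch_size=100):
    """Run the resize transform while holding fewer images in memory at once.

    `dataset` is assumed to be a datasets.Dataset with an "image" column.
    """
    return dataset.map(
        transforms,
        remove_columns=["image"],
        batched=True,
        batch_size=batch_size,  # examples passed to `transforms` per call
        writer_batch_size=writer_batch_size,  # rows buffered before each cache write
    )
```

Calling resize_with_small_batches(dataset) should produce the same columns as the earlier map() call. Smaller batches generally trade some speed for lower peak memory.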

Apply transforms

🤗 Datasets applies data augmentations from any library or package to your dataset. Transforms can be applied on-the-fly on batches of data with set_transform(), which consumes less disk space.

The following example uses torchvision, but feel free to use other data augmentation libraries like Albumentations, Kornia, and imgaug.

For example, if you’d like to change the color properties of an image randomly:
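
A minimal sketch of such a transform, assuming torchvision is installed. The helper name build_jitter and the jitter strengths are illustrative assumptions. torchvision requires hue to lie between 0 and 0.5.

```python
def build_jitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1):
    """Return a pipeline that randomly jitters color, then converts to a tensor."""
    # Imported here so the helper can be defined without torchvision installed;
    # torchvision is still required when the function is called.
    from torchvision.transforms import ColorJitter, Compose, ToTensor

    return Compose(
        [
            ColorJitter(
                brightness=brightness,
                contrast=contrast,
                saturation=saturation,
                hue=hue,
            ),
            ToTensor(),  # PIL image -> float tensor of shape (C, H, W)
        ]
    )
```

Each call to the returned pipeline samples new random color parameters, so the same image can come out differently every time it is read.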

Create a function to apply the ColorJitter transform:
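
One way to write that function, sketched under the assumption that the augmentation is any callable taking a PIL image (for instance a torchvision Compose containing ColorJitter). The names make_transforms and augment are hypothetical and exist only so the block stands alone.

```python
def make_transforms(augment):
    """Build a batch-level function that applies `augment` to every image."""

    def transforms(examples):
        # `examples` is a batch: a dict mapping column names to lists of values.
        examples["pixel_values"] = [
            augment(image.convert("RGB")) for image in examples["image"]
        ]
        return examples

    return transforms
```

The inner function has the same shape as the resize function used with map(): it receives a batch, adds a pixel_values column, and returns the batch.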

Apply the transform with the set_transform() function:
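
A short sketch of this step, assuming dataset is a datasets.Dataset and transforms is a batch-level function like the one described above. The wrapper name enable_on_the_fly_transform is hypothetical.

```python
def enable_on_the_fly_transform(dataset, transforms):
    """Attach `transforms` so it runs each time rows are read from `dataset`."""
    # set_transform() updates the dataset in place and returns None;
    # nothing is precomputed or written to the cache.
    dataset.set_transform(transforms)
    return dataset
```

After this call, indexing the dataset (for example dataset[0]) runs the transform at access time, so random augmentations are drawn again on every read. If you would rather leave the original dataset untouched, with_transform() returns a new dataset object instead.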
