>>> def transforms(examples):
... examples["pixel_values"] = [image.convert("RGB").resize((100,100)) for image in examples["image"]]
... return examples
Now use the map() function to resize the entire dataset, and set batched=True to speed up the process by accepting batches of examples. The transform returns pixel_values as a cacheable PIL.Image object:
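A minimal call might look like the following (assuming the dataset object from the earlier steps is named dataset; remove_columns drops the original image column once pixel_values has been created):

>>> dataset = dataset.map(transforms, remove_columns=["image"], batched=True)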
The cache file saves time because you don't have to execute the same transform twice. The map() function is best suited for operations you only run once during training - like resizing an image - rather than for operations executed at every epoch, like data augmentations.
map() takes up some memory, but you can reduce its memory requirements with the following parameters:
batch_size determines the number of examples that are processed in one call to the transform function.
writer_batch_size determines the number of processed examples that are kept in memory before they are stored away.
Both parameter values default to 1000, which can be expensive if you are storing images. Lower these values to use less memory when you use map().
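For example, a call along these lines lowers both values (100 is an arbitrary illustrative choice; tune it to your hardware):

>>> dataset = dataset.map(
...     transforms,
...     remove_columns=["image"],
...     batched=True,
...     batch_size=100,
...     writer_batch_size=100,
... )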
Apply transforms
🤗 Datasets applies data augmentations from any library or package to your dataset. Transforms can be applied on-the-fly on batches of data with set_transform(), which consumes less disk space.
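As a sketch, an on-the-fly augmentation with set_transform() could look like this (torchvision is used here purely for illustration; any augmentation library works the same way):

>>> from torchvision.transforms import ColorJitter, Compose, ToTensor
>>> jitter = Compose([ColorJitter(brightness=0.5, hue=0.5), ToTensor()])
>>> def augment(examples):
...     # applied to each batch only when it is accessed; nothing is written to disk
...     examples["pixel_values"] = [jitter(image.convert("RGB")) for image in examples["image"]]
...     return examples
>>> dataset.set_transform(augment)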