When you index into an image dataset, access the row first and then the image column (dataset[0]["image"]) so that only that row's image is decoded. Accessing the column first decodes and resamples every image object in the dataset, which is slow for a large dataset.
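As a rough illustration of the access order, the sketch below wraps the row-first lookup in a hypothetical helper, first_image, which is not part of the library. It assumes only that the dataset can be indexed by row and has an image column.

```python
def first_image(dataset):
    """Return the image stored in the first row of the dataset."""
    # Row first, then column: only row 0 is read (and, for an image
    # dataset, decoded).
    return dataset[0]["image"]


# With a loaded image dataset (assumed to have an "image" column):
#   image = first_image(dataset)   # row first: decodes a single image
#   image = dataset["image"][0]    # column first: may decode every image
```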
For a guide on how to load any type of dataset, take a look at the general loading guide.
Local files
You can load a dataset from image file paths. Use the cast_column() function to accept a column of image file paths, and decode each path into a PIL image with the Image feature:
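A minimal sketch of this step, assuming placeholder file paths. The helper name dataset_from_image_paths is hypothetical; Dataset.from_dict(), cast_column(), and Image come from the datasets library.

```python
def dataset_from_image_paths(paths):
    """Build a dataset whose "image" column decodes file paths into PIL images."""
    # Imported inside the function so the sketch can be read and imported
    # without the library installed.
    from datasets import Dataset, Image

    dataset = Dataset.from_dict({"image": list(paths)})
    # Casting to Image() makes the library decode each path on access.
    return dataset.cast_column("image", Image())


# Example with placeholder paths:
#   dataset = dataset_from_image_paths(["path/to/image_1", "path/to/image_2"])
#   dataset[0]["image"]  # a PIL image object
```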
If you only want to load the underlying path to each image without decoding the image object, set decode=False in the Image feature:
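The same cast can be sketched with decoding switched off. The helper name dataset_with_undecoded_images is hypothetical and the paths are placeholders.

```python
def dataset_with_undecoded_images(paths):
    """Build a dataset whose "image" column keeps paths instead of decoding them."""
    # Imported inside the function so the sketch can be read and imported
    # without the library installed.
    from datasets import Dataset, Image

    dataset = Dataset.from_dict({"image": list(paths)})
    # decode=False: each entry is returned as a dict with "bytes" and "path"
    # keys rather than as a PIL image.
    return dataset.cast_column("image", Image(decode=False))


# Example with a placeholder path:
#   dataset = dataset_with_undecoded_images(["path/to/image_1"])
#   dataset[0]["image"]["path"]
```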
ImageFolder
You can also load a dataset with the ImageFolder dataset builder, which does not require writing a custom dataloader. This makes ImageFolder ideal for quickly creating and loading image datasets with several thousand images for different vision tasks. Your image dataset structure should look like this:
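One way to picture the layout is to build a tiny example tree. The sketch below assumes a split/label/file arrangement; the folder and file names are placeholders, and the files are empty stand-ins rather than real images.

```python
import tempfile
from pathlib import Path

# Placeholder layout: <split>/<label>/<file>. All names are illustrative.
LAYOUT = [
    "train/dog/golden_retriever.png",
    "train/dog/german_shepherd.png",
    "train/cat/maine_coon.png",
    "train/cat/bengal.png",
]


def make_example_tree(root):
    """Create empty placeholder files under root and return their relative paths."""
    root = Path(root)
    for relative in LAYOUT:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # empty stand-in, not a real image
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.png"))


with tempfile.TemporaryDirectory() as tmp:
    for line in make_example_tree(tmp):
        print(line)
```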
Load your dataset by specifying imagefolder and the directory of your dataset in data_dir:
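A minimal sketch of the call, assuming a placeholder directory. The wrapper name load_image_folder is hypothetical; load_dataset(), the imagefolder builder name, and data_dir are as described above.

```python
def load_image_folder(data_dir):
    """Load an image dataset from a local directory with the ImageFolder builder."""
    # Imported inside the function so the sketch can be read and imported
    # without the library installed.
    from datasets import load_dataset

    return load_dataset("imagefolder", data_dir=data_dir)


# Example with a placeholder directory:
#   dataset = load_image_folder("/path/to/folder")
#   dataset["train"][0]  # row first, then inspect its "image" entry
```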
Load remote datasets from their URLs with the data_files parameter:
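A sketch of the remote case, assuming the URL points to an archive of images. The URL in the usage comment is a placeholder, the wrapper name load_remote_image_folder is hypothetical, and requesting the train split is a choice made here for illustration.

```python
def load_remote_image_folder(url):
    """Load an image dataset from a remote archive with the ImageFolder builder."""
    # Imported inside the function so the sketch can be read and imported
    # without the library installed.
    from datasets import load_dataset

    # data_files accepts the URL; split="train" returns that split directly.
    return load_dataset("imagefolder", data_files=url, split="train")


# Example with a placeholder URL:
#   dataset = load_remote_image_folder("https://example.com/path/to/images.zip")
```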
Some datasets have an associated metadata file (metadata.csv or metadata.jsonl) containing other information about the data, such as bounding boxes, text captions, and labels. The metadata is loaded automatically when you call load_dataset() and specify imagefolder.
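To make the metadata file concrete, the sketch below writes a small metadata.csv with the standard library. It assumes the file links each row to an image through a file_name column, which is what ImageFolder expects; the file names and captions are made up, and write_metadata_csv is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata_csv(folder, rows):
    """Write metadata.csv into folder; every row is a dict with a "file_name" key."""
    path = Path(folder) / "metadata.csv"
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Made-up rows: one image file name plus one extra column of captions.
rows = [
    {"file_name": "0001.png", "text": "a placeholder caption"},
    {"file_name": "0002.png", "text": "another placeholder caption"},
]

with tempfile.TemporaryDirectory() as tmp:
    print(write_metadata_csv(tmp, rows).read_text(encoding="utf-8"))
```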
To ignore the information in the metadata file, set drop_metadata=True in load_dataset(). To have ImageFolder infer the label name from the directory name, set drop_labels=False:
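A minimal sketch of passing drop_labels=False, assuming a placeholder directory. The wrapper name load_image_folder_with_labels is hypothetical.

```python
def load_image_folder_with_labels(data_dir):
    """Load an ImageFolder dataset, keeping labels inferred from directory names."""
    # Imported inside the function so the sketch can be read and imported
    # without the library installed.
    from datasets import load_dataset

    # drop_labels=False keeps the label inferred from each directory name.
    return load_dataset("imagefolder", data_dir=data_dir, drop_labels=False)


# Example with a placeholder directory:
#   dataset = load_image_folder_with_labels("/path/to/folder")
```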
For more information about creating your own ImageFolder dataset, take a look at the Create an image dataset guide.