Create a dataset for training
There are many datasets on the Hub to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 Datasets library. The dataset structure depends on the task you want to train your model on. The most basic structure is a directory of images, which is enough for tasks like unconditional image generation. For tasks like text-to-image generation, the dataset may instead be a directory of images plus a text file containing their corresponding captions.
This guide will show you two ways to create a dataset to finetune on:
provide a folder of images to the --train_data_dir argument
upload a dataset to the Hub and pass the dataset repository id to the --dataset_name argument
💡 Learn more about how to create an image dataset for training in the Create an image dataset guide.
For unconditional generation, you can provide your own dataset as a folder of images. The training script uses the ImageFolder builder from 🤗 Datasets to automatically build a dataset from the folder. Your directory structure should look like:
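As a rough illustration of such a folder, the sketch below lists the image files it would contain. The folder name data_dir, the file names in the comment, the extension list, and the helper find_images are all placeholders chosen for this example; they are not part of the training script or of 🤗 Datasets.

```python
from pathlib import Path

# Assumed layout (all names are placeholders):
#   data_dir/xxx.png
#   data_dir/xxy.png
#   data_dir/[...]/xxz.png

# Extensions treated as images in this sketch. This is an assumption,
# not the exact list the ImageFolder builder recognizes.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_images(data_dir):
    """Return all image files under data_dir (nested folders included), sorted by path."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Running find_images on your folder before training is a quick way to confirm that the images are where you expect them to be.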
Pass the path to the dataset directory to the --train_data_dir argument, and then you can start training:
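A minimal sketch of that launch command, assembled in Python so it can be printed or handed to subprocess.run. It assumes the unconditional training script is named train_unconditional.py and is started with accelerate launch; the helper build_unconditional_command and the path are hypothetical, so adjust them to your setup.

```python
import shlex


def build_unconditional_command(train_data_dir, script="train_unconditional.py", extra_args=()):
    """Assemble the launch command as a list of arguments."""
    # The script name is an assumption; point it at the training script you use.
    return ["accelerate", "launch", script, "--train_data_dir", str(train_data_dir), *extra_args]


command = build_unconditional_command("path/to/data_dir")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

Any other arguments your training script expects can be passed through extra_args.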
You can use the data_dir or data_files parameters to specify the location of the dataset. The data_files parameter supports mapping specific files to dataset splits like train or test:
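The two options can be sketched as follows, assuming the imagefolder builder is loaded through load_dataset from 🤗 Datasets. The helpers make_data_files and load_image_folder and all paths are hypothetical names used only for illustration.

```python
def make_data_files(train_pattern, test_pattern=None):
    """Map split names to file patterns, in the shape the data_files parameter accepts."""
    data_files = {"train": train_pattern}
    if test_pattern is not None:
        data_files["test"] = test_pattern
    return data_files


def load_image_folder(data_dir=None, data_files=None):
    """Load images with the imagefolder builder from a directory or an explicit file mapping."""
    # Imported here so make_data_files stays usable without the library installed.
    from datasets import load_dataset

    if data_dir is not None:
        return load_dataset("imagefolder", data_dir=data_dir)
    return load_dataset("imagefolder", data_files=data_files)


# Usage (paths are placeholders):
#   dataset = load_image_folder(data_dir="path_to_your_folder")
#   dataset = load_image_folder(
#       data_files=make_data_files("path_to_train/**", "path_to_test/**")
#   )
```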
Now the dataset is available for training by passing the dataset name to the --dataset_name argument:
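As a sketch of what that command might look like for text-to-image training, the snippet below assembles it in Python. It assumes a script named train_text_to_image.py that also takes a --pretrained_model_name_or_path argument; the helper build_text_to_image_command, the dataset name, and the model id are placeholders.

```python
def build_text_to_image_command(dataset_name, base_model, script="train_text_to_image.py"):
    """Assemble a launch command that reads its training data from a Hub dataset."""
    # The script name and the model argument are assumptions for this sketch.
    return [
        "accelerate", "launch", script,
        f"--pretrained_model_name_or_path={base_model}",
        f"--dataset_name={dataset_name}",
    ]


command = build_text_to_image_command("name_of_your_dataset", "your-base-model-id")
print(" ".join(command))
```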
Now that you've created a dataset, you can plug it into the --train_data_dir (if your dataset is local) or --dataset_name (if your dataset is on the Hub) argument of a training script.
💡 For more details and context about creating and uploading a dataset to the Hub, take a look at the Image search with 🤗 Datasets post.
Start by creating a dataset with the ImageFolder feature, which creates an image column containing the PIL-encoded images.
Then use the push_to_hub method to upload the dataset to the Hub:
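A minimal sketch of the upload step, assuming you already have a dataset object loaded with 🤗 Datasets and are logged in to the Hub. The wrapper upload_dataset and the repository name are hypothetical; the only library call involved is the dataset's own push_to_hub method.

```python
def upload_dataset(dataset, repo_id, private=False):
    """Push a loaded dataset to the Hub under repo_id and return the repo id used."""
    # Requires a valid Hub login (for example via `huggingface-cli login`).
    dataset.push_to_hub(repo_id, private=private)
    return repo_id


# Usage (names are placeholders):
#   from datasets import load_dataset
#   dataset = load_dataset("imagefolder", data_dir="path_to_your_folder")
#   upload_dataset(dataset, "name_of_your_dataset", private=True)
```

Setting private=True keeps the dataset repository visible only to you, which is useful while you are still checking the data.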
For your next steps, feel free to try and use your dataset to train a model for unconditional generation or text-to-image generation!