Create a dataset card
Each dataset should have a dataset card to promote responsible usage and inform users of any potential biases within the dataset. This idea was inspired by the Model Cards proposed by . Dataset cards help users understand a dataset’s contents, the context for using the dataset, how it was created, and any other considerations a user should be aware of.
Creating a dataset card is easy and can be done in just a few steps:
Go to your dataset repository on the Hub and click Create Dataset Card to create a new README.md file in your repository.
Use the Metadata UI to select the tags that describe your dataset. You can add a license, language, pretty_name, task_categories, size_categories, and any other tags you think are relevant. These tags help users discover your dataset on the Hub.
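As a rough sketch, the tags you pick in the Metadata UI are stored as a YAML block at the very top of README.md, between two `---` lines. The values below are hypothetical placeholders for a small English text-classification dataset; replace them with values that match your own data.

```yaml
---
license: mit
language:
- en
pretty_name: My Example Dataset   # hypothetical name, replace with your own
task_categories:
- text-classification
size_categories:
- 1K<n<10K                        # roughly 1,000 to 10,000 examples
---
```

Because this block is plain text, you can also edit it directly in README.md instead of going through the Metadata UI.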
Once you’re done, commit the changes to the README.md file and you’ll see the completed dataset card in your repository.
For a complete, but not required, set of tag options, you can also look at the . It lists a few more tag options, such as multilinguality and language_creators, which are useful but not strictly necessary.
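These optional tags go in the same YAML block as the others. The fragment below is a hypothetical example for a dataset in a single language whose text was collected from existing sources rather than written for the dataset.

```yaml
multilinguality:
- monolingual        # one language only
language_creators:
- found              # text gathered from pre-existing sources
```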
Click on the Import dataset card template link to automatically create a template with all the relevant fields to complete. Fill out the template sections to the best of your ability. Take a look at the for more detailed information about what to include in each section of the card. For fields you are unable to complete, you can write [More Information Needed].
The card’s YAML metadata also lets you customize how your dataset is loaded, for example by declaring its configurations and the data files that belong to each split, without writing any code.
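For instance, a repository that stores CSV files under a data/ folder might map them to splits with a `configs` entry like the one below. The file paths are hypothetical; the `configs`, `config_name`, `data_files`, `split`, and `path` keys follow the Hub’s data files configuration format. This entry sits inside the same front matter as the other tags.

```yaml
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.csv   # hypothetical file
  - split: test
    path: data/test.csv    # hypothetical file
```

With this in place, loading the dataset (for example with `datasets.load_dataset`) should expose separate train and test splits without a custom loading script.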
Feel free to take a look at the , , and dataset cards as examples to help you get started.