Datasets
  • 🌍GET STARTED
    • Datasets
    • Quickstart
    • Installation
  • 🌍TUTORIALS
    • Overview
    • Load a dataset from the Hub
    • Know your dataset
    • Preprocess
    • Evaluate predictions
    • Create a data
    • Share a dataset to the Hub
  • 🌍HOW-TO GUIDES
    • Overview
    • 🌍GENERAL USAGE
      • Load
      • Process
      • Stream
      • Use with TensorFlow
      • Use with PyTorch
      • Use with JAX
      • Use with Spark
      • Cache management
      • Cloud storage
      • Search index
      • Metrics
      • Beam Datasets
    • 🌍AUDIO
      • Load audio data
      • Process audio data
      • Create an audio dataset
    • 🌍VISION
      • Load image data
      • Process image data
      • Create an image dataset
      • Depth estimation
      • Image classification
      • Semantic segmentation
      • Object detection
    • 🌍TEXT
      • Load text data
      • Process text data
    • 🌍TABULAR
      • Load tabular data
    • 🌍DATASET REPOSITORY
      • Share
      • Create a dataset card
      • Structure your repository
      • Create a dataset loading script
  • 🌍CONCEPTUAL GUIDES
    • Datasets with Arrow
    • The cache
    • Dataset or IterableDataset
    • Dataset features
    • Build and load
    • Batch mapping
    • All about metrics
  • 🌍REFERENCE
    • Main classes
    • Builder classes
    • Loading methods
    • Table Classes
    • Logging methods
    • Task templates
Powered by GitBook
On this page
  1. HOW-TO GUIDES
  2. DATASET REPOSITORY

Create a dataset card

PreviousShareNextStructure your repository

Last updated 1 year ago

Create a dataset card

Each dataset should have a dataset card to promote responsible usage and inform users of any potential biases within the dataset. This idea was inspired by the Model Cards proposed by . Dataset cards help users understand a dataset’s contents, the context for using the dataset, how it was created, and any other considerations a user should be aware of.

Creating a dataset card is easy and can be done in a just a few steps:

  1. Go to your dataset repository on the and click on Create Dataset Card to create a new README.md file in your repository.

  2. Use the Metadata UI to select the tags that describe your dataset. You can add a license, language, pretty_name, the task_categories, size_categories, and any other tags that you think are relevant. These tags help users discover and find your dataset on the Hub.

  1. Once you’re done, commit the changes to the README.md file and you’ll see the completed dataset card on your repository.

For a complete, but not required, set of tag options you can also look at the . This’ll have a few more tag options like multilinguality and language_creators which are useful but not absolutely necessary.

Click on the Import dataset card template link to automatically create a template with all the relevant fields to complete. Fill out the template sections to the best of your ability. Take a look at the for more detailed information about what to include in each section of the card. For fields you are unable to complete, you can write [More Information Needed].

YAML also allows you to customize the way your dataset is loaded by without the need to write any code.

Feel free to take a look at the , , and dataset cards as examples to help you get started.

🌍
🌍
Dataset Card specifications
Dataset Card Creation Guide
defining splits and/or configurations
SNLI
CNN/DailyMail
Allociné
Mitchell, 2018
Hub