Adding New Datasets

Any BOINC AI user can create a dataset! Start by creating your dataset repository, then choose how to upload your data.

In many cases you can simply add raw data files to your dataset repo in any supported format (JSON, CSV, Parquet, text, images, audio files, …). For some large datasets, you may want to create a loading script instead. This script defines the different configurations and splits of your dataset, as well as how to download and process the data.
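
As a rough illustration of the raw-data route, the sketch below writes a tiny dataset as CSV files using only the Python standard library. The function name `write_splits`, the file names `train.csv` and `test.csv`, and the `text`/`label` columns are assumptions made for this example, not a layout required by the Hub. Uploading the resulting files to your repository is a separate step that is not shown here.

```python
import csv
from pathlib import Path

# Hypothetical example rows; a real dataset would come from your own data.
SPLITS = {
    "train": [
        {"text": "great movie", "label": 1},
        {"text": "boring plot", "label": 0},
    ],
    "test": [
        {"text": "loved it", "label": 1},
    ],
}


def write_splits(root, splits=SPLITS):
    """Write one CSV file per split into `root` and return the paths written."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for split_name, rows in splits.items():
        path = root / f"{split_name}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["text", "label"])
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Calling `write_splits("my_dataset")` would create a local `my_dataset` folder with one file per split. Keeping each split in its own file is just one possible convention; a loading script becomes the more natural choice once the data needs custom download or processing logic.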

Datasets outside a namespace

Datasets outside a namespace are maintained by the BOINC AI team. Unlike the naming convention used for community datasets (username/dataset_name or org/dataset_name), datasets outside a namespace can be referenced directly by their name (e.g. glue). If you find that an improvement is needed, use their “Community” tab to open a discussion or submit a PR on the Hub to propose edits.
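
The naming convention above can be sketched as a small parser. `split_dataset_id` is a hypothetical helper written for this illustration and is not part of any library; it only assumes that an identifier is either a bare name or a `namespace/name` pair, as described in the paragraph above.

```python
def split_dataset_id(dataset_id):
    """Split a dataset identifier into (namespace, name).

    The namespace is None for datasets outside a namespace, such as "glue".
    """
    parts = dataset_id.split("/")
    if len(parts) == 1 and parts[0]:
        # Bare name: a dataset outside a namespace.
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        # username/dataset_name or org/dataset_name.
        return parts[0], parts[1]
    raise ValueError(f"unexpected dataset identifier: {dataset_id!r}")
```

For example, `split_dataset_id("glue")` yields no namespace, while `split_dataset_id("username/dataset_name")` separates the owner from the dataset name.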

