Adding New Datasets

Any BOINC AI user can create a dataset. Start by creating a dataset repository, then upload your data to it.

In many cases it’s enough to add raw data to your dataset repo in any supported format (JSON, CSV, Parquet, text, images, audio files, …). For some large datasets, however, you may want to create a loading script. This script defines the different configurations and splits of your dataset, as well as how to download and process the data.
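The raw-data route can be sketched with the Python standard library alone. The example below writes a small set of rows into one CSV file per split inside a local folder that stands in for a dataset repo. The rows, the `write_split_files` helper, and the `train.csv` / `test.csv` file names are all assumptions made for illustration; this page does not specify a file layout, and the upload step itself is not shown.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical example rows; a real dataset would supply its own columns.
ROWS = [
    {"text": "great movie", "label": "pos"},
    {"text": "terrible plot", "label": "neg"},
    {"text": "loved the cast", "label": "pos"},
    {"text": "too long", "label": "neg"},
]


def write_split_files(rows, repo_dir, test_size=1):
    """Write the last `test_size` rows to test.csv and the rest to train.csv.

    Returns a dict mapping split name to the path of the file written.
    """
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must leave at least one row in each split")

    repo_dir = Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)

    splits = {
        "train": rows[:-test_size],
        "test": rows[-test_size:],
    }
    fieldnames = list(rows[0].keys())

    paths = {}
    for split_name, split_rows in splits.items():
        path = repo_dir / f"{split_name}.csv"
        # newline="" is what the csv module expects for files it writes.
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(split_rows)
        paths[split_name] = path
    return paths


if __name__ == "__main__":
    # A temporary directory stands in for a local clone of the dataset repo.
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in write_split_files(ROWS, tmp).items():
            print(name, path.name)
```

Keeping one file per split makes the split boundaries explicit in the repo itself, which is the kind of structure a loading script would otherwise have to define in code.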

Datasets outside a namespace

Datasets outside a namespace are maintained by the BOINC AI team. Unlike the naming convention used for community datasets (username/dataset_name or org/dataset_name), datasets outside a namespace can be referenced directly by their name (e.g. glue). If you find that an improvement is needed, use their “Community” tab to open a discussion or submit a PR on the Hub to propose edits.
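The two naming forms can be told apart mechanically. The helper below is a minimal sketch of that distinction; `split_dataset_id` is a hypothetical function written for this example, not part of any library, and it assumes an identifier contains at most one `/`.

```python
def split_dataset_id(dataset_id):
    """Split a dataset identifier into (namespace, name).

    The namespace is None for datasets outside a namespace, such as "glue".
    """
    parts = dataset_id.split("/")
    if len(parts) == 1 and parts[0]:
        # Bare name: a dataset outside a namespace.
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        # username/dataset_name or org/dataset_name.
        return parts[0], parts[1]
    raise ValueError(f"not a valid dataset identifier: {dataset_id!r}")


if __name__ == "__main__":
    print(split_dataset_id("glue"))                   # (None, 'glue')
    print(split_dataset_id("username/dataset_name"))  # ('username', 'dataset_name')
```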
