Datasets Overview

Datasets on the Hub

The BOINC AI Hub hosts a large number of community-curated datasets for a diverse range of tasks, such as translation, automatic speech recognition, and image classification. In addition to the information in the dataset card, many datasets, such as GLUE, include a Dataset Preview that showcases the data.

Each dataset is a Git repository equipped with the scripts needed to download the data and generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Structure your repository guide. Following the supported repository structure ensures that your dataset page on the Hub displays a preview.
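To make the idea of splits concrete, here is a minimal sketch that groups a repository's data files into training, validation (evaluation), and test splits by keywords found in their paths. The function `infer_splits`, the `SPLIT_KEYWORDS` table, and the file names are hypothetical illustrations, not the Hub's actual detection rules; the Structure your repository guide remains the authoritative reference.

```python
import re

# Hypothetical keywords that mark a file as belonging to a split (illustrative only).
SPLIT_KEYWORDS = {
    "train": {"train", "training"},
    "validation": {"validation", "valid", "dev"},
    "test": {"test", "testing"},
}


def infer_splits(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths into splits by keyword tokens in their paths (assumed rule)."""
    splits: dict[str, list[str]] = {name: [] for name in SPLIT_KEYWORDS}
    for path in paths:
        # Break the path into lowercase tokens on "/", "-", "_" and "." so that
        # "data/train-00000-of-00002.parquet" yields "data", "train", "00000", ...
        tokens = set(re.split(r"[/\-_.]", path.lower()))
        for name, keywords in SPLIT_KEYWORDS.items():
            if tokens & keywords:
                splits[name].append(path)
                break  # assign each file to the first matching split only
    # Files that match no keyword (e.g. README.md) are skipped; empty splits are dropped.
    return {name: files for name, files in splits.items() if files}


files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation.csv",
    "data/test.csv",
    "README.md",
]
for split, split_files in infer_splits(files).items():
    print(split, split_files)
```

Matching on whole tokens rather than raw substrings keeps a name like `contest.csv` from being mistaken for a test file.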

Search for datasets

As with models and Spaces, you can search the Hub for datasets using the search bar in the top navigation or on the main datasets page. A large number of languages, tasks, and licenses are available as filters to help you find the dataset that's right for you.
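Conceptually, combining a text query with language, task, and license filters narrows the results to datasets that match every criterion. The sketch below models this over a hypothetical in-memory catalog; the `DatasetInfo` record, its field names, the repository IDs, and the assumption that filters combine with AND are illustrative, not the Hub's search implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    """Hypothetical metadata record for one dataset; field names are illustrative."""
    repo_id: str
    languages: set[str] = field(default_factory=set)
    tasks: set[str] = field(default_factory=set)
    license: str = ""


def filter_datasets(datasets, query="", language=None, task=None, license_id=None):
    """Return repo IDs matching the query and every filter that is set (AND assumed)."""
    results = []
    for ds in datasets:
        # Case-insensitive substring match on the repository ID stands in for the search bar.
        if query.lower() not in ds.repo_id.lower():
            continue
        if language is not None and language not in ds.languages:
            continue
        if task is not None and task not in ds.tasks:
            continue
        if license_id is not None and ds.license != license_id:
            continue
        results.append(ds.repo_id)
    return results


# Hypothetical catalog entries used only for illustration.
catalog = [
    DatasetInfo("example/news-translation", {"en", "fr"}, {"translation"}, "cc-by-4.0"),
    DatasetInfo("example/speech-commands", {"en"}, {"automatic-speech-recognition"}, "apache-2.0"),
    DatasetInfo("example/cats-vs-dogs", set(), {"image-classification"}, "mit"),
]

print(filter_datasets(catalog, language="en", task="translation"))
```

Each filter left as `None` is simply skipped, so adding a filter can only shrink the result list.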

Privacy

Because datasets are repositories, you can toggle their visibility between private and public through the Settings tab. If a dataset is owned by an organization, the privacy settings apply to all the members of the organization.
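The visibility rule can be modeled as a simple access check. This is a simplified sketch, assuming that a public dataset is visible to everyone and a private one only to its owning user or, when an organization owns it, to that organization's members. `DatasetRepo`, `can_view`, and the user and organization names are hypothetical, and finer-grained Hub permissions (such as member roles) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class DatasetRepo:
    """Hypothetical repository record."""
    repo_id: str
    owner: str             # a user name or an organization name
    private: bool = False  # models the visibility toggle in the Settings tab


def can_view(repo: DatasetRepo, user: str, org_members: dict[str, set[str]]) -> bool:
    """Simplified access check (assumption): ignores roles and other permissions."""
    if not repo.private:
        return True  # public datasets are visible to everyone
    if user == repo.owner:
        return True  # the owning user can see their own private dataset
    # For an organization-owned dataset, every member of that organization can see it.
    return user in org_members.get(repo.owner, set())


org_members = {"acme-lab": {"alice", "bob"}}
repo = DatasetRepo("acme-lab/internal-corpus", owner="acme-lab", private=True)
print(can_view(repo, "alice", org_members))    # True: alice belongs to acme-lab
print(can_view(repo, "mallory", org_members))  # False: not a member
repo.private = False  # switch the dataset to public
print(can_view(repo, "mallory", org_members))  # True: public datasets are visible to all
```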
