List splits and configurations


Last updated 1 year ago


Datasets typically have splits and may also have configurations. A split is a subset of the dataset, such as train or test, that is used during a particular stage of training or evaluating a model. A configuration is a sub-dataset contained within a larger dataset. Configurations are especially common in multilingual speech datasets, where there may be a different configuration for each language. If you’re interested in learning more about splits and configurations, check out the Load a dataset from the Hub tutorial!

This guide shows you how to use Datasets Server’s /splits endpoint to retrieve a dataset’s splits and configurations programmatically. Feel free to also try it out with Postman, RapidAPI, or ReDoc.

The /splits endpoint accepts the dataset name as its query parameter:


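import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: replace with your own access token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.boincai.com/splits?dataset=duorc"

def query():
    """Call the /splits endpoint and return the decoded JSON response."""
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()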
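
If you want to query a dataset other than duorc, the URL can be assembled from the dataset name instead of being hard-coded. The following is a minimal sketch: the helper name build_splits_url is hypothetical (it is not part of any library), and the base URL is simply copied from the example above.

```python
from urllib.parse import urlencode

# Base URL copied from the example above.
BASE_URL = "https://datasets-server.boincai.com"


def build_splits_url(dataset):
    """Return the /splits URL for a dataset name (hypothetical helper).

    urlencode percent-encodes the value, so names that contain
    reserved characters such as "/" are still safe in a query string.
    """
    return f"{BASE_URL}/splits?{urlencode({'dataset': dataset})}"


print(build_splits_url("duorc"))
```

The resulting string can be used in place of API_URL in the request above.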

The endpoint response is a JSON object containing a list of the dataset’s splits and configurations. For example, the duorc dataset has six splits and two configurations:


{
  "splits": [
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "train" },
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "validation" },
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "test" },
    { "dataset": "duorc", "config": "SelfRC", "split": "train" },
    { "dataset": "duorc", "config": "SelfRC", "split": "validation" },
    { "dataset": "duorc", "config": "SelfRC", "split": "test" }
  ],
  "pending": [],
  "failed": []
}
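
Because the response lists one entry per (configuration, split) pair, it is often convenient to regroup it by configuration. Here is a minimal sketch of one way to do that; the function name group_splits_by_config is hypothetical, and the sample payload is a trimmed copy of the duorc response shown above rather than a live API call.

```python
from collections import defaultdict


def group_splits_by_config(payload):
    """Map each configuration name to the list of its split names."""
    grouped = defaultdict(list)
    for entry in payload.get("splits", []):
        grouped[entry["config"]].append(entry["split"])
    return dict(grouped)


# Trimmed copy of the duorc response shown above.
sample = {
    "splits": [
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "train"},
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "validation"},
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "test"},
        {"dataset": "duorc", "config": "SelfRC", "split": "train"},
        {"dataset": "duorc", "config": "SelfRC", "split": "validation"},
        {"dataset": "duorc", "config": "SelfRC", "split": "test"},
    ],
    "pending": [],
    "failed": [],
}

configs = group_splits_by_config(sample)
for config, splits in configs.items():
    print(config, splits)
```

With a live response, you would pass the data value returned by query() instead of the sample dictionary.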