Check dataset validity

Before you download a dataset from the Hub, it helps to know whether the dataset you’re interested in is available. Datasets Server provides the /is-valid endpoint to check whether a specific dataset loads without errors.

The API endpoint will return an error for datasets that cannot be loaded with the 🌍 Datasets library, for example, because the data hasn’t been uploaded or the format is not supported.

The largest datasets are partially supported by Datasets Server. If they are streamable, Datasets Server can extract the first 100 rows without downloading the whole dataset. This is especially useful for previewing large datasets where downloading the whole dataset may take hours! See the preview field in the response of /is-valid to check if a dataset is partially supported.

This guide shows you how to check dataset validity programmatically, but feel free to try it out with Postman, RapidAPI, or ReDoc.

Check if a dataset is valid

/is-valid checks whether a specific dataset loads without any error. The endpoint takes one required query parameter, dataset, which is the name of the dataset:

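import requests

# Placeholder: replace with your own Hub user token.
API_TOKEN = "YOUR_API_TOKEN"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.boincai.com/is-valid?dataset=rotten_tomatoes"

def query():
    # Send the request and return the decoded JSON response.
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()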

The response looks like this if a dataset is valid:

{
  "viewer": true,
  "preview": true
}

If only the first rows of a dataset are available, then the response looks like:

{
  "viewer": false,
  "preview": true
}

Finally, if the dataset is not valid at all, then the response is:

{
  "viewer": false,
  "preview": false
}
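
The three responses above can be told apart with a small helper. This is a minimal sketch, not part of Datasets Server: the function name describe_validity and the status labels are hypothetical, and treating a missing field as false is an assumption. A combination that this guide does not show (viewer true, preview false) is reported as "unknown" rather than guessed at.

```python
def describe_validity(response: dict) -> str:
    """Map an /is-valid response to a short status label (hypothetical labels)."""
    # Assumption: a missing field is treated the same as false.
    viewer = bool(response.get("viewer", False))
    preview = bool(response.get("preview", False))

    if viewer and preview:
        return "valid"  # the full dataset is available
    if preview:
        return "preview only"  # only the first rows are available
    if not viewer:
        return "not valid"  # neither the viewer nor the preview is available
    # viewer true with preview false is not documented in this guide.
    return "unknown"


print(describe_validity({"viewer": True, "preview": True}))
print(describe_validity({"viewer": False, "preview": True}))
print(describe_validity({"viewer": False, "preview": False}))
```

Passing the data value returned by query() to this helper gives a single label to branch on instead of two booleans.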

Some cases where a dataset is not valid are:

the dataset viewer is disabled
the dataset is gated but the access is not granted: no token is passed or the passed token is not authorized
the dataset is private
the dataset contains no data or the data format is not supported

Remember, if a dataset is gated, you'll need to provide your user token to submit a successful query!
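
One way to handle both public and gated datasets is to send the Authorization header only when a token is available. The sketch below assumes the same endpoint URL and Bearer header used earlier in this guide. The helper names build_request and check_validity, the 30-second timeout, and the choice to raise on HTTP error statuses are illustrative assumptions, not part of the Datasets Server API.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.boincai.com/is-valid"


def build_request(dataset, token=None):
    """Return the (url, headers) pair for an /is-valid query."""
    # urlencode escapes characters such as "/" in namespaced dataset names.
    url = f"{BASE_URL}?{urlencode({'dataset': dataset})}"
    # Only send the Authorization header when a token is provided.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return url, headers


def check_validity(dataset, token=None):
    """Query /is-valid and return the decoded JSON response."""
    # Imported here so build_request can be used without the dependency.
    import requests

    url, headers = build_request(dataset, token)
    response = requests.get(url, headers=headers, timeout=30)
    # Assumption: treat any HTTP error status as a failure.
    response.raise_for_status()
    return response.json()
```

For a public dataset, check_validity("rotten_tomatoes") is enough. For a gated dataset, pass your user token as the second argument.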

Last updated 1 year ago