Check dataset validity

Last updated 1 year ago

Before you download a dataset from the Hub, it is helpful to know whether the dataset you're interested in is available. Datasets Server provides the /is-valid endpoint to check if a specific dataset works without any errors.

The API endpoint will return an error for datasets that cannot be loaded with the Datasets library, for example, because the data hasn't been uploaded or the format is not supported.

The largest datasets are only partially supported by Datasets Server. If they are streamable, Datasets Server can extract the first 100 rows without downloading the whole dataset. This is especially useful for previewing large datasets, where downloading the whole dataset may take hours. See the preview field in the response of /is-valid to check if a dataset is partially supported.

This guide shows you how to check dataset validity programmatically, but feel free to try it out with Postman, RapidAPI, or ReDoc.

Check if a dataset is valid

/is-valid checks whether a specific dataset loads without any error. This endpoint's query parameter requires you to specify the name of the dataset:

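import requests

# Placeholder: replace with your own user token (needed for gated datasets).
API_TOKEN = "YOUR_USER_TOKEN"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.boincai.com/is-valid?dataset=rotten_tomatoes"

def query():
    """Send a GET request to /is-valid and return the parsed JSON body."""
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()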
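
The dataset name does not have to be hard-coded in the URL. As a minimal sketch, the query string can be built from a variable with the standard library. The helper name build_is_valid_url and the BASE_URL constant are hypothetical, introduced here only for illustration; the base URL is taken from the example above.

```python
from urllib.parse import urlencode

# Base URL copied from the example request above.
BASE_URL = "https://datasets-server.boincai.com"


def build_is_valid_url(dataset: str) -> str:
    """Return the /is-valid URL for a dataset name (hypothetical helper)."""
    # urlencode percent-escapes characters such as "/" in namespaced names.
    return f"{BASE_URL}/is-valid?{urlencode({'dataset': dataset})}"
```

The resulting string can be passed to requests.get in place of API_URL.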

The response looks like this if a dataset is valid:

{
  "viewer": true,
  "preview": true
}

If only the first rows of a dataset are available, then the response looks like:

{
  "viewer": false,
  "preview": true
}

Finally, if the dataset is not valid at all, then the response is:

{
  "viewer": false,
  "preview": false
}
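
The three response shapes above can be mapped to a readable status. This is only a sketch: the function name describe_validity and the status strings are hypothetical, and the combination of viewer set to true with preview set to false is not described in this guide, so treating it as valid is an assumption.

```python
def describe_validity(response: dict) -> str:
    """Summarize an /is-valid response (hypothetical helper, not part of the API)."""
    # Missing fields are treated as False, which is an assumption.
    viewer = bool(response.get("viewer", False))
    preview = bool(response.get("preview", False))
    if viewer:
        return "valid"          # the whole dataset is supported
    if preview:
        return "preview only"   # only the first rows are available
    return "not valid"          # the dataset cannot be loaded at all
```

For example, passing the result of query() from the request example would give one of the three status strings.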

Some cases where a dataset is not valid are:

  • the dataset viewer is disabled

  • the dataset is gated but the access is not granted: no token is passed or the passed token is not authorized

  • the dataset is private

  • the dataset contains no data or the data format is not supported

Remember, if a dataset is gated, you'll need to provide your user token to submit a successful query!
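
One way to handle both public and gated datasets is to attach the Authorization header only when a token is available. The helper below is a sketch under that assumption; the name build_headers is hypothetical, and the Bearer format follows the request example earlier in this guide.

```python
from typing import Optional


def build_headers(token: Optional[str] = None) -> dict:
    """Return request headers, adding the user token only if one is given."""
    if token:
        # Same header format as in the request example above.
        return {"Authorization": f"Bearer {token}"}
    # No token: send no Authorization header at all.
    return {}
```

The returned dictionary can be passed as the headers argument of requests.get.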