Check dataset validity
Last updated
Before you download a dataset from the Hub, it is helpful to know whether the specific dataset you're interested in is available. Datasets Server provides the /is-valid endpoint to check if a specific dataset works without any errors.
The API endpoint will return an error for datasets that cannot be loaded with the 🤗 Datasets library, for example, because the data hasn't been uploaded or the format is not supported.
The largest datasets are partially supported by Datasets Server. If they are streamable, Datasets Server can extract the first 100 rows without downloading the whole dataset. This is especially useful for previewing large datasets where downloading the whole dataset may take hours! See the preview field in the response of /is-valid to check if a dataset is partially supported.
This guide shows you how to check dataset validity programmatically, but feel free to try the endpoint out in an API client of your choice.
/is-valid checks whether a specific dataset loads without any error. This endpoint's query parameter requires you to specify the name of the dataset.
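A minimal Python sketch of such a query might look like the following. The base URL `https://datasets-server.huggingface.co`, the query parameter name `dataset`, and the example dataset name `rotten_tomatoes` are assumptions made for illustration; adjust them to match the server you are querying.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Datasets Server API (illustrative, adjust as needed).
BASE_URL = "https://datasets-server.huggingface.co"


def is_valid_url(dataset: str, base_url: str = BASE_URL) -> str:
    """Build the /is-valid URL for a dataset.

    The parameter name `dataset` is an assumption; urlencode takes care of
    escaping names that contain characters such as "/".
    """
    query = urllib.parse.urlencode({"dataset": dataset})
    return f"{base_url}/is-valid?{query}"


def query_is_valid(dataset: str) -> dict:
    """Send a GET request to /is-valid and decode the JSON response body."""
    with urllib.request.urlopen(is_valid_url(dataset), timeout=30) as response:
        return json.load(response)
```

Calling `query_is_valid("rotten_tomatoes")` would send the request and return the decoded response as a dictionary. Keeping the URL construction in its own function makes it easy to inspect the exact request before anything goes over the network.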
The response tells you which of three cases applies: the dataset is valid, only the first rows of the dataset are available (see the preview field), or the dataset is not valid at all.
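As a rough sketch of how a client could tell these three cases apart, the function below inspects a decoded response. It assumes the response is a JSON object with boolean fields: `preview` is the field named in this guide, while `viewer` is a hypothetical field name standing in for "the whole dataset is supported". Check the actual response of your server before relying on either name.

```python
def classify(response: dict) -> str:
    """Map a decoded /is-valid response to one of three cases.

    Assumptions: `preview` signals that the first rows are available,
    and `viewer` (a hypothetical field name) signals full support.
    Missing fields are treated as False.
    """
    if response.get("viewer", False):
        return "valid"
    if response.get("preview", False):
        return "first rows only"
    return "not valid"
```

Treating missing fields as `False` is a deliberately conservative choice: an unexpected response shape is reported as not valid rather than silently accepted.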
Some cases where a dataset is not valid are:
the dataset viewer is disabled
the dataset is gated but access has not been granted: no token is passed, or the passed token is not authorized
the dataset is private
the dataset contains no data or the data format is not supported
Remember, if a dataset is gated, you'll need to provide your user token to submit a successful query!
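One way to attach a user token to the request is sketched below. It assumes the token is sent in an `Authorization` header using the `Bearer` scheme; the helper name `authorized_request` and the placeholder token are hypothetical.

```python
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries a user token.

    Assumption: the server expects "Authorization: Bearer <token>".
    """
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )
```

The returned object can be passed to `urllib.request.urlopen` in place of a plain URL. Read the token from an environment variable or a secrets store rather than hard-coding it in source files.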