Download slices of rows

Datasets Server provides a /rows endpoint for visualizing any slice of rows of a dataset, so you can walk through and inspect the data it contains.

Currently, only datasets with Parquet exports are supported, because the Parquet files let Datasets Server extract any slice of rows without downloading the whole dataset.

This guide shows you how to use Datasets Server's /rows endpoint to download slices of a dataset. Feel free to also try it out with Postman, RapidAPI, or ReDoc.

The /rows endpoint accepts five query parameters:

  • dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0

  • config: the configuration name, for example cola

  • split: the split name, for example train

  • offset: the offset of the slice, for example 150

  • length: the length of the slice, for example 10 (maximum: 100)

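import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: replace with your own API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.boincai.com/rows?dataset=duorc&config=SelfRC&split=train&offset=150&length=10"

def query():
    # Request 10 rows of the duorc/SelfRC train split, starting at offset 150.
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()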
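
Because length is capped at 100, reading a longer range means issuing several requests with increasing offsets. The following is a minimal sketch of that idea, not part of the Datasets Server API: the helper names slice_params and rows_url are hypothetical, and the host is simply the one used in the example above. It only builds the URLs; each one could then be passed to requests.get as in the snippet above.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.boincai.com/rows"  # host taken from the example above
MAX_LENGTH = 100  # documented maximum for the length parameter


def slice_params(start, stop, page_size=MAX_LENGTH):
    """Yield (offset, length) pairs that together cover rows start..stop-1."""
    if not 1 <= page_size <= MAX_LENGTH:
        raise ValueError(f"page_size must be between 1 and {MAX_LENGTH}")
    for offset in range(start, stop, page_size):
        # The last slice may be shorter than page_size.
        yield offset, min(page_size, stop - offset)


def rows_url(dataset, config, split, offset, length):
    """Build a /rows URL from the five query parameters (hypothetical helper)."""
    query = urlencode({
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
    })
    return f"{BASE_URL}?{query}"


# URLs covering the first 250 rows of the duorc/SelfRC train split.
urls = [rows_url("duorc", "SelfRC", "train", o, n) for o, n in slice_params(0, 250)]
```

Using urlencode rather than string concatenation also takes care of escaping dataset names that contain a slash, such as mozilla-foundation/common_voice_10_0.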

The endpoint response is a JSON containing two keys:

  • The features of a dataset, including each column's name and data type.

  • The slice of rows of a dataset and the content contained in each column of a specific row.
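
As a rough illustration of working with those two keys, the sketch below pulls the column names and the row contents out of a response. It assumes the keys are named "features" and "rows", that each feature has a "name", and that each row entry nests its cells under "row"; these names are assumptions that should be checked against a real response. The sample payload is invented for illustration and is not real dataset content.

```python
def summarize(response):
    """Return (column names, list of row dicts) from a /rows response.

    Assumes top-level keys "features" and "rows", a "name" on each feature,
    and the cells of each row nested under "row". All of these key names
    are assumptions, not confirmed by this guide.
    """
    columns = [feature["name"] for feature in response["features"]]
    rows = [entry["row"] for entry in response["rows"]]
    return columns, rows


# Hypothetical payload, made up for illustration only.
sample = {
    "features": [
        {"name": "title", "type": {"dtype": "string"}},
        {"name": "score", "type": {"dtype": "int64"}},
    ],
    "rows": [
        {"row_idx": 150, "row": {"title": "a", "score": 1}, "truncated_cells": []},
        {"row_idx": 151, "row": {"title": "b", "score": 2}, "truncated_cells": []},
    ],
}

columns, rows = summarize(sample)
```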

For example, here are the features and the slice of rows of the duorc/SelfRC train split from 150 to 151:

Image and audio samples

Image and audio samples are each represented by a URL that points to the file.

Images

Images are represented as a JSON object with three fields:

  • src: URL to the image file

  • height: height (in pixels) of the image

  • width: width (in pixels) of the image
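
A small sketch of how such image objects might be picked out of a row: it treats any cell that is a JSON object with the three fields above as an image and collects its URL. The helper names, the column names, and the URL in the sample row are all hypothetical placeholders.

```python
IMAGE_FIELDS = {"src", "height", "width"}


def is_image_cell(cell):
    """True if the cell is an object carrying the three documented image fields."""
    return isinstance(cell, dict) and IMAGE_FIELDS.issubset(cell)


def image_urls(row):
    """Map column name -> image URL for every image-like cell in one row."""
    return {name: cell["src"] for name, cell in row.items() if is_image_cell(cell)}


# Hypothetical row; the column names and URL are placeholders.
row = {
    "img": {"src": "https://example.com/image.jpg", "height": 32, "width": 32},
    "label": 19,
}
found = image_urls(row)
```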

Here is an example of an image, from the first row of the cifar100 dataset:

Caching

Datasets Server caches image and audio samples temporarily. The cached assets of some datasets are cleared from time to time, based on usage.

If a certain asset is not available, you may have to call /rows again.
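
One way to handle this on the client side is to wrap the download in a small retry loop that queries /rows again whenever the asset turns out to be missing. The sketch below is only one possible approach: fetch_asset and its two callables are hypothetical, and the fake functions stand in for real HTTP calls so the example runs offline.

```python
def fetch_asset(get_asset_url, download, max_attempts=3):
    """Download an asset, calling /rows again when the file is unavailable.

    get_asset_url: hypothetical callable that queries /rows and returns the asset URL.
    download: hypothetical callable that returns the bytes, or None if unavailable.
    """
    for _ in range(max_attempts):
        url = get_asset_url()      # each attempt queries /rows again
        content = download(url)
        if content is not None:
            return content
    raise RuntimeError(f"asset still unavailable after {max_attempts} attempts")


# Stand-ins for real HTTP calls (assumptions for illustration only).
calls = {"rows": 0}


def fake_get_asset_url():
    calls["rows"] += 1
    return "https://example.com/asset.jpg"


def fake_download(url):
    # Pretend the asset is missing until /rows has been called twice.
    return b"image-bytes" if calls["rows"] >= 2 else None


content = fetch_asset(fake_get_asset_url, fake_download)
```

In real use, the download callable would typically return None on an HTTP 404 and the loop might pause between attempts; both choices are left out here to keep the sketch short.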

Truncated responses

Unlike /first-rows, /rows currently does not truncate its responses. The truncated_cells field is still present but is always empty.
