Preview a dataset

Datasets Server provides a /first-rows endpoint for previewing the first 100 rows of a dataset. The preview gives you a good idea of the data types and example values the dataset contains.

This guide shows you how to use Datasets Server’s /first-rows endpoint to preview a dataset. Feel free to also try it out with Postman, RapidAPI, or ReDoc.

The /first-rows endpoint accepts three query parameters:

  • dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0

  • config: the configuration name, for example cola

  • split: the split name, for example train
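As a minimal sketch of how the three query parameters fit together, the snippet below builds a /first-rows URL and, optionally, fetches it using only the Python standard library. The base URL `https://datasets-server.huggingface.co` and the helper names `first_rows_url` and `fetch_first_rows` are assumptions for illustration; they do not come from this guide.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public base URL of the service; it is not stated in this guide.
API_BASE = "https://datasets-server.huggingface.co"


def first_rows_url(dataset: str, config: str, split: str) -> str:
    """Build a /first-rows URL from the three query parameters."""
    # urlencode percent-escapes characters such as "/" in namespaced dataset names.
    query = urlencode({"dataset": dataset, "config": config, "split": split})
    return f"{API_BASE}/first-rows?{query}"


def fetch_first_rows(dataset: str, config: str, split: str) -> dict:
    """Send the GET request and decode the JSON body (requires network access)."""
    with urlopen(first_rows_url(dataset, config, split), timeout=30) as response:
        return json.load(response)


# Building the URL needs no network access.
print(first_rows_url("glue", "cola", "train"))
```

Calling `fetch_first_rows("glue", "cola", "train")` would then return the decoded response as a dictionary, provided the assumed base URL is reachable.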

The endpoint response is a JSON object containing two keys:

  • features: the columns of the dataset, including each column’s name and data type.

  • rows: the first 100 rows of the dataset, with the content of each column in each row.
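To make the two parts of the response concrete, here is a small sketch that reads the column types and the row contents out of a decoded response. The `sample` dictionary is a hand-written stand-in, not real API output, and the nested field names (`name`, `type`, `dtype`, `_type`, `row_idx`, `row`) are assumptions about the response layout.

```python
# Hand-written stand-in for a decoded response; the values are invented and
# the nested field names are assumptions about the layout.
sample = {
    "features": [
        {"name": "title", "type": {"dtype": "string", "_type": "Value"}},
        {"name": "score", "type": {"dtype": "int64", "_type": "Value"}},
    ],
    "rows": [
        {"row_idx": 0, "row": {"title": "first", "score": 1}},
        {"row_idx": 1, "row": {"title": "second", "score": 2}},
    ],
}


def column_types(response: dict) -> dict:
    """Map each column name to its dtype, falling back to the feature kind."""
    result = {}
    for feature in response.get("features", []):
        ftype = feature.get("type", {})
        result[feature["name"]] = ftype.get("dtype") or ftype.get("_type")
    return result


def row_values(response: dict) -> list:
    """Return only the per-row column contents, dropping row metadata."""
    return [item["row"] for item in response.get("rows", [])]


print(column_types(sample))
print(len(row_values(sample)))
```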

For example, a request for the duorc/SelfRC train split returns the features and the first 100 rows of that split.

Truncated responses

For some datasets, the response from /first-rows may exceed 1MB, in which case the response is truncated until it is under 1MB. As a result, you may get fewer than 100 rows, and the truncated field in the response is true.

If even the first few rows produce a response larger than 1MB, some of the columns are truncated and converted to strings. These columns are listed in the truncated_cells field.
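The two truncation signals described above can be checked programmatically. The sketch below assumes that truncated sits at the top level of the response and that truncated_cells is a list of column names attached to each row; both placements are assumptions, and the `sample` response is hand-written with invented values rather than real output.

```python
def truncation_report(response: dict) -> dict:
    """Summarize how a /first-rows response was truncated, if at all."""
    rows = response.get("rows", [])
    # Collect every column name flagged as truncated in any row.
    columns = sorted(
        {name for item in rows for name in item.get("truncated_cells", [])}
    )
    return {
        "row_count": len(rows),
        "rows_truncated": bool(response.get("truncated", False)),
        "truncated_columns": columns,
    }


# Hand-written stand-in for a truncated response; all values are invented.
sample = {
    "truncated": True,
    "rows": [
        {
            "row_idx": 0,
            "row": {"start": "2016-07-01", "target": "[[0.1, 0.2"},
            "truncated_cells": ["target"],
        },
        {
            "row_idx": 1,
            "row": {"start": "2016-07-02", "target": "[[0.3, 0.4"},
            "truncated_cells": ["target"],
        },
    ],
}

print(truncation_report(sample))
```

A cell listed as truncated holds a cut-off string, so it should not be parsed back into its original type.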

For example, the ett dataset returns only 10 rows, and its target and feat_dynamic_real columns are truncated.
