Dataset Viewer
The dataset page includes a table with the contents of the dataset, arranged by pages of 100 rows. You can navigate between pages using the buttons at the bottom of the table.
You can search for a word in the dataset by typing it in the search bar at the top of the table. The search is case-insensitive and will match any row containing the word. Only columns of type string are searched, including string values nested in a dictionary.
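The matching rules described above can be illustrated with a minimal sketch. This is not the viewer's actual implementation: the function names are hypothetical, and treating the search as a plain substring test is an assumption.

```python
from typing import Any, Iterator


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string in a value, descending into nested dictionaries."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for nested in value.values():
            yield from iter_strings(nested)
    # Non-string values (ints, floats, None, ...) are ignored.


def row_matches(row: dict, query: str) -> bool:
    """Case-insensitive test: does any string value in the row contain the query?"""
    needle = query.casefold()
    # Assumption: substring matching; the real search may tokenize words differently.
    return any(needle in text.casefold() for text in iter_strings(row))


def search(rows: list, query: str) -> list:
    """Return the rows that match the query, in their original order."""
    return [row for row in rows if row_matches(row, query)]
```

With this sketch, a query of hello would match a row whose nested dictionary holds the string "another HELLO", but a query of 2 would not match a row whose only 2 is an integer label.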
You can share a specific row by clicking on it and then copying the URL from the address bar of your browser. For example, https://huggingface.co/datasets/glue/viewer/mrpc/test?row=241 will open the dataset viewer on the MRPC dataset, on the test split, at row 241.
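As a small sketch, a shareable row URL can also be assembled in code. The helper name viewer_row_url is hypothetical, and the URL pattern is inferred only from the example above (dataset, then viewer, then configuration and split, with a row query parameter).

```python
from urllib.parse import quote, urlencode


def viewer_row_url(dataset: str, config: str, split: str, row: int) -> str:
    """Build a dataset viewer URL pointing at one row (pattern inferred from the docs example)."""
    if row < 0:
        raise ValueError("row must be non-negative")
    # Keep "/" in the dataset name so that namespaced ids like "user/name" still work.
    path = f"{quote(dataset)}/viewer/{quote(config, safe='')}/{quote(split, safe='')}"
    return f"https://huggingface.co/datasets/{path}?{urlencode({'row': row})}"
```

Calling viewer_row_url("glue", "mrpc", "test", 241) reproduces the example URL given above.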
Every dataset is auto-converted to the Parquet format. Click on "Auto-converted to Parquet" to access the Parquet files. Refer to the Datasets Server docs to learn how to query the dataset with libraries such as Polars, Pandas or DuckDB.
You can also access the list of Parquet files programmatically using the Hub API: https://huggingface.co/api/datasets/glue/parquet.
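A hedged sketch of calling this endpoint from Python with only the standard library follows. The function names are hypothetical, and the response shape (a JSON object mapping each configuration to its splits, and each split to a list of file URLs) is an assumption that should be checked against a real response.

```python
import json
from urllib.request import urlopen


def parquet_api_url(dataset: str) -> str:
    """Return the Hub API endpoint that lists a dataset's Parquet files."""
    return f"https://huggingface.co/api/datasets/{dataset}/parquet"


def flatten_parquet_listing(listing: dict) -> list[tuple[str, str, str]]:
    """Turn {config: {split: [url, ...]}} into (config, split, url) tuples.

    Assumption: the endpoint returns this nested shape.
    """
    return [
        (config, split, url)
        for config, splits in listing.items()
        for split, urls in splits.items()
        for url in urls
    ]


def list_parquet_files(dataset: str, timeout: float = 10.0) -> list[tuple[str, str, str]]:
    """Fetch the listing over HTTPS and flatten it (requires network access)."""
    with urlopen(parquet_api_url(dataset), timeout=timeout) as response:
        return flatten_parquet_listing(json.load(response))
```

For instance, list_parquet_files("glue") would query the URL shown above; each returned file URL can then be passed to a Parquet reader such as Pandas, Polars or DuckDB.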
For datasets larger than 5GB, only the first ~5GB of the dataset is auto-converted to Parquet. In this case, an informational message lets you know that the Viewer is partial. This should be a large enough sample to represent the full dataset accurately. Let us know if you need a bigger sample.
For the biggest datasets, the page shows a preview of the first 100 rows instead of a full-featured viewer. This restriction only applies to datasets over 5GB that are not natively in Parquet format.
The dataset viewer can be disabled. To do this, add a YAML section to the dataset's README.md file (create one if it does not already exist) and add a viewer property with the value false (that is, the line viewer: false).
Note that the viewer is always disabled on private datasets.