Dataset Viewer

Last updated 1 year ago

Dataset viewer

The dataset page includes a table with the contents of the dataset, arranged by pages of 100 rows. You can navigate between pages using the buttons at the bottom of the table.
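The paging arithmetic can be sketched in a few lines. This is only an illustration of the 100-rows-per-page layout described above; the helper names and the zero-based numbering are assumptions made for the example, not part of the viewer.

```python
import math

PAGE_SIZE = 100  # rows per page, as stated above


def page_count(num_rows: int) -> int:
    """Number of pages needed to display num_rows rows."""
    return math.ceil(num_rows / PAGE_SIZE)


def page_of_row(row_index: int) -> int:
    """Zero-based page holding the zero-based row_index (assumed numbering)."""
    return row_index // PAGE_SIZE
```

Under these assumptions, a split with 1,725 rows spans 18 pages, and row index 241 falls on the third page (page index 2).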

Search a word in the dataset

You can search for a word in the dataset by typing it in the search bar at the top of the table. The search is case-insensitive and will match any row containing the word. The text is searched in the columns of type string, even if the values are nested in a dictionary.
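The matching rule described above can be approximated locally. The sketch below is not the Hub's implementation; it assumes each row is a plain Python dict and treats a match as a case-insensitive substring hit in any string value, including strings nested inside dictionaries. The function names are hypothetical.

```python
from typing import Any, Iterator


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string in value, descending into nested dictionaries."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for nested in value.values():
            yield from iter_strings(nested)


def search_rows(rows: list[dict], word: str) -> list[dict]:
    """Return the rows in which any string value contains word, ignoring case."""
    needle = word.lower()
    return [
        row
        for row in rows
        if any(needle in text.lower() for text in iter_strings(row))
    ]
```

Non-string columns such as integer labels are skipped, which mirrors the statement that only columns of type string are searched.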

Share a specific row

You can share a specific row by clicking on it and then copying the URL from the address bar of your browser. For example, https://huggingface.co/datasets/glue/viewer/mrpc/test?row=241 will open the dataset viewer on the MRPC dataset, on the test split, and on the 241st row.
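If you need to generate such links in bulk, the URL can be assembled by hand. The sketch below assumes the pattern `https://huggingface.co/datasets/<dataset>/viewer/<config>/<split>?row=<n>`, inferred from the single MRPC example URL on this page; `row_url` is a hypothetical helper, not a Hub API.

```python
from urllib.parse import quote, urlencode

VIEWER_ROOT = "https://huggingface.co/datasets"


def row_url(dataset: str, config: str, split: str, row: int) -> str:
    """Build a viewer link to one row (URL pattern assumed from the MRPC example)."""
    # Keep "/" in the dataset name so namespaced names like "user/name" survive.
    path = "/".join(
        [quote(dataset), "viewer", quote(config, safe=""), quote(split, safe="")]
    )
    return f"{VIEWER_ROOT}/{path}?{urlencode({'row': row})}"
```

Calling `row_url("glue", "mrpc", "test", 241)` reproduces the MRPC example link.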

Access the parquet files

Every dataset is auto-converted to the Parquet format. Click on “Auto-converted to Parquet” to access the Parquet files. Refer to the Datasets Server docs to learn how to query the dataset with libraries such as Polars, Pandas or DuckDB.

You can also access the list of Parquet files programmatically using the Hub API, for example: https://huggingface.co/api/datasets/glue/parquet.
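A minimal sketch of reading that listing with the Python standard library follows. It assumes the endpoint returns JSON shaped as a nested mapping of configuration to split to a list of file URLs; check the actual response before relying on this. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://huggingface.co/api/datasets"


def parquet_api_url(dataset: str) -> str:
    """URL of the Parquet listing for a dataset, following the example above."""
    return f"{API_ROOT}/{dataset}/parquet"


def fetch_parquet_listing(dataset: str, timeout: float = 30.0) -> dict:
    """Download and decode the listing (performs a network request)."""
    with urlopen(parquet_api_url(dataset), timeout=timeout) as response:
        return json.load(response)


def flatten_listing(listing: dict) -> list[tuple[str, str, str]]:
    """Flatten an assumed {config: {split: [url, ...]}} mapping into tuples."""
    return [
        (config, split, url)
        for config, splits in listing.items()
        for split, urls in splits.items()
        for url in urls
    ]
```

For instance, `flatten_listing(fetch_parquet_listing("glue"))` would yield one `(config, split, url)` tuple per Parquet file, which can then be passed to a Parquet reader of your choice.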

Very large datasets

For datasets larger than 5GB, only the first ~5GB of the dataset is auto-converted to Parquet. In this case, an informational message lets you know that the viewer is partial. This should be a large enough sample to represent the full dataset accurately; let us know if you need a bigger sample.

Dataset preview

For the biggest datasets, the page shows a preview of the first 100 rows instead of a full-featured viewer. This restriction applies only to datasets over 5GB that are not natively in Parquet format.

Disable the viewer

The dataset viewer can be disabled. To do this, add a YAML section to the dataset’s README.md file (create one if it does not already exist) and add a viewer property with the value false.

---
viewer: false
---

Note that the viewer is always disabled on private datasets.
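If you manage many dataset repositories, the README edit can be scripted. The sketch below is a plain string manipulation under simplifying assumptions: the YAML section, when present, starts on the first line and is delimited by `---` lines, and any existing top-level `viewer:` entry sits on a single line. `disable_viewer` is a hypothetical helper, not part of any Hub library.

```python
def disable_viewer(readme_text: str) -> str:
    """Return README text whose YAML section contains `viewer: false`."""
    lines = readme_text.splitlines()
    if lines and lines[0].strip() == "---":
        # Locate the line that closes the existing YAML section, if any.
        closing = next(
            (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
        )
        if closing is not None:
            # Drop any previous viewer setting, then append the new one.
            kept = [ln for ln in lines[1:closing] if not ln.startswith("viewer:")]
            updated = ["---", *kept, "viewer: false", "---", *lines[closing + 1:]]
            return "\n".join(updated) + "\n"
    # No YAML section yet (or no README content): prepend a new one.
    return "---\nviewer: false\n---\n" + readme_text
```

Applied to an empty string, the function returns exactly the three-line YAML section shown above; applied to a README that already has metadata, it keeps the other properties and only replaces the `viewer` entry.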

https://huggingface.co/datasets/glue/viewer/mrpc/test?row=241
“Auto-converted to Parquet”
Datasets Server docs
Hub API
https://huggingface.co/api/datasets/glue/parquet