Filter rows in a dataset

Last updated 1 year ago

Datasets Server provides a /filter endpoint for filtering rows in a dataset.

Currently, only datasets with Parquet exports are supported, so Datasets Server can index the contents and run the filter query without downloading the whole dataset.

This guide shows you how to use Datasets Server’s /filter endpoint to filter rows based on a query string. Feel free to also try it out with ReDoc.

The /filter endpoint accepts the following query parameters:

  • dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0

  • config: the configuration name, for example cola

  • split: the split name, for example train

  • where: the filter condition

  • offset: the offset of the slice, for example 150

  • length: the length of the slice, for example 10 (maximum: 100)
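
As a rough illustration of how these parameters combine into a request, the sketch below assembles a /filter URL with Python's standard library. The host in `API_BASE` is a placeholder, not a real deployment, and both the helper name `build_filter_url` and the `label` column used in the `where` value are assumptions made for this example.

```python
from urllib.parse import urlencode

# Placeholder host (assumption): replace with the base URL of your
# Datasets Server deployment.
API_BASE = "https://datasets-server.example.com"


def build_filter_url(dataset, config, split, where, offset, length):
    """Return a /filter URL with every query parameter percent-encoded."""
    if length > 100:
        # The guide states a maximum slice length of 100.
        raise ValueError("length must not exceed 100")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": offset,
        "length": length,
    }
    return f"{API_BASE}/filter?{urlencode(params)}"


# The "label" column is assumed here purely for illustration.
url = build_filter_url("glue", "cola", "train", "label=1", offset=150, length=10)
print(url)
```

Passing the parameters through `urlencode` rather than concatenating strings keeps characters such as `=`, `>`, spaces, and quotes in the `where` value from being misread as part of the URL syntax.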

The where parameter must be expressed as a comparison predicate, which can be:

  • a simple predicate composed of a column name, a comparison operator, and a value

    • the comparison operators are: =, <>, >, >=, <, <=

  • a composite predicate composed of two or more simple predicates (optionally grouped with parentheses to indicate the order of evaluation), combined with logical operators

    • the logical operators are: AND, OR, NOT

For example, the following where parameter value

where=age>30 AND (name='Simone' OR children=0)

will filter the data to select only those rows where the float “age” column is greater than 30 and either the string “name” column is equal to 'Simone' or the integer “children” column is equal to 0.
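
To see how the parentheses affect which rows are selected, the predicate can be tried locally. The sketch below is not how Datasets Server evaluates filters; it substitutes Python's built-in `sqlite3` module as a stand-in SQL engine, and the `people` table and its four rows are made up for this example.

```python
import sqlite3

# The where value from the example above, used verbatim.
WHERE = "age>30 AND (name='Simone' OR children=0)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL, children INTEGER)")
# Made-up rows, chosen so that each branch of the predicate is exercised.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Simone", 45.0, 2),  # kept: age > 30 and name = 'Simone'
        ("Ada", 52.5, 0),     # kept: age > 30 and children = 0
        ("Simone", 28.0, 0),  # dropped: age is not greater than 30
        ("Lin", 61.0, 3),     # dropped: neither inner condition holds
    ],
)

# String interpolation is acceptable here only because WHERE is a fixed
# literal; never build SQL this way from untrusted input.
matches = conn.execute(
    f"SELECT name, age, children FROM people WHERE {WHERE} ORDER BY rowid"
).fetchall()
print(matches)  # [('Simone', 45.0, 2), ('Ada', 52.5, 0)]
```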

Note that, following SQL syntax, string values in comparison predicates must be enclosed in single quotes, for example: 'Scarlett'. Additionally, if the string value contains a single quote, it must be escaped with another single quote, for example: 'O''Hara'.
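
A small helper can apply this quoting rule when string values come from variables. This is a minimal sketch; the function name `sql_string_literal` is hypothetical and not part of any Datasets Server client.

```python
def sql_string_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


literal = sql_string_literal("O'Hara")
where = "name=" + literal
print(where)  # name='O''Hara'
```

The resulting `where` value still needs to be percent-encoded before it is placed in a URL.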
