Filter rows in a dataset
Datasets Server provides a /filter endpoint for filtering rows in a dataset.
Currently, only datasets with Parquet exports are supported, so that Datasets Server can index the contents and run the filter query without downloading the whole dataset.
This guide shows you how to use Datasets Server’s /filter endpoint to filter rows based on a query string. Feel free to also try it out with ReDoc.
The /filter endpoint accepts the following query parameters:
dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0
config: the configuration name, for example cola
split: the split name, for example train
where: the filter condition
offset: the offset of the slice, for example 150
length: the length of the slice, for example 10 (maximum: 100)
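As a minimal sketch, the parameters above can be assembled into a request URL with Python's standard library. The host https://datasets-server.huggingface.co, the helper name build_filter_url, and the label=1 condition are assumptions made for illustration, since this page does not state them; substitute the address of your deployment and a column that exists in your dataset.

```python
from urllib.parse import urlencode

# Assumed host: this page does not state it, so replace it with your deployment.
BASE_URL = "https://datasets-server.huggingface.co"
MAX_LENGTH = 100  # maximum slice length documented for /filter


def build_filter_url(dataset, config, split, where, offset, length, base_url=BASE_URL):
    """Return a /filter URL with all query parameters percent-encoded."""
    if not 0 < length <= MAX_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_LENGTH}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": offset,
        "length": length,
    }
    # urlencode escapes spaces, quotes, and operators such as > and =.
    return f"{base_url}/filter?{urlencode(params)}"


# Hypothetical condition: use a column that exists in your dataset.
url = build_filter_url("glue", "cola", "train", "label=1", 150, 10)
print(url)
```

Encoding the query string this way matters mostly for the where value, which typically contains spaces, quotes, and comparison operators that are not safe to place in a URL unescaped.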
The where parameter must be expressed as a comparison predicate, which can be:
a simple predicate composed of a column name, a comparison operator, and a value
the comparison operators are: =, <>, >, >=, <, <=
a composite predicate composed of two or more simple predicates (optionally grouped with parentheses to indicate the order of evaluation), combined with logical operators
the logical operators are: AND, OR, NOT
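The grammar above can be sketched as a handful of string-building helpers. The function names (simple, all_of, any_of, group, negate) and the column names are hypothetical and not part of Datasets Server; the helpers only compose text and do not check that the result is accepted by the endpoint.

```python
COMPARISON_OPERATORS = {"=", "<>", ">", ">=", "<", "<="}


def simple(column, operator, literal):
    """Build a simple predicate; literal must already be formatted (30, 'Simone', ...)."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported comparison operator: {operator!r}")
    return f"{column}{operator}{literal}"


def all_of(*predicates):
    """Combine predicates with AND."""
    return " AND ".join(predicates)


def any_of(*predicates):
    """Combine predicates with OR."""
    return " OR ".join(predicates)


def group(predicate):
    """Wrap a predicate in parentheses to fix the order of evaluation."""
    return f"({predicate})"


def negate(predicate):
    """Negate a predicate with NOT, grouping it first."""
    return f"NOT {group(predicate)}"


# Placeholder columns: age, name, children.
where = all_of(
    simple("age", ">", "30"),
    group(any_of(simple("name", "=", "'Simone'"), simple("children", "=", "0"))),
)
print(where)  # age>30 AND (name='Simone' OR children=0)
```

Keeping grouping as an explicit step makes the intended order of evaluation visible in the calling code instead of relying on operator precedence.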
For example, the following where parameter value
where=age>30 AND (name='Simone' OR children=0)
will filter the data to select only those rows where the float “age” column is greater than 30 and either the string “name” column is equal to 'Simone' or the integer “children” column is equal to 0.
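To make the selection logic concrete, the same condition can be mirrored in plain Python. The rows below are made up purely for illustration, and the sketch does not model how the server treats missing values; it only shows which combinations of age, name, and children satisfy the predicate.

```python
# Hypothetical rows, made up purely to illustrate the predicate.
rows = [
    {"name": "Simone", "age": 42.0, "children": 2},
    {"name": "Simone", "age": 25.0, "children": 0},
    {"name": "Alex", "age": 35.5, "children": 0},
    {"name": "Alex", "age": 51.0, "children": 3},
]


def matches(row):
    """Plain-Python equivalent of: age>30 AND (name='Simone' OR children=0)."""
    return row["age"] > 30 and (row["name"] == "Simone" or row["children"] == 0)


selected = [row for row in rows if matches(row)]
for row in selected:
    print(row)
```

Only the first and third rows are kept: the second fails the age condition, and the fourth satisfies neither branch of the parenthesized OR.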
Note that, following SQL syntax, string values in comparison predicates must be enclosed in single quotes, for example: 'Scarlett'. Additionally, if the string value contains a single quote, it must be escaped with another single quote, for example: 'O''Hara'.
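The quoting rule can be sketched as a small helper that doubles embedded single quotes before wrapping the value. The function name sql_string_literal and the name column are hypothetical, chosen only for this example.

```python
def sql_string_literal(value):
    """Quote a Python string as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


print(sql_string_literal("Scarlett"))  # 'Scarlett'
print(sql_string_literal("O'Hara"))    # 'O''Hara'

# Hypothetical column name, used only to show the literal inside a predicate.
where = "name=" + sql_string_literal("O'Hara")
print(where)  # name='O''Hara'
```

The resulting where value still needs to be percent-encoded when it is placed in a URL.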