Filter rows in a dataset
Datasets Server provides a /filter endpoint for filtering rows in a dataset.
Currently, only datasets with Parquet exports are supported, so Datasets Server can index their contents and run the filter query without downloading the whole dataset.
This guide shows you how to use Datasets Server's /filter endpoint to filter rows based on a query string. Feel free to also try it out with ReDoc.
The /filter endpoint accepts the following query parameters:
- dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0
- config: the configuration name, for example cola
- split: the split name, for example train
- where: the filter condition
- offset: the offset of the slice, for example 150
- length: the length of the slice, for example 10 (maximum: 100)
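As a rough illustration of how these parameters fit together, the sketch below builds a /filter request URL with Python's standard library and percent-encodes the where condition. The base URL https://datasets-server.huggingface.co/filter is an assumption about the hosted service, the helper names build_filter_url and fetch_filtered_rows are hypothetical, and the column "label" in the usage line is an assumed column of glue/cola. Only the upper bound of 100 on length comes from the list above; the other range checks are this sketch's own sanity checks.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the hosted Datasets Server /filter endpoint.
FILTER_URL = "https://datasets-server.huggingface.co/filter"
MAX_LENGTH = 100  # documented maximum for the length parameter


def build_filter_url(dataset: str, config: str, split: str, where: str,
                     offset: int, length: int) -> str:
    """Return a /filter URL with every query parameter percent-encoded (hypothetical helper)."""
    if not 0 < length <= MAX_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_LENGTH}")
    if offset < 0:
        # Sanity check chosen for this sketch, not a documented rule.
        raise ValueError("offset must be non-negative")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": offset,
        "length": length,
    }
    # urlencode escapes characters such as =, >, ', spaces and parentheses in `where`.
    return f"{FILTER_URL}?{urlencode(params)}"


def fetch_filtered_rows(url: str) -> dict:
    """Send a GET request and decode the JSON body (needs network access; no auth header is sent)."""
    with urlopen(url) as response:
        return json.load(response)


# "label=0" assumes the split has an integer column named "label".
url = build_filter_url("glue", "cola", "train", "label=0", offset=150, length=10)
print(url)
```

Encoding the whole parameter dictionary with urlencode, rather than concatenating strings by hand, keeps operators and quotes in the where value from being misread as URL syntax.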
The where parameter must be expressed as a comparison predicate, which can be:
- a simple predicate composed of a column name, a comparison operator, and a value
  - the comparison operators are: =, <>, >, >=, <, <=
- a composite predicate composed of two or more simple predicates (optionally grouped with parentheses to indicate the order of evaluation), combined with logical operators
  - the logical operators are: AND, OR, NOT
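To make the grammar above concrete, here is a minimal sketch of hypothetical helpers (simple, and_, or_, not_, group; not part of any Datasets Server client library) that assemble a where value from simple predicates. It assumes value literals are passed in already formatted: numbers as-is, strings in single quotes. The composite clause at the end reproduces the age/name/children condition used as an example in this guide.

```python
# Hypothetical helpers for assembling a where value; they are not part of any
# Datasets Server client library.
COMPARISON_OPERATORS = ("=", "<>", ">", ">=", "<", "<=")


def simple(column: str, op: str, literal: str) -> str:
    """Simple predicate: column name, comparison operator, value literal.

    `literal` must already be valid syntax: numbers as-is, strings in single quotes.
    """
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    return f"{column}{op}{literal}"


def and_(*predicates: str) -> str:
    """Combine two or more predicates with AND."""
    if len(predicates) < 2:
        raise ValueError("AND needs at least two predicates")
    return " AND ".join(predicates)


def or_(*predicates: str) -> str:
    """Combine two or more predicates with OR."""
    if len(predicates) < 2:
        raise ValueError("OR needs at least two predicates")
    return " OR ".join(predicates)


def not_(predicate: str) -> str:
    """Negate a predicate; the parentheses keep NOT scoped to it."""
    return f"NOT ({predicate})"


def group(predicate: str) -> str:
    """Parenthesize a predicate to control the order of evaluation."""
    return f"({predicate})"


where = and_(
    simple("age", ">", "30"),
    group(or_(simple("name", "=", "'Simone'"), simple("children", "=", "0"))),
)
print(where)  # age>30 AND (name='Simone' OR children=0)
```

Grouping is left to the caller through group(), mirroring the optional parentheses in the description, while not_ always parenthesizes its operand so the negation cannot accidentally spread to neighboring predicates.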
For example, the following where parameter value

    age>30 AND (name='Simone' OR children=0)

will filter the data to select only those rows where the float “age” column is larger than 30 and either the string “name” column is equal to ‘Simone’ or the integer “children” column is equal to 0.
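The meaning of that condition can be checked locally with plain Python. The rows below are made-up sample data (only the column names and types follow the description above), and the sketch ignores SQL details such as NULL handling, so it illustrates the logic of the predicate rather than how Datasets Server evaluates it.

```python
# Made-up rows; only the column names and types follow the example above.
rows = [
    {"name": "Simone", "age": 42.0, "children": 2},
    {"name": "Alex", "age": 35.5, "children": 0},
    {"name": "Simone", "age": 25.0, "children": 0},
    {"name": "Kim", "age": 51.0, "children": 3},
]


def matches(row: dict) -> bool:
    """Python equivalent of: age>30 AND (name='Simone' OR children=0)."""
    return row["age"] > 30 and (row["name"] == "Simone" or row["children"] == 0)


selected = [row["name"] for row in rows if matches(row)]
print(selected)  # ['Simone', 'Alex']
```

The third row is excluded despite matching the name, because the parentheses make the age test apply to both alternatives.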
Note that, following SQL syntax, string values in comparison predicates must be enclosed in single quotes, for example: 'Scarlett'. Additionally, if a string value contains a single quote, it must be escaped with another single quote, for example: 'O''Hara'.
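The quoting rule is easy to get wrong by hand, so a small helper can apply it. sql_string_literal is a hypothetical name used only for this sketch; it implements exactly the two rules stated above: wrap the value in single quotes and double any single quote inside it.

```python
def sql_string_literal(value: str) -> str:
    """Quote a string for a where predicate: wrap it in single quotes and
    double every single quote inside it (hypothetical helper)."""
    return "'" + value.replace("'", "''") + "'"


print(sql_string_literal("Scarlett"))          # 'Scarlett'
print(sql_string_literal("O'Hara"))            # 'O''Hara'
print("name=" + sql_string_literal("O'Hara"))  # name='O''Hara'
```

The resulting predicate still has to be URL-encoded when it is sent as the where query parameter.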