Filter rows in a dataset
Datasets Server provides a /filter endpoint for filtering rows in a dataset.
Currently, only datasets with Parquet exports are supported so Datasets Server can index the contents and run the filter query without downloading the whole dataset.
This guide shows you how to use Datasets Server’s /filter endpoint to filter rows based on a query string. Feel free to also try it out with ReDoc.
The /filter endpoint accepts the following query parameters:
dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0
config: the configuration name, for example cola
split: the split name, for example train
where: the filter condition
offset: the offset of the slice, for example 150
length: the length of the slice, for example 10 (maximum: 100)
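As a minimal sketch of how these parameters fit together, the following Python snippet assembles a /filter URL using only the standard library. The base URL, the helper name build_filter_url, and the label column used in the usage line are assumptions made for illustration; none of them is stated in this guide. The only constraint enforced is the documented maximum length of 100.

```python
from urllib.parse import urlencode

# Assumption: public base URL of the service; it is not given in this guide.
API_URL = "https://datasets-server.huggingface.co/filter"
MAX_LENGTH = 100  # maximum slice length documented above


def build_filter_url(dataset, config, split, where, offset=0, length=10):
    """Return a /filter URL with all query parameters percent-encoded."""
    if length > MAX_LENGTH:
        raise ValueError(f"length must not exceed {MAX_LENGTH}")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": offset,
        "length": length,
    }
    # urlencode escapes characters such as "=", ">", spaces and quotes
    # so the where condition survives inside the query string.
    return f"{API_URL}?{urlencode(params)}"


# Usage: "label" is a hypothetical column name chosen for illustration.
print(build_filter_url("glue", "cola", "train", "label=1", offset=150, length=10))
```

Encoding the where value matters because the condition itself contains characters (=, <, >, spaces, quotes) that have special meaning or are not allowed unescaped in a URL.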
The where parameter must be expressed as a comparison predicate, which can be:
a simple predicate composed of a column name, a comparison operator, and a value. The comparison operators are: =, <>, >, >=, <, <=
a composite predicate composed of two or more simple predicates (optionally grouped with parentheses to indicate the order of evaluation), combined with logical operators. The logical operators are: AND, OR, NOT
For example, the following where parameter value
where=age>30 AND (name='Simone' OR children=0)
will filter the data to select only those rows where the float “age” column is greater than 30 and either the string “name” column is equal to ‘Simone’ or the integer “children” column is equal to 0.
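To make the grouping concrete, here is a small local illustration of the same condition written as a Python expression. The rows are invented for this example, and the code only mirrors the logic of the predicate; it is not a description of how Datasets Server evaluates the query.

```python
# Hypothetical rows, invented for illustration only.
rows = [
    {"name": "Simone", "age": 42.0, "children": 2},
    {"name": "Ada", "age": 35.5, "children": 0},
    {"name": "Simone", "age": 28.0, "children": 0},
    {"name": "Lee", "age": 51.0, "children": 3},
]


def matches(row):
    """Python equivalent of: age>30 AND (name='Simone' OR children=0)."""
    return row["age"] > 30 and (row["name"] == "Simone" or row["children"] == 0)


selected = [row for row in rows if matches(row)]
print([row["name"] for row in selected])  # prints ['Simone', 'Ada']
```

The 28-year-old Simone is excluded because the age condition fails, and Lee is excluded because neither condition inside the parentheses holds.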
Note that, following SQL syntax, string values in comparison predicates must be enclosed in single quotes, for example: 'Scarlett'. Additionally, if the string value contains a single quote, it must be escaped with another single quote, for example: 'O''Hara'.
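The quoting rule above can be sketched as a small helper. The function names quote_sql_string and equals_predicate are hypothetical, introduced here only to illustrate the rule; they are not part of any Datasets Server client.

```python
def quote_sql_string(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def equals_predicate(column: str, value: str) -> str:
    """Build a simple predicate comparing a column to a string value."""
    return f"{column}={quote_sql_string(value)}"


print(quote_sql_string("Scarlett"))        # prints 'Scarlett'
print(equals_predicate("name", "O'Hara"))  # prints name='O''Hara'
```

The resulting predicate still needs to be URL-encoded before it is sent as the where parameter.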