Main classes

DatasetInfo

class datasets.DatasetInfo

( description: str = <factory>citation: str = <factory>homepage: str = <factory>license: str = <factory>features: typing.Optional[datasets.features.features.Features] = Nonepost_processed: typing.Optional[datasets.info.PostProcessedInfo] = Nonesupervised_keys: typing.Optional[datasets.info.SupervisedKeysData] = Nonetask_templates: typing.Optional[typing.List[datasets.tasks.base.TaskTemplate]] = Nonebuilder_name: typing.Optional[str] = Nonedataset_name: typing.Optional[str] = Noneconfig_name: typing.Optional[str] = Noneversion: typing.Union[str, datasets.utils.version.Version, NoneType] = Nonesplits: typing.Optional[dict] = Nonedownload_checksums: typing.Optional[dict] = Nonedownload_size: typing.Optional[int] = Nonepost_processing_size: typing.Optional[int] = Nonedataset_size: typing.Optional[int] = Nonesize_in_bytes: typing.Optional[int] = None )

Parameters

description (str) — A description of the dataset.
citation (str) — A BibTeX citation of the dataset.
homepage (str) — A URL to the official homepage for the dataset.
license (str) — The dataset’s license. It can be the name of the license or a paragraph containing the terms of the license.
features (Features, optional) — The features used to specify the dataset’s column types.
post_processed (PostProcessedInfo, optional) — Information regarding the resources of a possible post-processing of a dataset. For example, it can contain the information of an index.
supervised_keys (SupervisedKeysData, optional) — Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS).
builder_name (str, optional) — The name of the GeneratorBasedBuilder subclass used to create the dataset. Usually matched to the corresponding script name. It is also the snake_case version of the dataset builder class name.
config_name (str, optional) — The name of the configuration derived from BuilderConfig.
version (str or Version, optional) — The version of the dataset.
splits (dict, optional) — The mapping between split name and metadata.
download_checksums (dict, optional) — The mapping between the URL to download the dataset’s checksums and corresponding metadata.
download_size (int, optional) — The size of the files to download to generate the dataset, in bytes.
post_processing_size (int, optional) — Size of the dataset in bytes after post-processing, if any.
dataset_size (int, optional) — The combined size in bytes of the Arrow tables for all splits.
size_in_bytes (int, optional) — The combined size in bytes of all files associated with the dataset (downloaded files + Arrow files).
task_templates (List[TaskTemplate], optional) — The task templates to prepare the dataset for during training and evaluation. Each template casts the dataset’s Features to standardized column names and types as detailed in datasets.tasks.
**config_kwargs (additional keyword arguments) — Keyword arguments to be passed to the BuilderConfig and used in the DatasetBuilder.

Information about a dataset.

DatasetInfo documents datasets, including its name, version, and features. See the constructor arguments and properties for a full list.

Not all fields are known on construction and may be updated later.

from_directory

Main classes

DatasetInfo

class datasets.DatasetInfo

Dataset

class datasets.Dataset

DatasetDict

class datasets.DatasetDict

IterableDataset

class datasets.IterableDataset

IterableDatasetDict

class datasets.IterableDatasetDict

Features

class datasets.Features

class datasets.Sequence

class datasets.ClassLabel

class datasets.Value

class datasets.Translation

class datasets.TranslationVariableLanguages

class datasets.Array2D

class datasets.Array3D

class datasets.Array4D

class datasets.Array5D

class datasets.Audio

class datasets.Image

MetricInfo

class datasets.MetricInfo

Metric

class datasets.Metric

Filesystems

class datasets.filesystems.S3FileSystem

Fingerprint

class datasets.fingerprint.Hasher