Load audio data

Last updated 1 year ago

You can load an audio dataset using the Audio feature, which automatically decodes and resamples the audio files when you access the examples. Audio decoding is based on the soundfile Python package, which uses the libsndfile C library under the hood.

Installation

To work with audio datasets, you need to have the audio dependencies installed. Check out the installation guide to learn how to install them.

Local files

You can load your own dataset using the paths to your audio files. Use the cast_column() function to take a column of audio file paths and cast it to the Audio feature:

>>> from datasets import Dataset, Audio

>>> audio_dataset = Dataset.from_dict({"audio": ["path/to/audio_1", "path/to/audio_2", ..., "path/to/audio_n"]}).cast_column("audio", Audio())
>>> audio_dataset[0]["audio"]
{'array': array([ 0.        ,  0.00024414, -0.00024414, ..., -0.00024414,
         0.        ,  0.        ], dtype=float32),
 'path': 'path/to/audio_1',
 'sampling_rate': 16000}
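
The list of paths passed to Dataset.from_dict above is written by hand. As a minimal sketch, assuming your files sit under one directory, the list could be gathered with the standard library. The helper name collect_audio_paths and the extension set are hypothetical and not part of the datasets library.

```python
from pathlib import Path
import tempfile

# Assumption: the extensions you care about; adjust to your own data.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def collect_audio_paths(root):
    """Return a sorted list of audio file paths found under `root`."""
    root = Path(root)
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Demo on a throwaway directory with empty placeholder files.
demo_dir = Path(tempfile.mkdtemp())
for name in ("b.wav", "a.mp3", "notes.txt"):
    (demo_dir / name).touch()

paths = collect_audio_paths(demo_dir)
columns = {"audio": paths}  # a one-column dict of file paths, as in the example above
print([Path(p).name for p in paths])

# With the datasets library installed and real audio files, the next step would be:
# from datasets import Dataset, Audio
# audio_dataset = Dataset.from_dict(columns).cast_column("audio", Audio())
```

The placeholder files here are empty, so they only demonstrate path collection; decoding would need real audio.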

AudioFolder

You can also load a dataset with an AudioFolder dataset builder. It does not require writing a custom dataloader, making it useful for quickly creating and loading audio datasets with several thousand audio files.

AudioFolder with metadata

To link your audio files with metadata information, make sure your dataset has a metadata.csv file. Your dataset structure might look like:

folder/train/metadata.csv
folder/train/first_audio_file.mp3
folder/train/second_audio_file.mp3
folder/train/third_audio_file.mp3

Your metadata.csv file must have a file_name column which links audio files with their metadata. An example metadata.csv file might look like:

file_name,transcription
first_audio_file.mp3,znowu się duch z ciałem zrośnie w młodocianej wstaniesz wiosnie i możesz skutkiem tych leków umierać wstawać wiek wieków dalej tam były przestrogi jak siekać głowę jak nogi
second_audio_file.mp3,już u źwierzyńca podwojów król zasiada przy nim książęta i panowie rada a gdzie wzniosły krążył ganek rycerze obok kochanek król skinął palcem zaczęto igrzysko
third_audio_file.mp3,pewnie kędyś w obłędzie ubite minęły szlaki zaczekajmy dzień jaki poślemy szukać wszędzie dziś jutro pewnie będzie posłali wszędzie sługi czekali dzień i drugi gdy nic nie doczekali z płaczem chcą jechać dali
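
To experiment with this layout without real recordings, the folder can be generated with the standard library alone. This is a rough sketch under stated assumptions: the WAV file names, the sine-tone contents, and the placeholder transcriptions are invented for illustration, and write_tone is a hypothetical helper.

```python
import csv
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_tone(path, freq_hz=440.0, seconds=0.1, rate=16000):
    """Write a short mono 16-bit sine tone as a WAV file."""
    n_samples = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_samples)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(frames)


root = Path(tempfile.mkdtemp())
train_dir = root / "train"
train_dir.mkdir()

# Placeholder rows: file_name must match the audio files in the same folder.
rows = [
    ("first_audio_file.wav", "placeholder transcription one"),
    ("second_audio_file.wav", "placeholder transcription two"),
]
for file_name, _ in rows:
    write_tone(train_dir / file_name)

# metadata.csv needs a file_name column linking each row to its audio file.
with open(train_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "transcription"])
    writer.writerows(rows)

print(sorted(p.name for p in train_dir.iterdir()))
```

The resulting root directory has the same shape as the folder/train structure shown earlier, so it could be passed as data_dir when loading with the audiofolder builder.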

AudioFolder will load audio data and create a transcription column containing texts from metadata.csv:

>>> from datasets import load_dataset

>>> dataset = load_dataset("audiofolder", data_dir="/path/to/folder")
>>> # OR by specifying the list of files
>>> dataset = load_dataset("audiofolder", data_files=["path/to/audio_1", "path/to/audio_2", ..., "path/to/audio_n"])

You can load remote datasets from their URLs with the data_files parameter:

>>> dataset = load_dataset("audiofolder", data_files=["https://foo.bar/audio_1", "https://foo.bar/audio_2", ..., "https://foo.bar/audio_n"])
>>> # for example, pass SpeechCommands archive:
>>> dataset = load_dataset("audiofolder", data_files="https://s3.amazonaws.com/datasets.huggingface.co/SpeechCommands/v0.01/v0.01_test.tar.gz")

Metadata can also be specified as JSON Lines, in which case use metadata.jsonl as the name of the metadata file. This format is helpful in scenarios when one of the columns is complex, e.g. a list of floats, to avoid parsing errors or reading the complex values as strings.
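
As an illustration of why JSON Lines helps with complex columns, the sketch below writes a metadata.jsonl file where one column holds a list of floats and reads it back with the types preserved. The pitch_track column and its values are made up for the example; only the file_name column and the metadata.jsonl file name come from the text above.

```python
import json
import tempfile
from pathlib import Path

train_dir = Path(tempfile.mkdtemp()) / "train"
train_dir.mkdir(parents=True)

# Hypothetical records: `pitch_track` is an invented list-of-floats column.
records = [
    {"file_name": "first_audio_file.mp3", "pitch_track": [110.0, 112.5, 109.8]},
    {"file_name": "second_audio_file.mp3", "pitch_track": [220.0, 221.3]},
]

# JSON Lines: one JSON object per line.
metadata_path = train_dir / "metadata.jsonl"
with open(metadata_path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading it back yields real lists of floats rather than strings.
with open(metadata_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(type(loaded[0]["pitch_track"]).__name__)
```

In a CSV file the same column would have to be stored as a single string and parsed again by hand, which is the failure mode the JSON Lines format avoids.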

To ignore the information in the metadata file, set drop_metadata=True in load_dataset():

>>> from datasets import load_dataset

>>> dataset = load_dataset("audiofolder", data_dir="/path/to/folder", drop_metadata=True)

If you don't have a metadata file, AudioFolder automatically infers the label name from the directory name. If you want to drop automatically created labels, set drop_labels=True. In this case, your dataset will only contain an audio column:

>>> from datasets import load_dataset

>>> dataset = load_dataset("audiofolder", data_dir="/path/to/folder_without_metadata", drop_labels=True)
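
To make the idea of inferring a label from the directory name concrete, here is a small standard-library sketch. It is only a conceptual illustration and not how AudioFolder is implemented; the train/dog and train/cat directories and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Build a throwaway layout with one sub-directory per class (assumed layout).
root = Path(tempfile.mkdtemp())
for label, name in [("dog", "bark_1.wav"), ("dog", "bark_2.wav"), ("cat", "meow_1.wav")]:
    class_dir = root / "train" / label
    class_dir.mkdir(parents=True, exist_ok=True)
    (class_dir / name).touch()  # empty placeholder, not real audio

# Each file's label is taken from the name of the directory that contains it.
labels = {p.name: p.parent.name for p in sorted(root.rglob("*.wav"))}
print(labels)
```

With drop_labels=True, a label column of this kind would be left out of the loaded dataset.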

For more information about creating your own AudioFolder dataset, take a look at the Create an audio dataset guide.

For a guide on how to load any type of dataset, take a look at the general loading guide.
