
ESPnet


Using ESPnet at Hugging Face

ESPnet is an end-to-end toolkit for speech processing, including automatic speech recognition, text-to-speech, speech enhancement, speaker diarization, and other tasks.

Exploring ESPnet in the Hub

You can find hundreds of ESPnet models by filtering at the left of the models page.
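
The same filter can also be applied programmatically. A minimal sketch using the huggingface_hub client (assuming it is installed) to list models tagged with the espnet library:

from huggingface_hub import list_models

# List the first ten models on the Hub tagged with the espnet library
for model in list_models(filter="espnet", limit=10):
    print(model.id)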

All models on the Hub come with the following useful features:

  1. An automatically generated model card with a description, a training configuration, licenses, and more.

  2. Metadata tags that help with discoverability and contain information such as license, language, and datasets.

  3. An interactive widget you can use to play with the model directly in the browser.

  4. An Inference API that allows you to make inference requests (a request sketch follows this list).
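
The Inference API mentioned in item 4 is a plain HTTP endpoint. A minimal sketch of calling it for a hosted ESPnet text-to-speech model, assuming a hypothetical repository name username/model_repo and a valid access token:

import requests

# Hypothetical repository name; replace with a real ESPnet model id from the Hub
API_URL = "https://api-inference.huggingface.co/models/username/model_repo"
headers = {"Authorization": "Bearer <your_access_token>"}

# For a text-to-speech model the API returns the synthesized audio as raw bytes
response = requests.post(API_URL, headers=headers, json={"inputs": "Hello world"})
response.raise_for_status()

with open("out.flac", "wb") as f:
    f.write(response.content)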

Using existing models

For a full guide on loading pre-trained models, we recommend checking out the official guide.

If you’re interested in doing inference, different classes for different tasks have a from_pretrained method that allows loading models from the Hub. For example:

  • Speech2Text for Automatic Speech Recognition.

  • Text2Speech for Text to Speech.

  • SeparateSpeech for Audio Source Separation.

Here is an inference example:


import soundfile
from espnet2.bin.tts_inference import Text2Speech

# Load a pretrained text-to-speech model from the Hub
text2speech = Text2Speech.from_pretrained("model_name")

# Synthesize speech and save it as a 16-bit PCM WAV file at the model's sampling rate
speech = text2speech("foobar")["wav"]
soundfile.write("out.wav", speech.numpy(), text2speech.fs, "PCM_16")
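
Speech2Text works the same way for automatic speech recognition. A minimal sketch, assuming a placeholder model name and a local speech.wav file recorded at the model's expected sampling rate:

import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Load a pretrained ASR model from the Hub
speech2text = Speech2Text.from_pretrained("model_name")

# Read a waveform and transcribe it; the result is an n-best list of hypotheses
speech, rate = soundfile.read("speech.wav")
nbests = speech2text(speech)

# Each hypothesis is a tuple starting with the decoded text
text, *_ = nbests[0]
print(text)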

If you want to see how to load a specific model, you can click Use in ESPnet and you will be given a working snippet that you can use to load it!

Sharing your models

The run.sh script allows you to upload a given model to a Hugging Face repository.


./run.sh --stage 15 --skip_upload_hf false --hf_repo username/model_repo
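
As noted below, ESPnet also packages the trained model as a zip file. If you prefer to push the unpacked files to a repository yourself instead of using the recipe stage, the huggingface_hub client can be used; a minimal sketch, assuming a hypothetical local directory exp/model_export containing the unpacked model and the same repository name as above:

from huggingface_hub import HfApi

api = HfApi()

# Create the target repository if it does not exist yet (requires a valid access token)
api.create_repo("username/model_repo", exist_ok=True)

# Upload the unpacked ESPnet export (hypothetical path) as a model repository
api.upload_folder(
    folder_path="exp/model_export",
    repo_id="username/model_repo",
    repo_type="model",
)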

ESPnet outputs a zip file that can be uploaded to Hugging Face easily. For a full guide on sharing models, we recommend checking out the official guide.

Additional resources

  • ESPnet docs.

  • ESPnet model zoo repository.

  • Integration docs.
