Integrate your library with the Hub
The BOINC AI Hub aims to facilitate sharing machine learning models, checkpoints, and artifacts. This endeavor includes integrating the Hub into many of the amazing third-party libraries in the community. Some of the ones already integrated include spaCy, AllenNLP, and timm, among many others. Integration means users can download and upload files to the Hub directly from your library. We hope you will integrate your library and join us in democratizing artificial intelligence for everyone!
Integrating the Hub with your library provides many benefits, including:
Free model hosting for you and your users.
Built-in file versioning - even for huge files - made possible by Git-LFS.
All public models are powered by the Inference API.
In-browser widgets allow users to interact with your hosted models directly.
This tutorial will help you integrate the Hub into your library so your users can benefit from all the features offered by the Hub.
Before you begin, we recommend you create a Hugging Face account from which you can manage your repositories and files.
If you need help with the integration, feel free to open an issue, and we would be more than happy to help you!
Installation
Install the huggingface_hub library with pip in your environment:
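For example, with pip (a standard installation; no optional extras are assumed):

```bash
pip install huggingface_hub
```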
Once you have successfully installed the huggingface_hub library, log in to your Hugging Face account:
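One common way to do this is with the huggingface-cli tool that ships with the library; a minimal sketch:

```bash
huggingface-cli login
```

This prompts for a User Access Token, which you can create in your Hugging Face account settings.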
Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, log in with notebook_login. notebook_login will launch a widget in your notebook from which you can enter your Hugging Face credentials.
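A minimal sketch of the notebook flow:

```python
from huggingface_hub import notebook_login

notebook_login()  # opens a widget asking for your Hugging Face token
```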
Download files from the Hub
Integration allows users to download your hosted files directly from the Hub using your library.
Use the hf_hub_download function to download a file from a repository on the Hub; it returns the local path of the downloaded file. Downloaded files are stored in your cache, ~/.cache/huggingface/hub, so you don't have to re-download a file the next time you use it, and for larger files this can save a lot of time. Furthermore, if the repository is updated with a new version of the file, huggingface_hub will automatically download the latest version and store it in the cache for you. Users don't have to worry about updating their files.
For example, download the config.json file from the lysandre/arxiv-nlp repository:
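A minimal sketch; hf_hub_download returns the local path of the cached file:

```python
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
print(config_path)  # path inside ~/.cache/huggingface/hub
```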
Download a specific version of the file by specifying the revision parameter. The revision parameter can be a branch name, tag, or commit hash. The commit hash must be a full-length hash instead of the shorter 7-character commit hash:
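For example, pinning the download to a branch (the alternative revision values mentioned in the comment are placeholders, not actual revisions of this repository):

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="lysandre/arxiv-nlp",
    filename="config.json",
    revision="main",  # a branch name; a tag or a full-length commit hash also works here
)
```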
Use the cache_dir parameter to change where a file is stored:
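For example (the cache directory below is only an illustration):

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="lysandre/arxiv-nlp",
    filename="config.json",
    cache_dir="/home/user/my-cache",  # placeholder; defaults to ~/.cache/huggingface/hub
)
```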
Code sample
We recommend adding a code snippet to explain how to use a model in your downstream library.
Add a code snippet by updating the Libraries Typescript file with instructions for your model. For example, the Asteroid integration includes a brief code snippet for how to load and use an Asteroid model:
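As an illustration, the snippet shown to Asteroid users loads a pretrained model along these lines (a sketch assuming Asteroid's BaseModel.from_pretrained interface; the repository ID is a placeholder):

```python
from asteroid.models import BaseModel

# "username/asteroid-model" is a placeholder repository ID on the Hub
model = BaseModel.from_pretrained("username/asteroid-model")
```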
Doing so will also add a tag to your model so users can quickly identify models from your library.
Upload files to the Hub
You might also want to provide a method for creating model repositories and uploading files to the Hub directly from your library. The huggingface_hub library offers two ways to assist you with creating repositories and uploading files:
create_repo creates a repository on the Hub.
upload_file directly uploads files to a repository on the Hub.
create_repo
The create_repo method creates a repository on the Hub. Use the repo_id parameter (called name in older versions of huggingface_hub) to provide a name for your repository:
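A minimal sketch (the repository name is passed as the first argument, which works across huggingface_hub versions; your login token is used by default):

```python
from huggingface_hub import create_repo

create_repo("test-model")  # creates the repository under your namespace
```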
When you check your Hugging Face account, you should now see a test-model repository under your namespace.
upload_file
The upload_file method uploads files to the Hub. This method requires the following:
A path to the file to upload.
The final path in the repository.
The repository you wish to push the files to.
For example:
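A minimal sketch (the local path and repository ID are placeholders):

```python
from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="/home/user/model/pytorch_model.bin",  # local file to upload (placeholder path)
    path_in_repo="pytorch_model.bin",                      # where the file will live in the repository
    repo_id="your-username/test-model",                    # the repository created above (placeholder)
)
```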
If you need to upload more than one file, look at the utilities offered by the Repository class.
Once again, if you check your Hugging Face account, you should see the file inside your repository.
Lastly, it is important to add a model card so users understand how to use your model. See here for more details about how to create a model card.
Set up the Inference API
Our Inference API powers models uploaded to the Hub through your library.
Create an Inference API Docker image
All third-party libraries are Dockerized, so you can install the dependencies you’ll need for your library to work correctly. Add your library to the existing Docker images by navigating to the Docker images folder.
Copy the common folder and rename it with the name of your library (e.g. docker/common to docker/your-awesome-library). There are four files you need to edit:
List the packages required for your library to work in requirements.txt.
Update app/main.py with the tasks supported by your model (see here for a complete list of available tasks). Look out for the IMPLEMENT_THIS flag to add your supported task (a sketch follows this list).
For each task your library supports, modify the app/pipelines/task_name.py files accordingly. We have also added an IMPLEMENT_THIS flag in the pipeline files to guide you. If there isn't a pipeline that supports your task, feel free to add one. Open an issue here, and we will be happy to help you.
Add your model and task to the tests/test_api.py file. For example, if you have a text generation model, see the sketch after this list.
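The edits to app/main.py and tests/test_api.py might look roughly like the following (a sketch assuming the layout of the api-inference-community templates; the pipeline class and model ID are placeholders):

```python
from typing import Dict, Type

from app.pipelines import Pipeline, TextGenerationPipeline  # hypothetical pipeline implemented in app/pipelines/

# app/main.py: map each task your library supports to the pipeline class that implements it
ALLOWED_TASKS: Dict[str, Type[Pipeline]] = {
    "text-generation": TextGenerationPipeline,
}

# tests/test_api.py: point each supported task to a small model on the Hub to test against
TESTABLE_MODELS: Dict[str, str] = {
    "text-generation": "your-username/your-text-generation-model",  # placeholder model ID
}
```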
Finally, run the following test to ensure everything works as expected:
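The exact test command is defined in the api-inference-community repository; a typical invocation looks something like this (the directory name stands in for your library's folder):

```bash
pytest -sv --rootdir docker_images/your-awesome-library docker_images/your-awesome-library
```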
Register your library's supported tasks on the Hub
To register the tasks supported by your library on the Hub, you'll need to add a mapping from your library name to its supported tasks in this file. This will ensure the Inference API is registered for the tasks supported by your model. This file is automatically generated as part of a GitHub Action in the api-inference-community repository. You can see an example of this here.
With these simple but powerful methods, you brought the full functionality of the Hub into your library. Users can download files stored on the Hub from your library with hf_hub_download, create repositories with create_repo, and upload files with upload_file. You also set up the Inference API with your library, allowing users to interact with your models on the Hub from inside a browser.