Inference API docs
Please refer to the Inference API documentation for detailed information.
For 🤗 Transformers models, Pipelines power the API.
On top of Pipelines, and depending on the model type, there are several production optimizations, such as:

- compiling models to optimized intermediary representations (e.g. ONNX),
- maintaining a Least Recently Used cache, ensuring that the most popular models are always loaded (a toy sketch of the idea follows this list),
- scaling the underlying compute infrastructure on the fly depending on the load constraints.
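To make the caching point concrete, here is a minimal, illustrative sketch of an LRU cache keyed by model id. This is a toy example of the general technique, not Hugging Face's actual serving code; `load_fn` is a hypothetical loader (e.g. a function that builds a Pipeline).

```python
from collections import OrderedDict

class ModelLRUCache:
    """Toy LRU cache: keeps at most `capacity` loaded models in memory."""

    def __init__(self, capacity: int, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn          # hypothetical: loads a model by id
        self._cache = OrderedDict()     # model_id -> loaded model

    def get(self, model_id: str):
        if model_id in self._cache:
            # Cache hit: mark this model as most recently used.
            self._cache.move_to_end(model_id)
            return self._cache[model_id]
        # Cache miss: load the model (the slow path).
        model = self.load_fn(model_id)
        self._cache[model_id] = model
        if len(self._cache) > self.capacity:
            # Evict the least recently used model to free memory.
            self._cache.popitem(last=False)
        return model
```

Popular models stay hot because every request moves them to the back of the eviction order, while rarely requested ones are the first to be dropped.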
For models from other libraries, the API uses Starlette and runs in Docker containers. Each library defines the implementation of different pipelines.
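Whatever serves the model, it is reached over plain HTTP. A minimal sketch of a raw request with the `requests` library, assuming you substitute your own User Access Token and a model id of your choice (both values below are placeholders):

```python
import requests

# Placeholder model id; any model with Inference API support works.
API_URL = (
    "https://api-inference.huggingface.co/models/"
    "distilbert-base-uncased-finetuned-sst-2-english"
)
# Replace hf_xxx with your User Access Token.
headers = {"Authorization": "Bearer hf_xxx"}

# On a cold start the API may return 503 while the model is loading.
response = requests.post(
    API_URL, headers=headers, json={"inputs": "I love this movie!"}
)
print(response.json())
```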
To turn off the Inference API for your model, specify `inference: false` in your model card's metadata.
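For example, in the YAML block at the top of your repository's `README.md` (the `license` field is just example metadata):

```yaml
---
license: mit
inference: false
---
```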
For some tasks, there might not be support in the Inference API, and, hence, there is no widget. For all libraries (except 🤗 Transformers), there is a mapping of library to supported tasks in the API. When a model repository has a task that is not supported by the repository's library, the repository has `inference: false` by default.
If you are interested in accelerated inference, higher volumes of requests, or an SLA, please contact us at `api-enterprise@huggingface.co`.
To see your usage, you can head to the Inference API dashboard. Learn more about it in the Inference API documentation.
For programmatic access, the `huggingface_hub` library provides a documented client wrapper around the Inference API.
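A minimal sketch using that wrapper, assuming a recent version of `huggingface_hub` (the token and model id are placeholders):

```python
from huggingface_hub import InferenceClient

# A token is optional for public models but raises rate limits
# and unlocks private repositories.
client = InferenceClient(token="hf_xxx")  # placeholder token

result = client.text_classification(
    "I love this movie!",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(result)
```

The client picks the right endpoint for the task and deserializes the response, so you avoid hand-building the HTTP payloads shown earlier.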