Inference
Inference is the process of using a trained model to make predictions on new data. As this process can be compute-intensive, running it on a dedicated server can be an appealing option. The boincai_hub library provides an easy way to call a service that runs inference for hosted models. There are several services you can connect to:
- Inference API: a service that allows you to run accelerated inference on BOINC AI’s infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.
- Inference Endpoints: a product to easily deploy models to production. Inference is run by BOINC AI on a dedicated, fully managed infrastructure on a cloud provider of your choice.
These services can be called with the InferenceClient object. It acts as a replacement for the legacy InferenceApi client, adding specific support for tasks and handling inference on both the Inference API and Inference Endpoints. Learn how to migrate to the new client in the migration examples later in this guide.
InferenceClient is a Python client making HTTP calls to our APIs. If you want to make the HTTP calls directly using your preferred tool (curl, Postman, …), please refer to the Inference API or the Inference Endpoints documentation pages.
For web development, a JavaScript client has been released. If you are interested in game development, you might have a look at our C# project.
Let’s get started with a text-to-image task:
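A minimal sketch of such a call is shown below. It assumes the client class is named InferenceClient and exposes a text_to_image() helper, matching the task methods referenced later in this guide.

```python
from boincai_hub import InferenceClient

client = InferenceClient()
# Generate an image from a text prompt; the result is a PIL.Image object
image = client.text_to_image("An astronaut riding a horse on the moon.")
image.save("astronaut.png")
```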
We initialized an InferenceClient with the default parameters. The only thing you need to know is the task you want to perform. By default, the client will connect to the Inference API and select a model to complete the task. In our example, we generated an image from a text prompt. The returned value is a PIL.Image object that can be saved to a file.
What if you want to use a specific model? You can specify it either as a parameter or directly at an instance level:
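For example, the two options could look like the following sketch (the model identifier is a placeholder for any compatible model id):

```python
from boincai_hub import InferenceClient

# Option 1: pass the model for a single call
client = InferenceClient()
client.text_to_image("An astronaut riding a horse on the moon.", model="<your-model-id>")

# Option 2: set the model once, at the instance level
client = InferenceClient(model="<your-model-id>")
client.text_to_image("An astronaut riding a horse on the moon.")
```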
Authentication is NOT mandatory when using the Inference API. However, authenticated users get a higher free tier to play with the service. A token is also mandatory if you want to run inference on your private models or on private endpoints.
The following tasks are currently supported:

| Domain | Task | Supported |
|---|---|---|
| Audio | Audio Classification | ✅ |
| Audio | Automatic Speech Recognition | ✅ |
| Audio | Text-to-Speech | ✅ |
| Computer Vision | Image Classification | ✅ |
| Computer Vision | Image Segmentation | ✅ |
| Computer Vision | Image-to-Image | ✅ |
| Computer Vision | Image-to-Text | ✅ |
| Computer Vision | Object Detection | ✅ |
| Computer Vision | Text-to-Image | ✅ |
| Computer Vision | Zero-Shot Image Classification | ✅ |
| Multimodal | Document Question Answering | ✅ |
| Multimodal | Visual Question Answering | ✅ |
| NLP | Conversational | ✅ |
| NLP | Feature Extraction | ✅ |
| NLP | Fill Mask | ✅ |
| NLP | Question Answering | ✅ |
| NLP | Sentence Similarity | ✅ |
| NLP | Summarization | ✅ |
| NLP | Table Question Answering | ✅ |
| NLP | Text Classification | ✅ |
| NLP | Text Generation | ✅ |
| NLP | Token Classification | ✅ |
| NLP | Translation | ✅ |
| NLP | Zero-Shot Classification | ✅ |
| Tabular | Tabular Classification | ✅ |
| Tabular | Tabular Regression | ✅ |
An async version of the client is also provided, based on `asyncio` and `aiohttp`. You can either install `aiohttp` directly or use the `[inference]` extra.
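For example, assuming the installable package shares the library name, this would be `pip install --upgrade boincai_hub[inference]` (or, alternatively, `pip install aiohttp`).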
When doing inference, there are two main causes for a timeout:
- The inference process takes a long time to complete.
- The model is not available, for example when the Inference API is loading it for the first time.
For tasks that expect binary inputs (for example images or audio files), the client accepts:
- raw bytes
- a file-like object, opened as binary (`with open("audio.flac", "rb") as f: ...`)
- a path (`str` or `Path`) pointing to a local file
- a URL (`str`) pointing to a remote file (e.g. `https://...`). In this case, the file will be downloaded locally before being sent to the Inference API.
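For instance, an audio task could be called on a local file or a URL as in the sketch below (the audio_classification() helper is assumed here for illustration):

```python
from boincai_hub import InferenceClient

client = InferenceClient()

# Local file path: the client reads the bytes for you
client.audio_classification("audio.flac")

# Remote URL: the file is downloaded locally before being sent to the Inference API
client.audio_classification("https://example.com/audio.flac")
```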
Copied
If you are migrating from the legacy client, the first step is usually initialization: change from the legacy constructor to the new InferenceClient, as in the sketch below.
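A sketch of this change, assuming the legacy class was named InferenceApi and took a repo_id argument (the model id and token are placeholders):

```python
from boincai_hub import InferenceApi, InferenceClient

API_TOKEN = "***"  # placeholder token

# Before: legacy client (class and parameter names are assumed)
inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)

# After: new client
client = InferenceClient(model="bert-base-uncased", token=API_TOKEN)
```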
To run a specific task, change from selecting the task at initialization to calling the corresponding task-specific method, as sketched below.
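A sketch of this change; the feature_extraction() method is referenced later in this guide, while the legacy task parameter and the model id are assumptions used for illustration:

```python
from boincai_hub import InferenceApi, InferenceClient

# Before: the task is fixed when the legacy client is created (names assumed)
inference = InferenceApi(repo_id="sentence-transformers/all-MiniLM-L6-v2", task="feature-extraction")
inference(inputs="Hello world")

# After: call the task-specific method and pass the model explicitly
client = InferenceClient()
client.feature_extraction("Hello world", model="sentence-transformers/all-MiniLM-L6-v2")
```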
To send a custom request, change from calling the legacy client directly to using the generic request method, as sketched below.
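A sketch of this change, assuming the generic request method is named post() and accepts a JSON payload plus a model argument:

```python
from boincai_hub import InferenceApi, InferenceClient

# Before: the legacy client is called directly with raw inputs (names assumed)
inference = InferenceApi(repo_id="bert-base-uncased")
inference(inputs="The goal of life is [MASK].")

# After: send the raw payload through the generic request method
client = InferenceClient()
raw = client.post(json={"inputs": "The goal of life is [MASK]."}, model="bert-base-uncased")
```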
To pass extra parameters, change from providing a separate params dictionary to passing them as arguments of the task-specific method, as sketched below.
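A sketch of this change using zero-shot classification; the legacy params argument and the model id are assumptions used for illustration:

```python
from boincai_hub import InferenceApi, InferenceClient

# Before: extra parameters are passed in a separate dictionary (names assumed)
inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli")
inference(
    inputs="I recently bought a device from your company but it is not working as advertised.",
    params={"candidate_labels": ["refund", "legal", "faq"]},
)

# After: parameters become arguments of the task-specific method
client = InferenceClient()
client.zero_shot_classification(
    "I recently bought a device from your company but it is not working as advertised.",
    labels=["refund", "legal", "faq"],
)
```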
The API is designed to be simple. Not all parameters and options are available or described for the end user. Check out the Inference API documentation if you are interested in learning more about all the parameters available for each task.
There are more than 200k models on the BOINC AI Hub! Each supported task comes with a recommended model. Be aware that this recommendation can change over time without prior notice, so it is best to explicitly set a model once you have decided which one to use. Also, in most cases you’ll be interested in finding a model specific to your needs. Visit the Models page on the Hub to explore your possibilities.
The examples we saw above use the free, hosted Inference API. This proves to be very useful for prototyping and testing things quickly. Once you’re ready to deploy your model to production, you’ll need to use a dedicated infrastructure. That’s where Inference Endpoints comes into play. It allows you to deploy any model and expose it as a private API. Once deployed, you’ll get a URL that you can connect to using exactly the same code as before, changing only the model parameter:
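For example (the endpoint URL below is a placeholder for the URL of your own deployment):

```python
from boincai_hub import InferenceClient

# Point the client at your deployed endpoint instead of a model id
client = InferenceClient(model="https://your-endpoint.example.com")
image = client.text_to_image("An astronaut riding a horse on the moon.")
```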
Calls made with the InferenceClient can be authenticated using a token. By default, it will use the token saved on your machine if you are logged in (check out the authentication guide). If you are not logged in, you can pass your token as an instance parameter:
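For example (the token value is a placeholder):

```python
from boincai_hub import InferenceClient

# Pass your access token explicitly instead of relying on the saved login
client = InferenceClient(token="***")
```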
InferenceClient’s goal is to provide the easiest interface to run inference on BOINC AI models. It has a simple API that supports the most common tasks; the table earlier in this guide lists the ones that are currently supported.
Check out the tasks page on the Hub to learn more about each task, how to use it, and the most popular models for each one.
However, it is not always possible to cover all use cases. For custom requests, the generic post() method gives you the flexibility to send any request to the Inference API. For example, you can specify how to parse the inputs and outputs. In the example below, the generated image is returned as raw bytes instead of being parsed as a PIL.Image. This can be helpful if you don’t have Pillow installed in your setup and just care about the binary content of the image. post() is also useful to handle tasks that are not yet officially supported.
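A sketch of such a call, assuming post() accepts a JSON payload together with model and task arguments (the model id is a placeholder):

```python
from boincai_hub import InferenceClient

client = InferenceClient()
# Send the payload ourselves; the generated image stays as raw bytes
# instead of being parsed into a PIL.Image
raw = client.post(
    json={"inputs": "An astronaut riding a horse on the moon."},
    model="<your-text-to-image-model>",
    task="text-to-image",
)
# `raw` holds the unparsed server response containing the image bytes
```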
After installation, all async API endpoints are available via AsyncInferenceClient. Its initialization and APIs are strictly the same as the sync-only version. For more information about the asyncio module, please refer to the official asyncio documentation.
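A minimal async sketch, assuming AsyncInferenceClient mirrors the synchronous text_to_image() helper:

```python
import asyncio

from boincai_hub import AsyncInferenceClient


async def main():
    client = AsyncInferenceClient()
    image = await client.text_to_image("An astronaut riding a horse on the moon.")
    image.save("astronaut.png")


asyncio.run(main())
```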
In the sections above, we saw the main aspects of InferenceClient. Let’s now dive into some more advanced tips.
InferenceClient has a global timeout parameter that covers both of the causes listed earlier. By default, it is set to None, meaning that the client will wait indefinitely for the inference to complete. If you want more control in your workflow, you can set it to a specific value in seconds. If the timeout delay expires, an InferenceTimeoutError is raised. You can catch it and handle it in your code:
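A sketch of handling the timeout, assuming InferenceTimeoutError can be imported from the same package:

```python
from boincai_hub import InferenceClient, InferenceTimeoutError

client = InferenceClient(timeout=30)  # wait at most 30 seconds per call
try:
    client.text_to_image("An astronaut riding a horse on the moon.")
except InferenceTimeoutError:
    print("Inference timed out after 30s.")
```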
Some tasks require binary inputs, for example, when dealing with images or audio files. In this case, InferenceClient tries to be as permissive as possible and accepts the different input types listed earlier in this guide (raw bytes, file-like objects, local paths, and URLs).
As shown in the migration examples above, InferenceClient acts as a replacement for the legacy InferenceApi client. It adds specific support for tasks and handles inference on both the Inference API and Inference Endpoints, which should make migrating your existing code straightforward.
Adapting your code to use the task-specific methods, as in the sketches above, is the recommended approach. It lets you benefit from helpers like feature_extraction().