Inference

Run Inference on servers

Inference is the process of using a trained model to make predictions on new data. As this process can be compute-intensive, running it on a dedicated server can be an interesting option. The boincai_hub library provides an easy way to call a service that runs inference for hosted models. There are several services you can connect to:

  • Inference API: a service that allows you to run accelerated inference on BOINC AI’s infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.

  • Inference Endpoints: a product to easily deploy models to production. Inference is run by BOINC AI in a dedicated, fully managed infrastructure on a cloud provider of your choice.

These services can be called with the InferenceClient object. It acts as a replacement for the legacy InferenceApi client, adding specific support for tasks and handling inference on both Inference API and Inference Endpoints. Learn how to migrate to the new client in the Legacy InferenceAPI client section.

InferenceClient is a Python client that makes HTTP calls to our APIs. If you want to make the HTTP calls directly using your preferred tool (curl, Postman, …), please refer to the Inference API or the Inference Endpoints documentation pages.

For web development, a JS client has been released. If you are interested in game development, you might have a look at our C# project.

Getting started

Let’s get started with a text-to-image task:


>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

We initialized an InferenceClient with the default parameters. The only thing you need to know is the task you want to perform. By default, the client will connect to the Inference API and select a model to complete the task. In our example, we generated an image from a text prompt. The returned value is a PIL.Image object that can be saved to a file.

The API is designed to be simple. Not all parameters and options are available or described for the end user. Check out this page if you are interested in learning more about all the parameters available for each task.

Using a specific model

What if you want to use a specific model? You can specify it either as a parameter or directly at an instance level:

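The sketch below shows both options. It assumes the same text_to_image interface as in the Getting started example; the model ID is purely illustrative.

>>> from boincai_hub import InferenceClient

# Option 1: pin the model at the instance level (illustrative model ID)
>>> client = InferenceClient(model="prompthero/openjourney-v4")
>>> image = client.text_to_image("An astronaut riding a horse on the moon.")

# Option 2: keep a generic client and pass the model on each call
>>> client = InferenceClient()
>>> image = client.text_to_image("An astronaut riding a horse on the moon.", model="prompthero/openjourney-v4")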

There are more than 200k models on the BOINC AI Hub! Each task in the InferenceClient comes with a recommended model. Be aware that the recommendation can change over time without prior notice, so it is best to set a model explicitly once you have decided which one to use. In most cases, you’ll also be interested in finding a model specific to your needs. Visit the Models page on the Hub to explore the possibilities.

Using a specific URL

The examples we saw above use the free, hosted Inference API. This proves to be very useful for prototyping and testing things quickly. Once you’re ready to deploy your model to production, you’ll need to use dedicated infrastructure. That’s where Inference Endpoints comes into play. It allows you to deploy any model and expose it as a private API. Once deployed, you’ll get a URL that you can connect to using exactly the same code as before, changing only the model parameter:

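The sketch below uses a placeholder endpoint URL; substitute the URL of your own deployed Inference Endpoint.

>>> from boincai_hub import InferenceClient

# Pass the endpoint URL at the instance level... (placeholder URL)
>>> client = InferenceClient(model="https://my-endpoint.example.com")
>>> image = client.text_to_image("An astronaut riding a horse on the moon.")

# ...or on each call
>>> client = InferenceClient()
>>> image = client.text_to_image("An astronaut riding a horse on the moon.", model="https://my-endpoint.example.com")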

Authentication

Calls made with the InferenceClient can be authenticated using a User Access Token. By default, it will use the token saved on your machine if you are logged in (check out how to login). If you are not logged in, you can pass your token as an instance parameter:

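A minimal sketch with a placeholder token value:

>>> from boincai_hub import InferenceClient

# Replace the placeholder with your own User Access Token
>>> client = InferenceClient(token="<your-user-access-token>")
>>> image = client.text_to_image("An astronaut riding a horse on the moon.")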

Authentication is NOT mandatory when using the Inference API. However, authenticated users get a higher free tier to play with the service. A token is also mandatory if you want to run inference on your private models or on private endpoints.

Supported tasks

InferenceClient’s goal is to provide the easiest interface to run inference on BOINC AI models. It has a simple API that supports the most common tasks. Here is a list of the currently supported tasks:

(Table of supported tasks, listing the Domain, Task, Supported status, and Documentation link for each.)

Check out the Tasks page to learn more about each task, how to use it, and the most popular models for it.

Custom requests

It is not always possible to cover every use case with task-specific methods. For custom requests, the InferenceClient.post() method gives you the flexibility to send any request to the Inference API. For example, you can specify how to parse the inputs and outputs. In the example below, the generated image is returned as raw bytes instead of being parsed into a PIL Image. This can be helpful if you don’t have Pillow installed in your setup and just care about the binary content of the image. InferenceClient.post() is also useful to handle tasks that are not yet officially supported.

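A sketch of such a request, assuming post() accepts a JSON payload and a model argument as described above; the model ID is illustrative.

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

# Send a raw text-to-image request; the result is not parsed into a PIL Image
>>> response = client.post(json={"inputs": "An astronaut riding a horse on the moon."}, model="stabilityai/stable-diffusion-2-1")
>>> response  # raw bytes of the generated image
b'...'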

Async client

An async version of the client is also provided, based on asyncio and aiohttp. You can either install aiohttp directly or use the [inference] extra:

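For example, with pip (the [inference] extra name is taken from the text above):

pip install --upgrade boincai_hub[inference]
# or install aiohttp directly
pip install aiohttp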

After installation, all async API endpoints are available via AsyncInferenceClient. Its initialization and APIs are strictly the same as the sync-only version.

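A sketch mirroring the sync example from the Getting started section; it must run inside an asyncio event loop.

# Code must be run in an asyncio concurrent context.
# $ python -m asyncio
>>> from boincai_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> image = await client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")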

For more information about the asyncio module, please refer to the official documentation.

Advanced tips

In the sections above, we covered the main aspects of InferenceClient. Let’s dive into some more advanced tips.

Timeout

When doing inference, there are two main causes for a timeout:

  • The inference process takes a long time to complete.

  • The model is not available, for example when the Inference API is loading it for the first time.

InferenceClient has a global timeout parameter to handle those two aspects. By default, it is set to None, meaning that the client will wait indefinitely for the inference to complete. If you want more control in your workflow, you can set it to a specific value in seconds. If the timeout delay expires, an InferenceTimeoutError is raised. You can catch it and handle it in your code:

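A sketch with a 30-second timeout (the value is arbitrary):

>>> from boincai_hub import InferenceClient, InferenceTimeoutError
>>> client = InferenceClient(timeout=30)
>>> try:
...     image = client.text_to_image("An astronaut riding a horse on the moon.")
... except InferenceTimeoutError:
...     print("Inference timed out after 30s.")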

Binary inputs

Some tasks require binary inputs, for example, when dealing with images or audio files. In this case, InferenceClient tries to be as permissive as possible and accepts different types:

  • raw bytes

  • a file-like object, opened as binary (with open("audio.flac", "rb") as f: ...)

  • a path (str or Path) pointing to a local file

  • a URL (str) pointing to a remote file (e.g. https://...). In this case, the file will be downloaded locally before being sent to the Inference API.

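The sketch below passes the same image in each accepted format; the image_classification task and file names are illustrative.

>>> from pathlib import Path
>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

# URL pointing to a remote file (downloaded locally before being sent)
>>> client.image_classification("https://example.com/cat.jpg")

# Local path, as str or Path
>>> client.image_classification(Path("cat.jpg"))

# Raw bytes read from a file opened in binary mode
>>> with open("cat.jpg", "rb") as f:
...     client.image_classification(f.read())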

Legacy InferenceAPI client

InferenceClient acts as a replacement for the legacy InferenceApi client. It adds specific support for tasks and handles inference on both Inference API and Inference Endpoints.

Here is a short guide to help you migrate from InferenceApi to InferenceClient.
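
The snippets below are illustrative sketches: they assume boincai_hub exposes InferenceApi and InferenceClient with the interfaces described in this guide, and the model IDs and token values are placeholders to adapt to your own use case.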

Initialization

Change from

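>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token="<your-token>")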

to

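>>> from boincai_hub import InferenceClient
>>> inference = InferenceClient(model="bert-base-uncased", token="<your-token>")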

Run on a specific task

Change from

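>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction")
>>> inference("Hello, world!")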

to

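>>> from boincai_hub import InferenceClient
>>> inference = InferenceClient()
>>> inference.feature_extraction("Hello, world!", model="paraphrase-xlm-r-multilingual-v1")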

This is the recommended way to adapt your code to InferenceClient. It lets you benefit from the task-specific methods like feature_extraction.

Run custom request

Change from

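>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")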

to

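>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

# The raw response is returned without task-specific parsing
>>> response = client.post(json={"inputs": "The goal of life is [MASK]."}, model="bert-base-uncased")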

Run with parameters

Change from

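>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)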

to

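>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> client.post(json={"inputs": inputs, "parameters": params}, model="typeform/distilbert-base-uncased-mnli")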
