Inference


Run Inference on servers

Inference is the process of using a trained model to make predictions on new data. As this process can be compute-intensive, running on a dedicated server can be an interesting option. The boincai_hub library provides an easy way to call a service that runs inference for hosted models. There are several services you can connect to:

  • Inference API: a service that allows you to run accelerated inference on BOINC AI’s infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.

  • Inference Endpoints: a product to easily deploy models to production. Inference is run by BOINC AI in a dedicated, fully managed infrastructure on a cloud provider of your choice.

These services can be called with the InferenceClient object. It acts as a replacement for the legacy InferenceApi client, adding specific support for tasks and handling inference on both the Inference API and Inference Endpoints. Learn how to migrate to the new client in the Legacy InferenceAPI client section.

InferenceClient is a Python client that makes HTTP calls to our APIs. If you want to make the HTTP calls directly using your preferred tool (curl, Postman, …), please refer to the Inference API or the Inference Endpoints documentation pages.

For web development, a JS client has been released. If you are interested in game development, you might have a look at our C# project.

Getting started

Let’s get started with a text-to-image task:


>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

We initialized an InferenceClient with the default parameters. The only thing you need to know is the task you want to perform. By default, the client will connect to the Inference API and select a model to complete the task. In our example, we generated an image from a text prompt. The returned value is a PIL.Image object that can be saved to a file.

The API is designed to be simple. Not all parameters and options are available or described for the end user. Check out this page if you are interested in learning more about all the parameters available for each task.

Using a specific model

What if you want to use a specific model? You can specify it either as a parameter or directly at an instance level:


>>> from boincai_hub import InferenceClient
# Initialize client for a specific model
>>> client = InferenceClient(model="prompthero/openjourney-v4")
>>> client.text_to_image(...)
# Or use a generic client but pass your model as an argument
>>> client = InferenceClient()
>>> client.text_to_image(..., model="prompthero/openjourney-v4")

There are more than 200k models on the BOINC AI Hub! Each task in the InferenceClient comes with a recommended model. Be aware that the recommended model can change over time without prior notice. Therefore it is best to explicitly set a model once you have decided. Also, in most cases you’ll be interested in finding a model specific to your needs. Visit the Models page on the Hub to explore your possibilities.

Using a specific URL

The examples we saw above use the free-hosted Inference API. This proves to be very useful for prototyping and testing things quickly. Once you’re ready to deploy your model to production, you’ll need to use a dedicated infrastructure. That’s where Inference Endpoints comes into play. It allows you to deploy any model and expose it as a private API. Once deployed, you’ll get a URL that you can connect to using exactly the same code as before, changing only the model parameter:

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient(model="https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.boincai.cloud/deepfloyd-if")
# or
>>> client = InferenceClient()
>>> client.text_to_image(..., model="https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.boincai.cloud/deepfloyd-if")

Authentication

Calls made with the InferenceClient can be authenticated using a User Access Token. By default, it will use the token saved on your machine if you are logged in (check out how to login). If you are not logged in, you can pass your token as an instance parameter:

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient(token="hf_***")

Authentication is NOT mandatory when using the Inference API. However, authenticated users get a higher free tier to play with the service. A token is also mandatory if you want to run inference on your private models or on private endpoints.
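
If you would rather not hard-code the token, you can log in once and let the client reuse the stored credential. A minimal sketch, assuming the library exposes a login() helper as described in the Login and logout reference:

>>> from boincai_hub import login, InferenceClient
>>> login()  # prompts for your User Access Token and saves it locally (assumed helper)
>>> client = InferenceClient()  # reuses the saved token automatically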

Supported tasks

InferenceClient’s goal is to provide the easiest interface to run inference on BOINC AI models. It has a simple API that supports the most common tasks. Here is a list of the currently supported tasks:

Domain          | Task                             | Supported | Documentation
Audio           | Audio Classification             | ✅        | audio_classification()
Audio           | Automatic Speech Recognition     | ✅        | automatic_speech_recognition()
Audio           | Text-to-Speech                   | ✅        | text_to_speech()
Computer Vision | Image Classification             | ✅        | image_classification()
Computer Vision | Image Segmentation               | ✅        | image_segmentation()
Computer Vision | Image-to-Image                   | ✅        | image_to_image()
Computer Vision | Image-to-Text                    | ✅        | image_to_text()
Computer Vision | Object Detection                 | ✅        | object_detection()
Computer Vision | Text-to-Image                    | ✅        | text_to_image()
Computer Vision | Zero-Shot-Image-Classification   | ✅        | zero_shot_image_classification()
Multimodal      | Documentation Question Answering | ✅        | document_question_answering()
Multimodal      | Visual Question Answering        | ✅        | visual_question_answering()
NLP             | Conversational                   | ✅        | conversational()
NLP             | Feature Extraction               | ✅        | feature_extraction()
NLP             | Fill Mask                        | ✅        | fill_mask()
NLP             | Question Answering               | ✅        | question_answering()
NLP             | Sentence Similarity              | ✅        | sentence_similarity()
NLP             | Summarization                    | ✅        | summarization()
NLP             | Table Question Answering         | ✅        | table_question_answering()
NLP             | Text Classification              | ✅        | text_classification()
NLP             | Text Generation                  | ✅        | text_generation()
NLP             | Token Classification             | ✅        | token_classification()
NLP             | Translation                      | ✅        | translation()
NLP             | Zero Shot Classification         | ✅        | zero_shot_classification()
Tabular         | Tabular Classification           | ✅        | tabular_classification()
Tabular         | Tabular Regression               | ✅        | tabular_regression()

Check out the Tasks page to learn more about each task, how to use them, and the most popular models for each task.
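
Each row in the table corresponds to a dedicated method on the client, so switching tasks only means calling a different method. Here is a short sketch using two of the methods listed above, leaving the model choice to the Inference API’s recommended default (the input sentences are only illustrative):

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

# NLP task: summarize a text with the recommended model for summarization
>>> summary = client.summarization("The tower is 324 metres tall, about the same height as an 81-storey building, and the tallest structure in Paris.")

# NLP task: classify the sentiment of a sentence
>>> client.text_classification("I love using this client!")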

Custom requests

However, it is not always possible to cover all use cases. For custom requests, the InferenceClient.post() method gives you the flexibility to send any request to the Inference API. For example, you can specify how to parse the inputs and outputs. In the example below, the generated image is returned as raw bytes instead of being parsed as a PIL Image. This can be helpful if you don’t have Pillow installed in your setup and just care about the binary content of the image. InferenceClient.post() is also useful to handle tasks that are not yet officially supported.

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> response = client.post(json={"inputs": "An astronaut riding a horse on the moon."}, model="stabilityai/stable-diffusion-2-1")
>>> response.content # raw bytes
b'...'
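
Because the raw response is returned as-is, you decide how to handle it. For instance, here is a minimal sketch that writes the returned image bytes straight to disk, without needing Pillow:

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> response = client.post(json={"inputs": "An astronaut riding a horse on the moon."}, model="stabilityai/stable-diffusion-2-1")
>>> with open("astronaut.png", "wb") as f:
...     f.write(response.content)  # save the raw bytes returned by the Inference API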

Async client

An async version of the client is also provided, based on asyncio and aiohttp. You can either install aiohttp directly or use the [inference] extra:


pip install --upgrade boincai_hub[inference]
# or
# pip install aiohttp

After installation, all async API endpoints are available via AsyncInferenceClient. Its initialization and APIs are strictly the same as the sync-only version.

# Code must be run in an asyncio concurrent context.
# $ python -m asyncio
>>> from boincai_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> image = await client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> async for token in await client.text_generation("The BOINC AI Hub is", stream=True):
...     print(token, end="")
 a platform for sharing and discussing ML-related content.
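
Since every method of the async client returns a coroutine, you can also run several requests concurrently with standard asyncio tooling. Here is a small sketch (to be run in the same asyncio context) that generates two images in parallel with asyncio.gather; the prompts and file names are only illustrative:

>>> import asyncio
>>> from boincai_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> images = await asyncio.gather(
...     client.text_to_image("An astronaut riding a horse on the moon."),
...     client.text_to_image("A watercolor painting of a lighthouse at dawn."),
... )
>>> for i, image in enumerate(images):
...     image.save(f"image_{i}.png")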

For more information about the asyncio module, please refer to the official documentation.

Advanced tips

In the above section, we saw the main aspects of InferenceClient. Let’s dive into some more advanced tips.

Timeout

When doing inference, there are two main causes for a timeout:

  • The inference process takes a long time to complete.

  • The model is not available, for example when the Inference API is loading it for the first time.

InferenceClient has a global timeout parameter to handle those two aspects. By default, it is set to None, meaning that the client will wait indefinitely for the inference to complete. If you want more control in your workflow, you can set it to a specific value in seconds. If the timeout delay expires, an InferenceTimeoutError is raised. You can catch it and handle it in your code:

>>> from boincai_hub import InferenceClient, InferenceTimeoutError
>>> client = InferenceClient(timeout=30)
>>> try:
...     client.text_to_image(...)
... except InferenceTimeoutError:
...     print("Inference timed out after 30s.")
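
If the timeout is caused by a model that is still loading, a simple retry loop is often enough. Here is a minimal sketch of that pattern; the retry count and delay are arbitrary choices, not library defaults:

>>> import time
>>> from boincai_hub import InferenceClient, InferenceTimeoutError
>>> client = InferenceClient(timeout=30)
>>> for attempt in range(3):
...     try:
...         image = client.text_to_image("An astronaut riding a horse on the moon.")
...         break
...     except InferenceTimeoutError:
...         time.sleep(10)  # give the model time to load before retrying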

Binary inputs

Some tasks require binary inputs, for example, when dealing with images or audio files. In this case, InferenceClient tries to be as permissive as possible and accepts different types:

  • raw bytes

  • a file-like object, opened as binary (with open("audio.flac", "rb") as f: ...)

  • a path (str or Path) pointing to a local file

  • a URL (str) pointing to a remote file (e.g. https://...). In this case, the file will be downloaded locally before sending it to the Inference API.


>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]
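
The same call also works with local content. For example, assuming the image has been downloaded as dog.jpg, the following sketch should be equivalent to the URL-based call above:

>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()

# Pass a local path (str or Path)...
>>> client.image_classification("dog.jpg")

# ...or raw bytes / a file-like object opened in binary mode
>>> with open("dog.jpg", "rb") as f:
...     client.image_classification(f.read())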

Legacy InferenceAPI client

InferenceClient acts as a replacement for the legacy InferenceApi client. It adds specific support for tasks and handles inference on both the Inference API and Inference Endpoints.

Here is a short guide to help you migrate from InferenceApi to InferenceClient.

Initialization

Change from


>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)

to


>>> from boincai_hub import InferenceClient
>>> inference = InferenceClient(model="bert-base-uncased", token=API_TOKEN)

Run on a specific task

Change from


>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction")
>>> inference(...)

to


>>> from boincai_hub import InferenceClient
>>> inference = InferenceClient()
>>> inference.feature_extraction(..., model="paraphrase-xlm-r-multilingual-v1")

This is the recommended way to adapt your code to InferenceClient. It lets you benefit from the task-specific methods like feature_extraction.

Run custom request

Change from


>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]

to


>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> response = client.post(json={"inputs": "The goal of life is [MASK]."}, model="bert-base-uncased")
>>> response.json()
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]

Run with parameters

Change from


>>> from boincai_hub import InferenceApi
>>> inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels":["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}

to


>>> from boincai_hub import InferenceClient
>>> client = InferenceClient()
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels":["refund", "legal", "faq"]}
>>> response = client.post(json={"inputs": inputs, "parameters": params}, model="typeform/distilbert-base-uncased-mnli")
>>> response.json()
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}
