Accelerated Inference API
Hosted Inference API
Test and evaluate, for free, over 150,000 publicly accessible machine learning models (or your own private models) via simple HTTP requests, with fast inference hosted on BOINC AI's shared infrastructure.
The Inference API is free to use and rate limited. If you need an inference solution for production, check out our Inference Endpoints service. With Inference Endpoints, you can easily deploy any machine learning model on dedicated and fully managed infrastructure. Select the cloud, region, compute instance, autoscaling range, and security level to match your model, latency, throughput, and compliance needs.
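For illustration, here is a minimal sketch of such an HTTP request using Python's requests library. The base URL pattern, model ID, and token placeholder are assumptions made for this example only; substitute the endpoint and access token documented for your account.

```python
import requests

# Assumed endpoint pattern and example model ID; replace with the values
# from your own account settings.
API_URL = "https://api-inference.example.com/models/distilbert-base-uncased-finetuned-sst-2-english"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

def query(payload: dict) -> dict:
    # Send a JSON payload to the hosted model and return the parsed response.
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

print(query({"inputs": "I love this new inference service!"}))
```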
Main features:
Get predictions from 150,000+ Transformers, Diffusers, or Timm models (T5, Blenderbot, Bart, GPT-2, Pegasus...)
Use built-in integrations with over 20 open-source libraries (spaCy, SpeechBrain, Keras, etc.).
Switch from one model to the next by just switching the model ID (see the sketch after this list)
Upload, manage and serve your own models privately
Run Classification, Image Segmentation, Automatic Speech Recognition, NER, Conversational, Summarization, Translation, Question-Answering, and Embeddings Extraction tasks
Out of the box accelerated inference on CPU powered by Intel Xeon Ice Lake
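To illustrate switching models by ID, the sketch below reuses the hypothetical query pattern from above; the base URL and the specific model IDs are again assumptions chosen for the example, not values taken from this document.

```python
import requests

BASE_URL = "https://api-inference.example.com/models/"  # assumed endpoint pattern
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

def query(model_id: str, payload: dict) -> dict:
    # The task is determined by the model, so only the model ID changes
    # between requests; the calling code stays the same.
    response = requests.post(BASE_URL + model_id, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

# Summarization vs. translation: same call, different model ID.
print(query("facebook/bart-large-cnn", {"inputs": "Long article text ..."}))
print(query("t5-small", {"inputs": "translate English to French: Hello, world!"}))
```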
Third-party library models:
These models are enabled on the API through the api-inference-community Docker integration.
Please note, however, that these models will not allow you (tracking issue):
To get full optimization
To run private models
To get access to GPU inference