Pricing
Easily deploy machine learning models on dedicated infrastructure with 🌍 Inference Endpoints. When you create an Endpoint, you can select the instance type to deploy and scale your model according to an hourly rate. 🌍 Inference Endpoints is accessible to BOINC AI accounts with an active subscription and credit card on file. At the end of the subscription period, the user or organization account will be charged for the compute resources used while Endpoints are initializing and in a running state.
You can find the hourly pricing for all available instances for 🌍 Inference Endpoints, and examples of how costs are calculated below. While the prices are shown by the hour, the actual cost is calculated by the minute.
CPU Instances
The table below shows currently available CPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.
aws
small
$0.06
1
2GB
Intel Xeon - Ice Lake
aws
medium
$0.12
2
4GB
Intel Xeon - Ice Lake
aws
large
$0.24
4
8GB
Intel Xeon - Ice Lake
aws
xlarge
$0.48
8
16GB
Intel Xeon - Ice Lake
azure
small
$0.06
1
2GB
Intel Xeon
azure
medium
$0.12
2
4GB
Intel Xeon
azure
large
$0.24
4
8GB
Intel Xeon
azure
xlarge
$0.48
8
16GB
Intel Xeon
GPU Instances
The table below shows currently available GPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.
aws
small
$0.60
1
14GB
NVIDIA T4
aws
medium
$1.30
1
24GB
NVIDIA A10G
aws
large
$4.50
4
56GB
NVIDIA T4
aws
xlarge
$6.50
1
80GB
NVIDIA A100
aws
xxlarge
$7.00
4
96GB
NVIDIA A10G
aws
2xlarge
$13.00
2
160GB
NVIDIA A100
aws
4xlarge
$26.00
4
320GB
NVIDIA A100
aws
8xlarge
$45.00
8
640GB
NVIDIA A100
Pricing examples
The following example pricing scenarios demonstrate how costs are calculated. You can find the hourly rate for all instance types and sizes in the tables above. Use the following formula to calculate the costs:
Copied
Basic Endpoint
AWS CPU medium (2 x 4GB vCPUs)
Autoscaling (minimum 1 replica, maximum 1 replica)
hourly cost
Copied
monthly cost
Copied
Advanced Endpoint
AWS GPU small (1 x 14GB GPU)
Autoscaling (minimum 1 replica, maximum 3 replica), every hour a spike in traffic scales the Endpoint from 1 to 3 replicas for 15 minutes
hourly cost
Copied
monthly cost
Copied
Last updated