Pricing

Easily deploy machine learning models on dedicated infrastructure with 🌍 Inference Endpoints. When you create an Endpoint, you can select the instance type to deploy and scale your model according to an hourly rate. 🌍 Inference Endpoints is accessible to BOINC AI accounts with an active subscription and credit card on file. At the end of the subscription period, the user or organization account will be charged for the compute resources used while Endpoints are initializing and in a running state.

You can find the hourly pricing for all available instances for 🌍 Inference Endpoints, and examples of how costs are calculated below. While the prices are shown by the hour, the actual cost is calculated by the minute.

CPU Instances

The table below shows currently available CPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.

Provider

Instance Size

hourly rate

vCPUs

Memory

Architecture

aws

small

$0.06

2GB

Intel Xeon - Ice Lake

aws

medium

$0.12

4GB

Intel Xeon - Ice Lake

aws

large

$0.24

8GB

Intel Xeon - Ice Lake

aws

xlarge

$0.48

16GB

Intel Xeon - Ice Lake

azure

small

$0.06

2GB

Intel Xeon

azure

medium

$0.12

4GB

Intel Xeon

azure

large

$0.24

8GB

Intel Xeon

azure

xlarge

$0.48

16GB

Intel Xeon

GPU Instances

The table below shows currently available GPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.

Provider

Instance Size

hourly rate

GPUs

Memory

Architecture

aws

small

$0.60

14GB

NVIDIA T4

aws

medium

$1.30

24GB

NVIDIA A10G

aws

large

$4.50

56GB

NVIDIA T4

aws

xlarge

$6.50

80GB

NVIDIA A100

aws

xxlarge

$7.00

96GB

NVIDIA A10G

aws

2xlarge

$13.00

160GB

NVIDIA A100

aws

4xlarge

$26.00

320GB

NVIDIA A100

aws

8xlarge

$45.00

640GB

NVIDIA A100

Pricing examples

The following example pricing scenarios demonstrate how costs are calculated. You can find the hourly rate for all instance types and sizes in the tables above. Use the following formula to calculate the costs:

Copied

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas))

Basic Endpoint

AWS CPU medium (2 x 4GB vCPUs)
Autoscaling (minimum 1 replica, maximum 1 replica)

hourly cost

Copied

instance hourly rate * (hours * # min replica) = hourly cost
$0.12/hr * (1hr * 1 replica) = $0.12/hr

monthly cost

Copied

instance hourly rate * (hours * # min replica) = monthly cost
$0.12/hr * (730hr * 1 replica) = $87.6/month

Advanced Endpoint

AWS GPU small (1 x 14GB GPU)
Autoscaling (minimum 1 replica, maximum 3 replica), every hour a spike in traffic scales the Endpoint from 1 to 3 replicas for 15 minutes

hourly cost

Copied

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas)) = hourly cost
$0.6/hr * ((1hr * 1 replica) + (0.25hr * 2 replicas)) = $0.9/hr

monthly cost

Copied

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas)) = monthly cost
$0.6/hr * ((730hr * 1 replica) + (182.5hr * 2 replicas)) = $657/month

PreviousAutoscaling NextHelp & Support

Last updated 2 years ago

hashtagCPU Instances

hashtagGPU Instances

hashtagPricing examples

hashtagBasic Endpoint

hashtagAdvanced Endpoint

CPU Instances

GPU Instances

Pricing examples

Basic Endpoint

Advanced Endpoint