Quick Tour
The easiest way to get started is with the official Docker container. Install Docker by following the official installation instructions for your platform.
Let's say you want to deploy a model with TGI. Here is an example of how to do that:
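A minimal launch command might look like the following sketch. The model ID shown is only an example, and the `latest` image tag can be pinned to a specific release:

```shell
# Serve a model with TGI's official container (model ID is an example; swap in your own).
model=teknium/OpenHermes-2.5-Mistral-7B
volume=$PWD/data  # share a volume with the container to avoid re-downloading weights

docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model
```

This maps the container's port 80 to 8080 on the host and caches downloaded weights in `$PWD/data`.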
To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
Once TGI is running, you can query the `generate` endpoint with HTTP requests. To learn more about how to query the endpoints, check the section on consuming TGI, where we show examples with utility libraries and UIs. Below is a simple snippet to query the endpoint.
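For example, a minimal Python sketch using only the standard library; the host, port, prompt, and generation parameters are assumptions you should adapt to your deployment:

```python
import json
import urllib.request

# Prompt and generation parameters for TGI's /generate endpoint
# (values here are illustrative assumptions).
payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

def query(url="http://127.0.0.1:8080/generate"):
    # Assumes a TGI instance is listening on 127.0.0.1:8080.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # TGI returns the completion under the "generated_text" key.
        return json.loads(resp.read())["generated_text"]

print(json.dumps(payload))  # the exact JSON body sent to the server
```

The same request can be made with any HTTP client (cURL, JavaScript's `fetch`, or the `requests` library) against the same endpoint and JSON body.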
To see all possible deploy flags and options, use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
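A sketch, assuming the official `ghcr.io/huggingface/text-generation-inference` image:

```shell
# Print every flag supported by the TGI launcher.
docker run ghcr.io/huggingface/text-generation-inference:latest --help
```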