Installation
This section explains how to install the CLI tool and how to install TGI from source. The strongly recommended approach is to use Docker, since it requires little setup. Check the Quick Tour to learn how to run TGI with Docker.
Install CLI
You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters.
To install the CLI, first clone the TGI repository and then run make install.
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
make install
If you would like to serve models with custom kernels, run:
BUILD_EXTENSIONS=True make install
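After the install finishes, a quick way to confirm that the CLI is usable is to check that its commands are on your PATH. The sketch below assumes the install provides the text-generation-launcher and text-generation-server commands; if your build names them differently, adjust the list.

```shell
# Report whether each expected TGI command is on PATH.
# Assumption: `make install` provides these two commands.
missing=0
for cmd in text-generation-launcher text-generation-server; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "missing: $cmd"
    missing=$((missing + 1))
  fi
done
echo "$missing command(s) missing"
```

If both commands are found, running text-generation-launcher --help should list the available serving parameters.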
Local Installation from Source
Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on Python 3.9+.
Text Generation Inference is available on PyPI, conda, and GitHub.
To install and launch locally, first install Rust and create a Python virtual environment with at least Python 3.9, e.g. using conda:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
You may also need to install Protoc.
On Linux:
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
On macOS, using Homebrew:
brew install protobuf
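Before building, it can help to confirm that the prerequisites above are actually visible in the current shell. The following is a minimal sketch, not part of the official instructions: have_cmd is a hypothetical helper, and the script assumes the conda environment is already activated so that python resolves to the interpreter you intend to use.

```shell
# have_cmd NAME: succeed if NAME resolves to a command on PATH.
# (Hypothetical helper introduced for this check.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Rust toolchain (installed by rustup) and the protobuf compiler.
for tool in cargo protoc; do
  if have_cmd "$tool"; then
    echo "ok: $tool ($("$tool" --version 2>&1 | head -n 1))"
  else
    echo "missing: $tool"
  fi
done

# Python must be 3.9 or newer inside the active environment.
if have_cmd python && python -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)'; then
  echo "ok: python >= 3.9"
else
  echo "missing or too old: python"
fi
```

Any line starting with "missing" points to the step above that needs to be repeated, possibly after opening a new shell so that PATH changes from rustup take effect.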
Then run the following to install Text Generation Inference:
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
BUILD_EXTENSIONS=True make install
On some machines, you may also need the OpenSSL libraries and gcc. On Debian-based Linux machines, run:
sudo apt-get install libssl-dev gcc -y
Once installation is done, simply run:
make run-falcon-7b-instruct
This will serve the Falcon 7B Instruct model on port 8080, which we can then query.
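As a rough illustration of such a query, the sketch below sends one generation request with curl. It assumes the server is reachable locally on port 8080 and exposes a /generate route that accepts a JSON body with inputs and parameters fields, as described in the Quick Tour. The prompt, the token limit, and the TGI_URL, PAYLOAD, and query_tgi names are illustrative.

```shell
# Assumptions: the server started above is listening locally on port 8080
# and exposes a /generate route; the prompt and token limit are
# illustrative values.
TGI_URL="${TGI_URL:-http://127.0.0.1:8080}"
PAYLOAD='{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'

# query_tgi: POST the JSON payload to the server and print the response.
query_tgi() {
  curl -sS "$TGI_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
}
```

Call query_tgi once the server reports that it is ready. Set TGI_URL beforehand if the server runs on another host or port.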