Use fast tokenizers from BOINC AI Tokenizers
The PreTrainedTokenizerFast class depends on the 🤗 Tokenizers library. Tokenizers built with the 🤗 Tokenizers library can be loaded directly into 🤗 Transformers.
Before getting into the specifics, let's start by creating a dummy tokenizer in a few lines:
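A minimal sketch of such a tokenizer, assuming a BPE model with whitespace pre-tokenization. The toy corpus, the temporary file it is written to, and the special-token list are illustrative assumptions, not part of the original text:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Hypothetical toy corpus, written to a temporary file so that we can
# train from a list of file paths.
corpus = [
    "Hello, how are you?",
    "I am fine, thank you.",
    "Fast tokenizers can be trained in a few lines.",
]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(corpus_path, "w", encoding="utf-8") as f:
    f.write("\n".join(corpus))
files = [corpus_path]

# BPE model that maps unknown symbols to "[UNK]".
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
# Split on whitespace and punctuation before the BPE merges are learned.
tokenizer.pre_tokenizer = Whitespace()

# Special tokens chosen for illustration only.
trainer = BpeTrainer(
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    show_progress=False,
)
tokenizer.train(files, trainer)
```

With real data, `files` would instead list the paths of your own training text files.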
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to a JSON file for future re-use.
Loading directly from the tokenizer object
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The PreTrainedTokenizerFast class allows for easy instantiation by accepting the instantiated tokenizer object as an argument:
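A minimal sketch of that instantiation, using the `tokenizer_object` argument. To keep the snippet self-contained, it first rebuilds a small stand-in tokenizer from a hypothetical in-memory corpus; in practice you would pass the tokenizer you already trained:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Stand-in for the tokenizer trained earlier (toy corpus is an assumption).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    show_progress=False,
)
tokenizer.train_from_iterator(
    ["Hello, how are you?", "I am fine, thank you."], trainer
)

# Wrap the in-memory tokenizer object for use with Transformers.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)

# The wrapper is callable like any other Transformers tokenizer.
encoding = fast_tokenizer("Hello, how are you?")
print(encoding["input_ids"])
```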
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to the tokenizer page for more information.
Loading from a JSON file
To load a tokenizer from a JSON file, let's first save our tokenizer:
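A minimal sketch of the save step, using `Tokenizer.save`. The stand-in tokenizer and the temporary output path are assumptions made so the snippet runs on its own:

```python
import json
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Stand-in for the tokenizer trained earlier (toy corpus is an assumption).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]"], show_progress=False)
tokenizer.train_from_iterator(["Hello, how are you?"], trainer)

# Hypothetical output location; any writable path works.
save_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(save_path)

# The result is a single JSON document describing the whole pipeline.
with open(save_path, encoding="utf-8") as f:
    saved = json.load(f)
print(sorted(saved.keys()))
```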
The path to which we saved this file can be passed to the PreTrainedTokenizerFast initialization method using the tokenizer_file parameter:
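A minimal sketch of loading from the saved file through the `tokenizer_file` parameter. The stand-in tokenizer and the temporary path are assumptions so the snippet is runnable in isolation:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Stand-in for the tokenizer saved earlier (toy corpus is an assumption).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]"], show_progress=False)
tokenizer.train_from_iterator(["Hello, how are you?"], trainer)

# Hypothetical path standing in for wherever the JSON file was saved.
save_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(save_path)

# Load the serialized tokenizer directly from the JSON file.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file=save_path)

text = "Hello, how are you?"
loaded_ids = fast_tokenizer(text)["input_ids"]
original_ids = tokenizer.encode(text).ids
```

Comparing `loaded_ids` with `original_ids` is a quick way to check that the reloaded tokenizer behaves like the one that was saved.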
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to the tokenizer page for more information.