Use fast tokenizers from BOINC AI Tokenizers
The fast tokenizers of 🤗 Transformers depend on the 🤗 Tokenizers library, so tokenizers built with 🤗 Tokenizers can be loaded very simply into 🤗 Transformers.
Before getting into the specifics, let's first create a dummy tokenizer in a few lines:
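A minimal sketch of such a dummy tokenizer, assuming a BPE model with a whitespace pre-tokenizer. The corpus is made up and written to a temporary `corpus.txt`; both are placeholders for your own training files:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Hypothetical toy corpus written to a temporary file; replace with your own files.
corpus_dir = tempfile.mkdtemp()
corpus_path = os.path.join(corpus_dir, "corpus.txt")
with open(corpus_path, "w", encoding="utf-8") as f:
    f.write("Hello world!\nTokenizers are fast.\nHello tokenizers!\n")

# A BPE model that maps unknown symbols to "[UNK]".
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
# The trainer adds these special tokens to the learned vocabulary.
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
# Split on whitespace and punctuation before BPE merges are learned.
tokenizer.pre_tokenizer = Whitespace()

files = [corpus_path]
tokenizer.train(files, trainer)

print(tokenizer.encode("Hello world!").tokens)
```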
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to a JSON file for future re-use.
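Both options can be sketched with the 🤗 Tokenizers API alone. This assumes the same toy BPE setup, trained here with `train_from_iterator` on two made-up sentences instead of files, and writes to a temporary `tokenizer.json`; the final comparison checks that the reloaded tokenizer encodes text the same way as the in-memory one:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a tiny throwaway tokenizer in memory (toy corpus, for illustration only).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train_from_iterator(["Hello world!", "Tokenizers are fast."], trainer=trainer)

# Option 1: keep using the object in the current runtime.
in_memory = tokenizer.encode("Hello world!").ids

# Option 2: serialize to JSON and reload it later, e.g. in another process.
path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(path)
reloaded = Tokenizer.from_file(path)
from_disk = reloaded.encode("Hello world!").ids

print(in_memory == from_disk)
```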
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The fast tokenizer class makes instantiation easy by accepting the already-instantiated tokenizer object as an argument:
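A minimal sketch, assuming the class in question is `PreTrainedTokenizerFast` (the fast tokenizer base class in 🤗 Transformers) and using a toy tokenizer trained on two made-up sentences in place of the one above:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Toy tokenizer standing in for the one trained earlier.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train_from_iterator(["Hello world!", "Tokenizers are fast."], trainer=trainer)

# Wrap the 🤗 Tokenizers object directly; no file is written.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)

print(fast_tokenizer.tokenize("Hello world!"))
```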
To load a tokenizer from a JSON file, let's first save our tokenizer:
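Saving is a single call to `Tokenizer.save`. The sketch below retrains the toy tokenizer so it runs on its own and writes to a temporary directory; the directory is an assumption for illustration, and a plain `"tokenizer.json"` path in the working directory works the same way:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy tokenizer, trained on two made-up sentences.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train_from_iterator(["Hello world!", "Tokenizers are fast."], trainer=trainer)

# Serialize the whole pipeline (model, pre-tokenizer, special tokens) to one JSON file.
save_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(save_path)
```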
The path to which we saved this file can be passed to the initialization method of the fast tokenizer class using the `tokenizer_file` parameter:
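A sketch of the full round trip, assuming `PreTrainedTokenizerFast` is the class being initialized and that the file lives in a temporary directory (a placeholder for wherever you saved yours):

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Build and save a toy tokenizer so the example is self-contained.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train_from_iterator(["Hello world!", "Tokenizers are fast."], trainer=trainer)
tokenizer_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(tokenizer_path)

# Load the serialized tokenizer into 🤗 Transformers via the tokenizer_file parameter.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file=tokenizer_path)

print(fast_tokenizer.tokenize("Hello world!"))
```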
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! See the 🤗 Transformers tokenizer documentation for more information.
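As an illustration of those shared methods, here is a sketch that pads a batch and decodes it back. It assumes the toy tokenizer from the earlier steps, wrapped with `PreTrainedTokenizerFast`. The special tokens are passed explicitly as `unk_token` and `pad_token` because the wrapper does not know which trained tokens play those roles, and padding fails without a pad token:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer
from transformers import PreTrainedTokenizerFast

# Toy tokenizer, trained on two made-up sentences.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train_from_iterator(["Hello world!", "Tokenizers are fast."], trainer=trainer)

fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    # Declare the special-token roles explicitly so padding knows what to use.
    unk_token="[UNK]",
    pad_token="[PAD]",
)

# Batched call, padding every sequence to the longest one in the batch.
batch = fast_tokenizer(["Hello world!", "Tokenizers are fast."], padding=True)
# Map the first sequence's ids back to text, dropping the padding tokens.
text = fast_tokenizer.decode(batch["input_ids"][0], skip_special_tokens=True)

print(batch["attention_mask"])
print(text)
```

Without a decoder attached to the BPE pipeline, the decoded text is simply the tokens joined with spaces, so punctuation may come back separated from the preceding word.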