Custom Tools and Prompts
If you are not yet familiar with what tools and agents are in the context of Transformers, we recommend you read the introductory Transformers Agents page first.
Transformers Agents is an experimental API that is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.
Creating and using custom tools and prompts is paramount to empowering the agent and having it perform new tasks. In this guide we'll take a look at:
How to customize the prompt
How to use custom tools
How to create custom tools
As explained in Transformers Agents, agents can run in run and chat mode. Both the run and chat modes rely on the same underlying logic: the language model powering the agent is conditioned on a long prompt and completes the prompt by generating the next tokens until the stop token is reached. The only difference between the two modes is that during the chat mode the prompt is extended with previous user inputs and model generations. This allows the agent to have access to past interactions, seemingly giving the agent some kind of memory.
Let's take a closer look at how the prompt is structured to understand how it can be best customized. The prompt is structured broadly into four parts.
Introduction: how the agent should behave, explanation of the concept of tools.
Description of all the tools. This is defined by a <<all_tools>> token that is dynamically replaced at runtime with the tools defined/chosen by the user.
A set of examples of tasks and their solutions.
Current example, and request for solution.
To better understand each part, let's look at a shortened version of what the run prompt can look like:
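The exact prompt ships with the library; the sketch below only illustrates its overall shape (introduction, tool list, examples, unfinished example) and is not the verbatim text.

```
I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
[...]
You can print intermediate results if it makes sense to do so.

Tools:
- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` [...] It returns a text that contains the answer to the question.
- image_captioner: This is a tool that generates a description of an image. [...]
[...]

Task: "Answer the question in the variable `question` about the image stored in the variable `image`."

I will use the following tools: `image_qa` to answer the question on the input image.

Answer:

answer = image_qa(image=image, question=question)
print(f"The answer is {answer}")

[... more examples ...]

Task: "Draw me a picture of rivers and lakes"

I will use the following
```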
The introduction (the text before "Tools:") explains precisely how the model shall behave and what it should do. This part most likely does not need to be customized as the agent shall always behave the same way.
The second part (the bullet points below "Tools") is dynamically added upon calling run or chat. There are exactly as many bullet points as there are tools in agent.toolbox and each bullet point consists of the name and description of the tool:
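For instance, the bullet points may look along these lines (illustrative, not verbatim):

```
- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
- image_captioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.
```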
Let's verify this quickly by loading the document_qa tool and printing out the name and description.
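A minimal sketch, assuming the tool is published under the document-question-answering identifier:

```python
from transformers import load_tool

document_qa = load_tool("document-question-answering")
print(f"{document_qa.name}: {document_qa.description}")
```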
which gives:
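Illustrative output (the exact description text ships with the library and may differ):

```
document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
```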
We can see that the tool name is short and precise. The description includes two parts: the first explains what the tool does, and the second states what input arguments and return values are expected.
A good tool name and tool description are very important for the agent to correctly use it. Note that the only information the agent has about the tool is its name and description, so one should make sure that both are precisely written and match the style of the existing tools in the toolbox. In particular make sure the description mentions all the arguments expected by name in code-style, along with the expected type and a description of what they are.
Check the naming and description of the curated Transformers tools to better understand what name and description a tool is expected to have. You can see all tools with the Agent.toolbox property.
The third part includes a set of curated examples that show the agent exactly what code it should produce for what kind of user request. The large language models empowering the agent are extremely good at recognizing patterns in a prompt and repeating the pattern with new data. Therefore, it is very important that the examples are written in a way that maximizes the likelihood of the agent generating correct, executable code in practice.
The prompt examples are curated by the Transformers team and rigorously evaluated to ensure that the agent's prompt is as good as possible to solve real use cases of the agent.
Let's have a look at one example:
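The example below is illustrative of the style of the curated examples rather than a verbatim copy:

```
Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner."

I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.

Answer:

answer = document_qa(document, question="What is the oldest person?")
print(f"The answer is {answer}.")
image = image_generator("A banner showing " + answer)
```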
The pattern the model is prompted to repeat has three parts: the task statement, the agent's explanation of what it intends to do, and finally the generated code. Every example that is part of the prompt has this exact pattern, thus making sure that the agent will reproduce exactly the same pattern when generating new tokens.
The final part of the prompt corresponds to:
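For the user input discussed just below, it looks like this:

```
Task: "Draw me a picture of rivers and lakes"

I will use the following
```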
This is a final and unfinished example that the agent is tasked to complete. The unfinished example is dynamically created based on the actual user input. For the above example, the user ran:
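That call would look along these lines:

```python
agent.run("Draw me a picture of rivers and lakes")
```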
The user input - a.k.a the task: "Draw me a picture of rivers and lakes" is cast into the prompt template: "Task: <task> \n\n I will use the following". This sentence makes up the final lines of the prompt the agent is conditioned on, therefore strongly influencing the agent to finish the example exactly in the same way it was previously done in the examples.
Without going into too much detail, the chat template has the same prompt structure with the examples having a slightly different style, e.g.:
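Again, the sketch below only illustrates the style; the exact examples ship with the library:

```
=====

Human: Answer the question in the variable `question` about the image stored in the variable `image`.

Assistant: I will use the tool `image_qa` to answer the question on the input image.

      answer = image_qa(text=question, image=image)

Human: I tried this code, it worked but didn't give me a good result. The question is in French

Assistant: In this case, the question needs to be translated first. I will use the tool `translator` to do this.

      translated_question = translator(question=question, src_lang="French", tgt_lang="English")
      answer = image_qa(text=translated_question, image=image)

=====
```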
Contrary to the examples of the run prompt, each chat prompt example has one or more exchanges between the Human and the Assistant. Every exchange is structured similarly to the example of the run prompt. The user's input is appended behind Human: and the agent is prompted to first generate what needs to be done before generating code. An exchange can be based on previous exchanges, therefore allowing the user to refer to past exchanges, as is done e.g. above where the user's input of "I tried this code" refers to the previously generated code of the agent.
Upon running .chat, the user's input or task is cast into an unfinished example of the form:
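Schematically:

```
Human: <user input>

Assistant:
```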
which the agent completes. Contrary to the run command, the chat command then appends the completed example to the prompt, thus giving the agent more context for the next chat turn.
Great, now that we know how the prompt is structured, let's see how we can customize it!
While large language models are getting better and better at understanding users' intentions, it helps enormously to be as precise as possible to help the agent pick the correct task. What does it mean to be as precise as possible?
The agent sees a list of tool names and their descriptions in its prompt. The more tools are added, the more difficult it becomes for the agent to choose the correct tool, and it's even more difficult to choose the correct sequence of tools to run. Let's look at a common failure case; here we will only return the code to analyze it.
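For example (hypothetical task; return_code=True asks the agent to only return the generated code instead of executing it):

```python
agent.run("Show me a tree", return_code=True)
```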
gives:
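Illustrative output; the agent may, for example, reach for the segmentation tool instead of generating an image:

```
==Explanation from the agent==
I will use the following tool: `image_segmenter` to create a segmentation mask for the image.

==Code generated by the agent==
mask = image_segmenter(image, prompt="tree")
```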
which is probably not what we wanted. Instead, it is more likely that we want an image of a tree to be generated. To steer the agent more towards using a specific tool it can therefore be very helpful to use important keywords that are present in the tool's name and description. Let's have a look.
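A quick way to inspect the generator tool (the description text in the comment is illustrative):

```python
print(agent.toolbox["image_generator"].name)
print(agent.toolbox["image_generator"].description)
# image_generator
# 'This is a tool that creates an image according to a prompt, which is a text description.'
```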
The name and description make use of the keywords "image", "prompt", "create" and "generate". Using these words will most likely work better here. Let's refine our prompt a bit.
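For example (hypothetical task wording):

```python
agent.run("Create an image of a tree", return_code=True)
```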
gives:
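Illustrative output:

```
==Explanation from the agent==
I will use the following tool `image_generator` to generate an image of a tree.

==Code generated by the agent==
image = image_generator(prompt="tree")
```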
Much better! That looks more like what we want. In short, when you notice that the agent struggles to correctly map your task to the correct tools, try looking up the most pertinent keywords of the tool's name and description and try refining your task request with it.
As we've seen before, the agent has access to each of the tools' names and descriptions. The base tools should have very precise names and descriptions; however, you might find that it could help to change the description or name of a tool for your specific use case. This might become especially important when you've added multiple tools that are very similar or if you want to use your agent only for a certain domain, e.g. image generation and transformations.
A common problem is that the agent confuses image generation with image transformation/modification when used a lot for image generation tasks, e.g.
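For instance (hypothetical task wording):

```python
agent.run("Make an image of a house and a car", return_code=True)
```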
returns
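something along these lines, where both tools get used even though a single generation would do:

```
==Explanation from the agent==
I will use the following tools: `image_generator` to generate an image of a house, then `image_transformer` to transform the image of a house into an image of a house and a car.

==Code generated by the agent==
house_image = image_generator(prompt="A house")
house_car_image = image_transformer(image=house_image, prompt="A house and a car")
```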
which is probably not exactly what we want here. It seems like the agent has a difficult time understanding the difference between image_generator and image_transformer and often uses the two together.
We can help the agent here by changing the tool name and description of image_transformer. Let's instead call it modifier to disassociate it a bit from "image" and "prompt":
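A minimal sketch of such a renaming; the replaced description string is an assumption about the default wording:

```python
# Move the tool to a new name in the toolbox ...
agent.toolbox["modifier"] = agent.toolbox.pop("image_transformer")

# ... and align its description with the new name (the original wording may differ).
agent.toolbox["modifier"].description = agent.toolbox["modifier"].description.replace(
    "transforms an image according to a prompt", "modifies an image"
)
```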
Now "modify" is a strong cue to use the new image processor which should help with the above prompt. Let's run it again.
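Running the same task as before:

```python
agent.run("Make an image of a house and a car", return_code=True)
```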
Now we're getting:
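Illustrative output:

```
==Explanation from the agent==
I will use the following tools: `image_generator` to generate an image of a house, then `image_generator` to generate an image of a car.

==Code generated by the agent==
house_image = image_generator(prompt="A house")
car_image = image_generator(prompt="A car")
```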
which is definitely closer to what we had in mind! However, we want to have both the house and car in the same image. Steering the task more toward single image generation should help:
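For example (hypothetical task wording, with illustrative output as comments):

```python
agent.run("Create image: 'A house and car'", return_code=True)
# ==Explanation from the agent==
# I will use the following tool: `image_generator` to generate an image.
#
# ==Code generated by the agent==
# image = image_generator(prompt="A house and car")
```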
Agents are still brittle for many use cases, especially when it comes to slightly more complex use cases like generating an image of multiple objects. Both the agent itself and the underlying prompt will be further improved in the coming months making sure that agents become more robust to a variety of user inputs.
To give the user maximum flexibility, the whole prompt template described above can be overwritten by the user. In this case make sure that your custom prompt includes an introduction section, a tool section, an example section, and an unfinished example section. If you want to overwrite the run prompt template, you can do as follows:
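A minimal sketch, assuming an HfAgent backed by an inference endpoint; the endpoint URL and template text are placeholders:

```python
from transformers import HfAgent

# Custom template; it must still contain the <<all_tools>> and <<prompt>> placeholders.
template = """[...]"""

agent = HfAgent("<your-inference-endpoint-url>", run_prompt_template=template)
```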
Please make sure to have the <<all_tools>> string and the <<prompt>> string defined somewhere in the template so that the agent can be aware of the tools it has available to it, as well as correctly insert the user's prompt.
Similarly, one can overwrite the chat prompt template. Note that the chat mode always uses the following format for the exchanges:
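That format is (with the user input shown as a schematic placeholder):

```
Human: <user input>

Assistant:
```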
Therefore it is important that the examples of the custom chat prompt template also make use of this format. You can overwrite the chat template at instantiation as follows.
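A minimal sketch (placeholder endpoint and template):

```python
from transformers import HfAgent

template = """[...]"""  # must contain the <<all_tools>> placeholder

agent = HfAgent("<your-inference-endpoint-url>", chat_prompt_template=template)
```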
Please make sure to have the <<all_tools>> string defined somewhere in the template so that the agent can be aware of the tools it has available to it.
In both cases, you can pass a repo ID instead of the prompt template string if you would like to use a template hosted by someone in the community; the default prompts live in a dataset repository on the Hub and can serve as an example.
To upload your custom prompt on a repo on the Hub and share it with the community just make sure:
to use a dataset repository
to put the prompt template for the run command in a file named run_prompt_template.txt
to put the prompt template for the chat command in a file named chat_prompt_template.txt
In this section, we'll be leveraging two existing custom tools that are specific to image generation:
We replace the existing image-transformation tool with a ControlNet-based image transformer, to allow for more image modifications.
We add a new image upscaling tool to the default toolbox.
We'll start by loading the custom tools with the convenient load_tool function:
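A sketch of loading the two tools; the repo IDs below are placeholders for the actual Hub repositories hosting them:

```python
from transformers import load_tool

controlnet_transformer = load_tool("<controlnet-image-transformation-tool-repo-id>")
upscaler = load_tool("<image-upscaler-tool-repo-id>")
```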
Upon adding custom tools to an agent, the tools' descriptions and names are automatically included in the agent's prompts. Thus, it is imperative that custom tools have a well-written description and name in order for the agent to understand how to use them. Let's take a look at the description and name of controlnet_transformer:
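For example:

```python
print(f"{controlnet_transformer.name}: {controlnet_transformer.description}")
```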
gives
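Illustrative output (the exact description is defined by the tool's author):

```
image_transformer: This is a tool that transforms an image with ControlNet according to a prompt. It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.
```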
The name and description are accurate and fit the style of the curated set of tools. Next, let's instantiate an agent with controlnet_transformer and upscaler:
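A sketch (placeholder endpoint):

```python
from transformers import HfAgent

agent = HfAgent(
    "<your-inference-endpoint-url>", additional_tools=[controlnet_transformer, upscaler]
)
```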
This command should give you the following info:
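Something along the lines of:

```
image_transformer has been replaced by <ControlNet transformation tool> as provided in `additional_tools`
```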
The set of curated tools already has an image_transformer tool which is hereby replaced with our custom tool.
Overwriting existing tools can be beneficial if we want to use a custom tool for exactly the same task as an existing tool, because the agent is well-versed in handling that specific task. Beware that the custom tool should follow the exact same API as the overwritten tool in this case, or you should adapt the prompt template to make sure all examples using that tool are updated.
The upscaler tool was given the name image_upscaler which is not yet present in the default toolbox and is therefore simply added to the list of tools. You can always have a look at the toolbox that is currently available to the agent via the agent.toolbox attribute:
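For example (the exact list depends on the default toolbox of your version; abbreviated output shown as comments):

```python
print("\n".join([f"- {tool_name}" for tool_name in agent.toolbox.keys()]))
# - document_qa
# - image_captioner
# - image_qa
# - image_segmenter
# - ...
# - image_transformer
# - image_generator
# - image_upscaler
```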
Note how image_upscaler is now part of the agent's toolbox.
Let's now try out the new tools! We will re-use the image we generated earlier.
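A sketch of loading that earlier image (placeholder path/URL):

```python
from diffusers.utils import load_image

image = load_image("<path-or-url-to-the-rivers-and-lakes-image>")
```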
Let's transform the image into a beautiful winter landscape:
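For example (hypothetical prompt, illustrative output as comments):

```python
image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
# ==Explanation from the agent==
# I will use the following tool: `image_transformer` to transform the image.
#
# ==Code generated by the agent==
# image = image_transformer(image, prompt="A frozen lake and snowy forest")
```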
The new image processing tool is based on ControlNet, which can make very strong modifications to the image. By default the image processing tool returns an image of size 512x512 pixels. Let's see if we can upscale it.
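Again with an illustrative transcript as comments:

```python
image = agent.run("Upscale the image", image=image)
# ==Explanation from the agent==
# I will use the following tool: `image_upscaler` to upscale the image.
#
# ==Code generated by the agent==
# upscaled_image = image_upscaler(image)
```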
The agent automatically mapped our prompt "Upscale the image" to the just-added upscaler tool purely based on the description and name of the upscaler tool and was able to correctly run it.
Next, let's have a look at how you can create a new custom tool.
In this section, we show how to create a new tool that can be added to the agent.
Creating a new tool
We'll first start by creating a tool. We'll add the not-so-useful yet fun task of fetching the model on the BOINC AI Hub with the most downloads for a given task.
We can do that with the following code:
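A minimal sketch using huggingface_hub's list_models:

```python
from huggingface_hub import list_models

task = "text-classification"

# Take the first entry when sorting the Hub models for this task by downloads, descending.
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)
```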
For the task text-classification, this returns 'facebook/bart-large-mnli'; for translation it returns 't5-base'.
How do we convert this to a tool that the agent can leverage? All tools depend on the superclass Tool that holds the main attributes necessary. We'll create a class that inherits from it:
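A bare skeleton to start from (the class name is just an example):

```python
from transformers import Tool


class HFModelDownloadsTool(Tool):
    pass
```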
This class has a few needs:
An attribute name, which corresponds to the name of the tool itself. To be in tune with other tools which have a performative name, we'll name it model_download_counter.
An attribute description, which will be used to populate the prompt of the agent.
inputs and outputs attributes. Defining these will help the Python interpreter make educated choices about types, and will allow for a gradio demo to be spawned when we push our tool to the Hub. They're both a list of expected values, which can be text, image, or audio.
A __call__ method which contains the inference code. This is the code we've played with above!
Here's what our class looks like now:
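A possible version, with the attributes discussed above (the exact description wording is up to you):

```python
from transformers import Tool
from huggingface_hub import list_models


class HFModelDownloadsTool(Tool):
    name = "model_download_counter"
    description = (
        "This is a tool that returns the most downloaded model of a given task on the BOINC AI Hub. "
        "It takes the name of the category (such as text-classification), and returns the name of the checkpoint."
    )

    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, task: str):
        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
        return model.id
```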
We now have our tool handy. Save it in a file and import it from your main script. Let's name this file model_downloads.py, so the resulting import code looks like this:
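Assuming the class name used above:

```python
from model_downloads import HFModelDownloadsTool

tool = HFModelDownloadsTool()
```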
In order to let others benefit from it and for simpler initialization, we recommend pushing it to the Hub under your namespace. To do so, just call push_to_hub on the tool variable:
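For example (the repository name is up to you):

```python
tool.push_to_hub("hf-model-downloads")
```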
You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.
Having the agent use the tool
We now have our tool that lives on the Hub, and it can be instantiated as follows (change the user name for your tool):
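For example (placeholder username and repository name):

```python
from transformers import load_tool

tool = load_tool("<your-username>/hf-model-downloads")
```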
In order to use it in the agent, simply pass it in the additional_tools parameter of the agent initialization method:
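A sketch (placeholder endpoint, hypothetical task wording):

```python
from transformers import HfAgent

agent = HfAgent("<your-inference-endpoint-url>", additional_tools=[tool])

agent.run(
    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hub?"
)
```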
which outputs the following:
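an illustrative transcript such as (the resulting model ID is shown as a placeholder):

```
==Code generated by the agent==
model = model_download_counter(task="text-to-video")
print(f"The model with the most downloads is {model}.")
audio_model = text_reader(model)

==Result==
The model with the most downloads is <model-id>.
```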
and generates the following audio.
Some LLMs are quite brittle and require very exact prompts in order to work well. Having a well-defined name and description of the tool is paramount to having it be leveraged by the agent.
Replacing existing tools can be done simply by assigning a new item to the agent's toolbox. Here's how one would do so:
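For example:

```python
# `new_tool` is any Tool instance you want to slot in under an existing or new name.
agent.toolbox["new-tool-name"] = new_tool
```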
Beware when replacing tools with others! This will also adjust the agent's prompt. This can be good if you have a better prompt suited for the task, but it can also result in your tool being selected way more than others or for other tools to be selected instead of the one you have defined.
gradio-tools is a powerful library that allows using BOINC AI Spaces as tools. It supports many existing Spaces as well as custom Spaces designed with it.
We offer support for gradio_tools by using the Tool.from_gradio method. For example, we want to take advantage of the StableDiffusionPromptGeneratorTool tool offered in the gradio-tools toolkit so as to improve our prompts and generate better images.
We first import the tool from gradio_tools and instantiate it:
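Assuming gradio_tools is installed:

```python
from gradio_tools import StableDiffusionPromptGeneratorTool

gradio_tool = StableDiffusionPromptGeneratorTool()
```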
We pass that instance to the Tool.from_gradio method:
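For example:

```python
from transformers import Tool

tool = Tool.from_gradio(gradio_tool)
```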
Now we can manage it exactly as we would a usual custom tool. We leverage it to improve our prompt "a rabbit wearing a space suit":
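A sketch (placeholder endpoint; the task wording is illustrative):

```python
from transformers import HfAgent

agent = HfAgent("<your-inference-endpoint-url>", additional_tools=[tool])

agent.run("Generate an image of the `prompt` after improving it.", prompt="A rabbit wearing a space suit")
```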
The model adequately leverages the tool:
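Illustrative output:

```
==Explanation from the agent==
I will use the following tools: `StableDiffusionPromptGenerator` to improve the prompt, then `image_generator` to generate an image according to the improved prompt.

==Code generated by the agent==
improved_prompt = StableDiffusionPromptGenerator(prompt)
print(f"The improved prompt is {improved_prompt}.")
image = image_generator(improved_prompt)
```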
The agent then finally generates the image.
gradio-tools requires textual inputs and outputs, even when working with different modalities, while this implementation works with image and audio objects. The two are currently incompatible, but will become compatible as we work to improve the support.
We love Langchain and think it has a very compelling suite of tools. In order to handle these tools, Langchain requires textual inputs and outputs, even when working with different modalities. This is often the serialized version (i.e., saved to disk) of the objects.
This difference means that multi-modality isn't handled between transformers-agents and langchain. We aim for this limitation to be resolved in future versions, and welcome any help from avid langchain users to help us achieve this compatibility. We would love to have better support: if you would like to help, please reach out and share what you have in mind.