How-to: Automatic fine-tuning with AutoTrain
Webhook guide: Set up an automatic system to re-train a model when a dataset changes
Webhooks are now publicly available!
This guide walks you through setting up an automatic training pipeline on the BOINC AI platform using BA Datasets, Webhooks, Spaces, and AutoTrain.
We will build a Webhook that listens to changes on an image classification dataset and triggers a fine-tuning of microsoft/resnet-50 using AutoTrain.
Prerequisite: Upload your dataset to the Hub
We will use a simple image classification dataset for the sake of the example. Learn more about uploading your data to the Hub here.
Create a Webhook to react to the dataset's changes
First, let's create a Webhook from your settings.
Select your dataset as the target repository. We will target boincai-projects/input-dataset in this example.
You can put a dummy Webhook URL for now. Defining your Webhook will let you look at the events that will be sent to it. You can also replay them, which will be useful for debugging!
Input a secret to make it more secure.
Subscribe to "Repo update" events, as we want to react to data changes.
Your Webhook will look like this:
Create a Space to react to your Webhook
We now need a way to react to your Webhook events. An easy way to do this is to use a Space!
You can find an example Space here.
This Space uses Docker, Python, FastAPI, and uvicorn to run a simple HTTP server. Read more about Docker Spaces here.
The entry point is src/main.py. Let's walk through this file and detail what it does:
It spawns a FastAPI app that will listen to HTTP POST requests on /webhook:
This route checks that the X-Webhook-Secret header is present and that its value is the same as the one you set in your Webhook's settings. The WEBHOOK_SECRET secret must be set in the Space's settings and be the same as the secret set in your Webhook.
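The check itself can be sketched as a small, framework-independent function. The function name is_authorized is hypothetical, and the constant-time comparison via hmac.compare_digest is a design choice of this sketch rather than something the example Space is known to do.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_secret: Optional[str]) -> bool:
    """Return True only if X-Webhook-Secret is present and matches the expected secret."""
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    received = lowered.get("x-webhook-secret")
    # Reject when the Space has no secret configured or the header is missing.
    if not expected_secret or received is None:
        return False
    # Compare in constant time to avoid leaking information through timing.
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))
```

Inside the route, the expected value would typically be read with os.environ.get("WEBHOOK_SECRET"), and a request that fails the check would be rejected with an error status such as 401.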
The event's payload is encoded as JSON. Here, we'll be using pydantic models to parse the event payload. We also specify that we will run our Webhook only when:
the event concerns the input dataset
the event is an update on the repo's content, i.e., there has been a new commit
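The two conditions above could be expressed roughly as follows. The field names (event.action, event.scope, repo.type, repo.name) and the "repo.content" scope value are assumptions about the payload shape; compare them with the events listed in your Webhook's settings before relying on them. The function name should_retrain is hypothetical.

```python
from pydantic import BaseModel

# Dataset targeted in this guide.
INPUT_DATASET = "boincai-projects/input-dataset"


class Event(BaseModel):
    action: str  # assumed values: "create", "update", "delete"
    scope: str   # assumed values such as "repo.content" or "discussion"


class Repo(BaseModel):
    type: str    # assumed values: "dataset", "model", "space"
    name: str    # for example "namespace/repo-name"


class WebhookPayload(BaseModel):
    event: Event
    repo: Repo


def should_retrain(raw: dict) -> bool:
    """Parse the payload and decide whether it should trigger a fine-tuning."""
    # Raises pydantic.ValidationError if required fields are missing.
    payload = WebhookPayload(**raw)
    return (
        payload.repo.type == "dataset"
        and payload.repo.name == INPUT_DATASET
        and payload.event.action == "update"
        and payload.event.scope.startswith("repo.content")
    )
```

Events that fail these checks can simply be acknowledged and ignored, so unrelated activity on the repository does not start a training run.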
If the payload is valid, the next step is to create a project on AutoTrain, schedule a fine-tuning of the input model (microsoft/resnet-50 in our example) on the input dataset, and create a discussion on the dataset when it's done!
Visit the link inside the comment to review the training cost estimate, and start fine-tuning the model!
In this example, we used Hugging Face AutoTrain to fine-tune our model quickly, but you can of course plug in your own training infrastructure!
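One way to keep the training backend swappable is to hide it behind a small interface. The sketch below is illustrative only: TrainingBackend, DryRunBackend, handle_dataset_update, and the training.example URL are all hypothetical names, and nothing here calls AutoTrain or any other real service.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class TrainingBackend(Protocol):
    def schedule(self, dataset: str, model: str) -> str:
        """Prepare a fine-tuning run and return a URL where it can be followed."""
        ...


@dataclass
class DryRunBackend:
    """Stand-in backend that only builds the URL a real backend might return."""

    base_url: str = "https://training.example"  # hypothetical placeholder

    def schedule(self, dataset: str, model: str) -> str:
        return f"{self.base_url}/runs?dataset={dataset}&model={model}"


def handle_dataset_update(
    backend: TrainingBackend,
    dataset: str,
    model: str,
    notify: Callable[[str], None],
) -> str:
    """Schedule a fine-tuning run, then report it through the notify callback."""
    run_url = backend.schedule(dataset, model)
    # In the guide's setup, the notification is a discussion on the dataset.
    notify(f"Fine-tuning of {model} on {dataset} scheduled: {run_url}")
    return run_url
```

With this shape, replacing the training service means writing another class with a schedule method, while the Webhook handler stays unchanged.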
Feel free to duplicate the Space to your personal namespace and play with it. You will need to provide two secrets:
WEBHOOK_SECRET: the secret from your Webhook.
HF_ACCESS_TOKEN: a User Access Token with write rights. You can create one from your settings.
You will also need to tweak the config.json file to use the dataset and model of your choice:
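As a rough illustration, the Space could read such a file as shown below. The key names input_dataset and input_model are assumptions made for this sketch, so check them against the config.json in the Space you duplicated; the values are the dataset and model used in this guide.

```python
import json
from dataclasses import dataclass

# Example file content; the key names are hypothetical.
EXAMPLE_CONFIG = """
{
  "input_dataset": "boincai-projects/input-dataset",
  "input_model": "microsoft/resnet-50"
}
"""


@dataclass(frozen=True)
class Config:
    input_dataset: str
    input_model: str


def load_config(text: str) -> Config:
    """Parse the JSON text and fail early if a required key is absent."""
    data = json.loads(text)
    missing = [key for key in ("input_dataset", "input_model") if key not in data]
    if missing:
        raise ValueError(f"config.json is missing keys: {', '.join(missing)}")
    return Config(input_dataset=data["input_dataset"], input_model=data["input_model"])


config = load_config(EXAMPLE_CONFIG)
```

Failing at startup on a missing key is a deliberate choice here: a misconfigured Space is easier to diagnose from its logs than from a Webhook event that silently does nothing.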
Configure your Webhook to send events to your Space
Last but not least, you'll need to configure your Webhook to send POST requests to your Space.
Let's first grab our Space's "direct URL" from the contextual menu. Click on "Embed this Space" and copy the "Direct URL".
Update your Webhook to send requests to that URL:
And that's it! Now every commit to the input dataset will trigger a fine-tuning of ResNet-50 with AutoTrain.
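Besides replaying events from the Webhook's settings, you may want to check the endpoint by hand. The sketch below builds a test request with Python's standard library. The function name build_test_request is hypothetical, the payload fields are assumptions about the event shape, and the Space URL is a placeholder to replace with your own Direct URL.

```python
import json
import urllib.request


def build_test_request(space_url: str, secret: str, payload: dict) -> urllib.request.Request:
    """Build a POST request to the Space's /webhook route, carrying the shared secret."""
    return urllib.request.Request(
        url=space_url.rstrip("/") + "/webhook",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Webhook-Secret": secret},
        method="POST",
    )


# Placeholder URL and secret; the payload shape is an assumption.
request = build_test_request(
    "https://your-space.example",
    "s3cret",
    {
        "event": {"action": "update", "scope": "repo.content"},
        "repo": {"type": "dataset", "name": "boincai-projects/input-dataset"},
    },
)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Sending the same request with a wrong secret is a quick way to confirm that the Space rejects unauthenticated calls.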