How-to: Automatic fine-tuning with Auto-Train

Webhook guide: Set up an automatic system to re-train a model when a dataset changes

Webhooks are now publicly available!

This guide will walk you through setting up an automatic training pipeline on the BOINC AI platform using BA Datasets, Webhooks, Spaces, and AutoTrain.

We will build a Webhook that listens to changes on an image classification dataset and triggers a fine-tuning of microsoft/resnet-50 using AutoTrain.

Prerequisite: Upload your dataset to the Hub

We will use a simple image classification dataset for the sake of the example. Learn more about uploading your data to the Hub here.

Create a Webhook to react to the dataset’s changes

First, let's create a Webhook from your settings.

  • Select your dataset as the target repository. We will target boincai-projects/input-dataset in this example.

  • You can put a dummy Webhook URL for now. Defining your Webhook will let you look at the events that will be sent to it. You can also replay them, which will be useful for debugging!

  • Input a secret to make it more secure.

  • Subscribe to “Repo update” events, as we want to react to data changes.

Your Webhook will look like this:

Create a Space to react to your Webhook

We now need a way to react to your Webhook events. An easy way to do this is to use a Space!

You can find an example Space here.

This Space uses Docker, Python, FastAPI, and uvicorn to run a simple HTTP server. Read more about Docker Spaces here.

The entry point is src/main.py. Let's walk through this file and detail what it does:

  1. It spawns a FastAPI app that will listen to HTTP POST requests on /webhook:

from fastapi import FastAPI

# [...]
@app.post("/webhook")
async def post_webhook(
	# ...
):
	# ...
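For orientation, the following is a minimal, self-contained sketch of such an app. It is not the example Space's actual src/main.py; the uvicorn entry point and port 7860 (a common default for Docker Spaces) are assumptions made for illustration.

# Illustrative sketch only, not the example Space's actual code.
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()

@app.post("/webhook")
async def post_webhook(request: Request):
    # Read the JSON payload sent by the Hub; the real Space validates it first.
    payload = await request.json()
    return {"received": True, "event": payload.get("event")}

if __name__ == "__main__":
    # Port 7860 is an assumption here; match it to your Space's configuration.
    uvicorn.run(app, host="0.0.0.0", port=7860)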
  2. This route checks that the X-Webhook-Secret header is present and that its value is the same as the one you set in your Webhook’s settings. The WEBHOOK_SECRET secret must be set in the Space’s settings and be the same as the secret set in your Webhook.

# [...]

WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")

# [...]

@app.post("/webhook")
async def post_webhook(
	# [...]
	x_webhook_secret:  Optional[str] = Header(default=None),
	# ^ checks for the X-Webhook-Secret HTTP header
):
	if x_webhook_secret is None:
		raise HTTPException(401)
	if x_webhook_secret != WEBHOOK_SECRET:
		raise HTTPException(403)
	# [...]
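If you want to sanity-check the secret handling while developing locally, you can replay a dummy event yourself. The snippet below is only an illustrative test: the URL, secret, and payload values are made up (the payload mirrors the models introduced in the next step), and it assumes the requests library is installed.

import requests

# Hypothetical local URL and secret, just to exercise the X-Webhook-Secret check.
resp = requests.post(
    "http://localhost:7860/webhook",
    headers={"X-Webhook-Secret": "my-secret"},
    json={
        "event": {"action": "update", "scope": "repo.content"},
        "repo": {
            "type": "dataset",
            "name": "boincai-projects/input-dataset",
            "id": "some-id",
            "private": False,
            "headSha": "abc123",
        },
    },
)
print(resp.status_code, resp.json())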
  3. The event’s payload is encoded as JSON. Here, we’ll be using pydantic models to parse the event payload. We also specify that we will run our Webhook only when:

  • the event concerns the input dataset

  • the event is an update on the repo’s content, i.e., there has been a new commit

# defined in src/models.py
class WebhookPayloadEvent(BaseModel):
	action: Literal["create", "update", "delete"]
	scope: str

class WebhookPayloadRepo(BaseModel):
	type: Literal["dataset", "model", "space"]
	name: str
	id: str
	private: bool
	headSha: str

class WebhookPayload(BaseModel):
	event: WebhookPayloadEvent
	repo: WebhookPayloadRepo

# [...]

@app.post("/webhook")
async def post_webhook(
	# [...]
	payload: WebhookPayload,
	# ^ Pydantic model defining the payload format
):
	# [...]
	if not (
		payload.event.action == "update"
		and payload.event.scope.startswith("repo.content")
		and payload.repo.name == config.input_dataset
		and payload.repo.type == "dataset"
	):
		# no-op if the payload does not match our expectations
		return {"processed": False}
	# [...]
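As a rough illustration of an event that passes these checks, here is a hand-written payload validated against the models above. All field values are invented, and the import path assumes the models live in src/models.py as noted in the snippet; adjust it to your project layout.

from src.models import WebhookPayload  # assumption: models importable from src/models.py

sample = {
    "event": {"action": "update", "scope": "repo.content"},
    "repo": {
        "type": "dataset",
        "name": "boincai-projects/input-dataset",
        "id": "1234",
        "private": False,
        "headSha": "deadbeef",
    },
}

# Pydantic validates the nested dicts against the models defined above.
payload = WebhookPayload(**sample)
assert payload.event.action == "update"
assert payload.event.scope.startswith("repo.content")
assert payload.repo.type == "dataset"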
  4. If the payload is valid, the next step is to create a project on AutoTrain, schedule a fine-tuning of the input model (microsoft/resnet-50 in our example) on the input dataset, and create a discussion on the dataset when it’s done!

def schedule_retrain(payload: WebhookPayload):
	# Create the autotrain project
	try:
		project = AutoTrain.create_project(payload)
		AutoTrain.add_data(project_id=project["id"])
		AutoTrain.start_processing(project_id=project["id"])
	except requests.HTTPError as err:
		print("ERROR while requesting AutoTrain API:")
		print(f"  code: {err.response.status_code}")
		print(f"  {err.response.json()}")
		raise
	# Notify in the community tab
	notify_success(project["id"])
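The AutoTrain helpers and notify_success above come from the example Space's own code. As a rough idea of what the notification step might look like, here is a hedged sketch that opens a discussion on the input dataset with huggingface_hub's create_discussion; the repo id, token variable, and message wording are assumptions, not the Space's actual implementation.

import os
from huggingface_hub import HfApi

def notify_success(project_id: str):
	# Hypothetical sketch: post a discussion on the dataset so watchers get a
	# community-tab notification pointing at the new AutoTrain project.
	api = HfApi(token=os.getenv("HF_ACCESS_TOKEN"))
	api.create_discussion(
		repo_id="boincai-projects/input-dataset",  # your input dataset
		repo_type="dataset",
		title="✨ Retraining started!",
		description=f"An AutoTrain project (id: {project_id}) was scheduled for this dataset.",
	)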

Visit the link inside the discussion comment to review the training cost estimate and start fine-tuning the model!

In this example, we used Hugging Face AutoTrain to fine-tune our model quickly, but you can of course plug in your own training infrastructure!

Feel free to duplicate the Space to your personal namespace and play with it. You will need to provide two secrets:

  • WEBHOOK_SECRET : the secret from your Webhook.

  • HF_ACCESS_TOKEN : a User Access Token with write rights. You can create one from your settings.

You will also need to tweak the config.json file to use the dataset and model of your choice:

{
	"target_namespace": "the namespace where the trained model should end up",
	"input_dataset": "the dataset on which the model will be trained",
	"input_model": "the base model to re-train",
	"autotrain_project_prefix": "A prefix for the AutoTrain project"
}
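For reference, the Space only needs a few lines to read this file at startup. This is a generic sketch, not necessarily how the example Space loads its configuration; it assumes config.json sits at the working directory root.

import json
from dataclasses import dataclass

@dataclass
class Config:
    target_namespace: str
    input_dataset: str
    input_model: str
    autotrain_project_prefix: str

# Load the JSON file into a small typed config object.
with open("config.json") as f:
    config = Config(**json.load(f))

# Used earlier as config.input_dataset when filtering Webhook events.
print(config.input_dataset)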

Configure your Webhook to send events to your Space

Last but not least, you’ll need to configure your Webhook to send POST requests to your Space.

Let’s first grab our Space’s “direct URL” from the contextual menu. Click on “Embed this Space” and copy the “Direct URL”.

Update your Webhook to send requests to that URL:

And that’s it! Now every commit to the input dataset will trigger a fine-tuning of ResNet-50 with AutoTrain 🎉
