Argilla on Spaces
Argilla is an open-source data labelling tool for highly efficient human-in-the-loop and MLOps workflows. Argilla is composed of (1) a server and web app for data labelling and curation, and (2) a Python SDK for building data annotation workflows in Python. Argilla integrates nicely with the Hugging Face stack (`datasets`, `transformers`, `hub`, and `setfit`), and now it can also be deployed using the Hub's Docker Spaces.
Visit the Argilla documentation to learn about its features and check out the Deep Dive Guides and Tutorials.
In the next sections, you'll learn to deploy your own Argilla app and use it for data labelling workflows right from the Hub. This Argilla app is a self-contained application completely hosted on the Hub using Docker. The diagram below illustrates the complete process.
You can deploy Argilla on Spaces with just a few clicks:
**IMPORTANT NOTE ABOUT DATA PERSISTENCE:** You can use the Argilla Quickstart Space as is for initial exploration and experimentation. For **longer use in small-scale projects, activate the paid persistent storage option**. This prevents data loss during Space restarts every 24 hours. If not using persistent storage, safeguard your data with Argilla Python SDK by storing it elsewhere. In this case, we gently remind you that the responsibility for maintaining your data's safety becomes yours.
You need to define the Owner (your personal account or an organization), a Space name, and the Visibility. To interact with the Argilla app with Python, you need to set the visibility to `Public`.
If you want to customize the title, emojis, and colors of your space, go to "Files and Versions" and edit the metadata of your README.md file.
Once you have created the Space, you'll see the `Building` status. Once it becomes `Running`, your Space is ready to go. If you don't see the Argilla login UI, refresh the page.
The Space is configured with two users, `argilla` and `admin`, which share the same default password: `12345678`. If you get a 500 error after login, make sure you have entered the username and password correctly. To secure your Space, you can change the passwords and API keys using secret variables as explained in the next section.
For quick experimentation, you can jump directly into the next section. If you want to secure your Space and use it over the longer term, setting up secret variables is recommended.

## Setting up secret environment variables
The Space template provides a way to set up different optional settings focusing on securing your Argilla Space.
To set up these secrets, you can go to the Settings tab on your created Space. Make sure to save these values somewhere for later use.
The template Space has two users: `admin` and `argilla`. The username `admin` corresponds to the root user, who can upload datasets and access any workspace within your Argilla Space. The username `argilla` is a normal user with access to the `argilla` workspace.
The usernames, passwords, and API keys to upload, read, update, and delete datasets can be configured using the following secrets:
- `ADMIN_USERNAME`: The admin username to log in to Argilla. The default admin username is `admin`. By setting up a custom username you can use your own username to log in to the app.
- `ADMIN_API_KEY`: Argilla provides a Python library to interact with the app (read, write, and update data, log model predictions, etc.). If you don't set this variable, the library and your app will use the default API key, i.e. `admin.apikey`. If you want to secure your app for reading and writing data, we recommend setting this variable. The API key can be any string of your choice. You can use an online generator if you like.
- `ADMIN_PASSWORD`: This sets a custom password to log in to the app with the `admin` username. The default password is `12345678`. By setting up a custom password you can use your own password to log in to the app.
- `ANNOTATOR_USERNAME`: The annotator username to log in to Argilla. The default annotator username is `argilla`. By setting up a custom username you can use your own username to log in to the app.
- `ANNOTATOR_PASSWORD`: This sets a custom password to log in to the app with the `argilla` username. The default password is `12345678`. By setting up a custom password you can use your own password to log in to the app.
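Since the API key and the passwords can be any strings of your choice, you could also generate them locally instead of using an online generator. The following is a minimal sketch using Python's standard `secrets` module. The helper name `generate_space_secrets` and the chosen lengths are illustrative assumptions, not Argilla requirements.

```python
import secrets


def generate_space_secrets(api_key_bytes: int = 32, password_bytes: int = 16) -> dict:
    """Return random values for the Space secrets described above."""
    return {
        # token_urlsafe returns URL-safe text that is easy to paste
        # into the Settings tab of the Space.
        "ADMIN_API_KEY": secrets.token_urlsafe(api_key_bytes),
        "ADMIN_PASSWORD": secrets.token_urlsafe(password_bytes),
        "ANNOTATOR_PASSWORD": secrets.token_urlsafe(password_bytes),
    }


if __name__ == "__main__":
    # Print NAME=value pairs; store them somewhere safe for later use.
    for name, value in generate_space_secrets().items():
        print(f"{name}={value}")
```

Each run produces new values, so copy them into the Space settings and keep a private record of them.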
The combination of these secret variables gives you the following setup options:
1. *I want to prevent anyone without the API keys from adding, deleting, or updating datasets using the Python client*: You need to set up `ADMIN_PASSWORD` and `ADMIN_API_KEY`.
2. *Additionally, I want to prevent the `argilla` user from deleting datasets from the UI*: You need to set up `ANNOTATOR_PASSWORD` and use the generated API key of the `argilla` user with the Python client (check your Space logs). This option might be interesting if you want to control dataset management but want anyone to browse your datasets using the `argilla` user.
3. *Additionally, I want to prevent anyone without the password from browsing my datasets with the `argilla` user*: You need to set up `ANNOTATOR_PASSWORD`. In this case, you can use the generated API key of the `argilla` user and/or the `ADMIN_API_KEY` value with the Python client, depending on your needs for dataset deletion rights.
Additionally, the `LOAD_DATASETS` variable lets you configure which sample datasets are pre-loaded. The default value is `single`, and the supported values for this variable are:
- `single`: Load a single dataset for the TextClassification task.
- `full`: Load all the sample datasets for NLP tasks (TokenClassification, TextClassification, Text2Text).
- `none`: Do not load any datasets.
Once your Argilla Space is running:
You need to find the Space Direct URL under the "Embed this Space" option (top right, see screenshot below).
This URL gives you access to a full-screen Argilla UI for data labelling. The Direct URL is the api_url parameter for connecting the argilla Python client in order to read and write data programmatically.
You are now ready to upload your first dataset into Argilla.
For uploading Argilla datasets, there are two options:
- You can use the `argilla` Python library inside Jupyter, Colab, VS Code, or another Python IDE. In this case, you will read your source file (`csv`, `json`, etc.) and transform it into Argilla records. We recommend reading the basics guide.
- You can use the no-code data manager app to upload a file and log it into Argilla. If you need to transform your dataset before uploading it into Argilla, we recommend the first option.
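For the first option, the transformation from a source file into records might look like the sketch below. It assumes a CSV file with `text` and `label` columns (the sample content is made up for illustration) and the Argilla 1.x client, where `rg.TextClassificationRecord` accepts `text` and `annotation` arguments. The helper names are hypothetical.

```python
import csv
import io

# Hypothetical CSV content; in practice you would open your own file.
SAMPLE_CSV = """text,label
I cannot log in to my account,negative
The new card arrived quickly,positive
"""


def rows_to_record_fields(csv_file):
    """Map CSV rows to the keyword arguments of a text classification record."""
    return [
        {"text": row["text"], "annotation": row["label"]}
        for row in csv.DictReader(csv_file)
    ]


def to_argilla_records(fields):
    """Build Argilla records from the dictionaries produced above."""
    # Imported inside the function so the parsing step above can run
    # before `argilla` is installed.
    import argilla as rg

    return [rg.TextClassificationRecord(**f) for f in fields]


fields = rows_to_record_fields(io.StringIO(SAMPLE_CSV))
print(fields[0])
```

Keeping the parsing step separate from the record construction makes it easy to inspect or clean the rows before anything is sent to the Space.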
To follow a complete tutorial with Colab or Jupyter, check this tutorial. For a quick step-by-step example using the `argilla` Python library, keep reading.
First, you need to open a Python IDE; we highly recommend using Jupyter notebooks or Colab.
Second, you need to install `datasets` and `argilla` with `pip` on Colab or your local machine, for example with `pip install datasets argilla`.
Third, you need to read the dataset using the `datasets` library. For reading other file types, check the basics guide.
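As a sketch of this third step, the helper below wraps `datasets.load_dataset`. The helper name is hypothetical, and the repository id in the usage comment is a placeholder rather than a real dataset, so replace it with your own.

```python
def read_source_dataset(repo_id: str, split: str = "train"):
    """Load one split of a Hub dataset as a `datasets.Dataset`."""
    # Imported inside the function so the helper can be defined
    # before `datasets` is installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


# Usage (placeholder id, replace it with your own dataset):
# dataset = read_source_dataset("your-username/your-dataset")
# print(dataset.column_names)
```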
Fourth, you need to initialise the `argilla` client with your Space URL and API key and upload the records into Argilla.
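A minimal sketch of this fourth step, assuming the Argilla 1.x client (`rg.init`, `rg.read_datasets`, `rg.log`) and a source dataset with `text` and `label` columns. The helper name `upload_dataset` and the default dataset name `my_dataset` are placeholders.

```python
def upload_dataset(dataset, api_url: str, api_key: str = "admin.apikey",
                   name: str = "my_dataset") -> str:
    """Log a `datasets.Dataset` with `text` and `label` columns into Argilla."""
    # Imported inside the function so the helper can be defined
    # before `argilla` is installed.
    import argilla as rg

    # api_url is the Space Direct URL; "admin.apikey" is the default key
    # and should be replaced if you set the ADMIN_API_KEY secret.
    rg.init(api_url=api_url, api_key=api_key)

    # Assumption: labels live in a `label` column. Argilla reads existing
    # labels from a column named `annotation`, so rename it first.
    if "label" in dataset.column_names:
        dataset = dataset.rename_column("label", "annotation")

    records = rg.read_datasets(dataset, task="TextClassification")
    rg.log(records, name=name)
    return name
```

If your source data has no labels yet, the rename is skipped and the records are logged without annotations, ready to be labelled in the UI.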
Congrats! Your dataset is available in the Argilla UI for data labelling. Once you have labelled some data, you can train your first model by reading the dataset using Python.
In this example, we use SetFit, but you can use any other model for training.
To train a model using your labelled data, you first need to read the labelled dataset and prepare it for training.
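One way to sketch this step, assuming the Argilla 1.x client and that it has already been initialised with your Space URL and API key (the fourth step above). The helper name is hypothetical, and `name` is whatever dataset name you used when uploading.

```python
def read_labelled_dataset(name: str):
    """Return the annotated records of an Argilla dataset, ready for training."""
    # Imported inside the function so the helper can be defined
    # before `argilla` is installed. Assumes rg.init(...) was already called.
    import argilla as rg

    # rg.load fetches the records from the Space; prepare_for_training()
    # turns the annotated ones into a training-ready dataset.
    return rg.load(name).prepare_for_training()


# Usage (placeholder dataset name):
# train_ds = read_labelled_dataset("my_dataset")
```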
You can then train a SetFit model with this dataset.
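The sketch below shows what such a training run could look like. It assumes a SetFit release that still provides `SetFitTrainer` (newer releases favour a `Trainer` class with separate training arguments), and the base model, batch size, and number of iterations are illustrative choices rather than recommendations from this guide.

```python
def train_setfit(dataset, base_model: str = "sentence-transformers/paraphrase-mpnet-base-v2"):
    """Train a SetFit classifier on a prepared dataset and return (model, metrics)."""
    # Imported inside the function so the helper can be defined
    # before `setfit` is installed.
    from setfit import SetFitModel, SetFitTrainer

    # Hold out part of the labelled data for evaluation.
    splits = dataset.train_test_split()

    # Start from a pretrained sentence-transformers checkpoint.
    model = SetFitModel.from_pretrained(base_model)

    trainer = SetFitTrainer(
        model=model,
        train_dataset=splits["train"],
        eval_dataset=splits["test"],
        batch_size=8,       # illustrative value
        num_iterations=20,  # illustrative value
    )
    trainer.train()
    return model, trainer.evaluate()
```

The function expects the dataset produced by the preparation step, with text and label columns.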
Optionally, you can push the dataset to the Hub for later use.
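A possible sketch of this optional step, assuming the Argilla 1.x client is already initialised and that you are logged in to the Hub with a token that has write access. The helper name and both arguments in the usage comment are placeholders.

```python
def push_dataset_to_hub(name: str, repo_id: str) -> str:
    """Export an Argilla dataset and upload it to a Hub dataset repository."""
    # Imported inside the function so the helper can be defined
    # before `argilla` is installed. Assumes rg.init(...) was already called.
    import argilla as rg

    # to_datasets() converts the Argilla dataset into a `datasets.Dataset`,
    # whose push_to_hub() uploads it to the given repository.
    rg.load(name).to_datasets().push_to_hub(repo_id)
    return repo_id


# Usage (placeholder names):
# push_dataset_to_hub("my_dataset", "your-username/my_dataset")
```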
As a next step, check out the Argilla Tutorials section. All the tutorials can be run using Colab or local Jupyter Notebooks.
If you have suggestions or need specific support, please join the Argilla Slack community or reach out on Argilla's GitHub repository.
Argilla datasets cannot be uploaded directly from the UI. Most Argilla users upload datasets programmatically using the `argilla` Python library, but you can also use Argilla Data Manager, a simple Streamlit app.