Gated Datasets
To give dataset creators more control over how their datasets are used, the Hub allows users to enable User Access requests through a dataset’s Settings tab. Enabling this setting requires users to agree to share their contact information and accept the dataset authors’ terms and conditions in order to access the dataset. The contact information is stored in a database, and dataset owners are able to download a copy of the user access report.
The User Access request dialog can be modified to include additional text and checkbox fields in the prompt. To do this, add a YAML section to the dataset’s README.md file (create the file if it does not already exist) and add an extra_gated_fields property. Within this property, you can add as many custom fields as you like and specify whether each one is a text or checkbox field. An extra_gated_prompt property can also be included to add a customized text message.
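As a minimal sketch of what such a YAML section might look like, the snippet below renders one from Python. The prompt text, the field labels, and the helper name render_gated_front_matter are illustrative assumptions, not values prescribed by the Hub; only the extra_gated_prompt and extra_gated_fields property names and the text and checkbox field types come from the description above.

```python
import json

# Hypothetical prompt and field labels; replace them with your own.
EXTRA_GATED_PROMPT = "You agree not to attempt to identify individuals in this dataset."
EXTRA_GATED_FIELDS = {
    "Company": "text",
    "Country": "text",
    "I agree to use this dataset for non-commercial purposes only": "checkbox",
}


def render_gated_front_matter(prompt: str, fields: dict) -> str:
    """Render the YAML section placed at the top of README.md.

    Assumes field labels contain no YAML-special characters such as colons.
    """
    for label, kind in fields.items():
        if kind not in ("text", "checkbox"):
            raise ValueError(f"{label!r}: field type must be 'text' or 'checkbox'")
    # json.dumps yields a double-quoted string, which is also valid YAML.
    lines = ["---", f"extra_gated_prompt: {json.dumps(prompt)}", "extra_gated_fields:"]
    lines += [f"  {label}: {kind}" for label, kind in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gated_front_matter(EXTRA_GATED_PROMPT, EXTRA_GATED_FIELDS))
```

The printed text can be pasted at the very top of README.md, above the rest of the card.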
The README.md file for a dataset is called a Dataset Card. Visit the documentation to learn more about how to use it and to see the properties that you can configure.
By default, requests to access the dataset are automatically accepted. Dataset authors can set the approval mode to “Manual reviews” from the dataset’s Settings tab. With this mode enabled, each access request must be manually reviewed and approved by the dataset authors. Only users whose access requests have been approved will be able to access the dataset’s content.
You can automate the approval of access requests with the following API:
GET /api/datasets/{repo_id}/user-access-request/pending
Retrieve the list of pending access requests for the given dataset.
headers = { "authorization" : "Bearer $token" }
GET /api/datasets/{repo_id}/user-access-request/accepted
Retrieve the list of accepted access requests for the given dataset.
headers = { "authorization" : "Bearer $token" }
GET /api/datasets/{repo_id}/user-access-request/rejected
Retrieve the list of rejected access requests for the given dataset.
headers = { "authorization" : "Bearer $token" }
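The three GET endpoints above differ only in the final path segment, so they can be wrapped in one helper. The following is a sketch using only the Python standard library; the function names and the example repository id are hypothetical, and the response body is assumed to be JSON, which this page does not spell out.

```python
import json
import urllib.request

BASE_URL = "https://huggingface.co"
STATUSES = ("pending", "accepted", "rejected")


def build_list_request(repo_id: str, status: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one status list."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}")
    url = f"{BASE_URL}/api/datasets/{repo_id}/user-access-request/{status}"
    return urllib.request.Request(
        url,
        headers={"authorization": f"Bearer {token}"},
        method="GET",
    )


def list_access_requests(repo_id: str, status: str, token: str):
    """Send the request and decode the body (assumed here to be JSON)."""
    request = build_list_request(repo_id, status, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as list_access_requests("my-org/my-dataset", "pending", token) would then fetch the pending requests for that (hypothetical) repository.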
POST /api/datasets/{repo_id}/user-access-request/handle
Change the status of a given access request to status.
headers = { "authorization" : "Bearer $token" }
json = { "status": "accepted" | "rejected" | "pending", "user": "username" }
POST /api/datasets/{repo_id}/user-access-request/grant
Allow a specific user to access your repository.
headers = { "authorization" : "Bearer $token" }
json = { "user": "username" }
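The two POST endpoints take a JSON body, so a small sketch of how they might be called is shown here. It uses only the standard library; the helper names are hypothetical, and sending an explicit content-type header of application/json is an assumption based on the json payloads listed above.

```python
import json
import urllib.request

BASE_URL = "https://huggingface.co"


def build_post_request(repo_id: str, action: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    url = f"{BASE_URL}/api/datasets/{repo_id}/user-access-request/{action}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "authorization": f"Bearer {token}",
            "content-type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


def build_handle_request(repo_id: str, user: str, status: str, token: str) -> urllib.request.Request:
    """Request that sets one user's access request to the given status."""
    if status not in ("accepted", "rejected", "pending"):
        raise ValueError("status must be 'accepted', 'rejected' or 'pending'")
    return build_post_request(repo_id, "handle", {"status": status, "user": user}, token)


def build_grant_request(repo_id: str, user: str, token: str) -> urllib.request.Request:
    """Request that grants one user access to the repository."""
    return build_post_request(repo_id, "grant", {"user": user}, token)
```

Passing either request object to urllib.request.urlopen sends it; building and sending are kept separate so the payload can be inspected before any change is made.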
The base URL for the HTTP endpoints above is https://huggingface.co. The $token to pass as a bearer token can be generated from your user settings. It must have write access to the gated repository.
By default, notifications for new pending access requests are sent once a day via email. When the repo lives in an organization, those emails are sent to the first 5 admins of the organization.
You can customize the way you receive those notifications from the gated dataset’s settings page. You can choose whether to receive notifications for new pending access requests in bulk once a day or in real time. You can also set a custom email address to send those notifications to.
In some cases, you might also want to modify the text in the heading of the gate as well as the text in the button. For those use cases you can modify extra_gated_heading and extra_gated_button_content.
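For illustration, the two properties might be rendered as follows. The heading and button texts and the helper name render_gate_text are placeholders chosen for this sketch; in a real Dataset Card these keys would sit in the same YAML section of README.md as any other properties.

```python
import json


def render_gate_text(heading: str, button: str) -> str:
    """Render a YAML section that customizes the gate heading and button."""
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"extra_gated_heading: {json.dumps(heading)}",
        f"extra_gated_button_content: {json.dumps(button)}",
        "---",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical texts; replace them with your own wording.
    print(render_gate_text("Acknowledge the license to access this dataset", "Acknowledge license"))
```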