Search the Hub
In this tutorial, you will learn how to search models, datasets and spaces on the Hub using boincai_hub.
How to list repositories?
The boincai_hub library includes an HTTP client, HfApi, to interact with the Hub. Among other things, it can list models, datasets and spaces stored on the Hub:
>>> from boincai_hub import HfApi
>>> api = HfApi()
>>> models = api.list_models()
The output of list_models() is an iterator over the models stored on the Hub.
Similarly, you can use list_datasets() to list datasets and list_spaces() to list Spaces.
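Because these helpers return iterators that can span the whole Hub, it is often convenient to look at only the first few entries. The following is a minimal sketch, not part of the library: first_n and preview_hub are hypothetical helper names, and preview_hub assumes that boincai_hub is installed and that the Hub is reachable.

```python
from itertools import islice


def first_n(iterable, n):
    """Return at most n items from an iterable without consuming the rest."""
    return list(islice(iterable, n))


def preview_hub(n=3):
    """Fetch the first n models, datasets and Spaces (requires network access)."""
    # Imported lazily so that first_n stays usable without the library installed.
    from boincai_hub import HfApi

    api = HfApi()
    return {
        "models": first_n(api.list_models(), n),
        "datasets": first_n(api.list_datasets(), n),
        "spaces": first_n(api.list_spaces(), n),
    }
```

Calling preview_hub() returns a dictionary with three short lists, which is usually enough to inspect what the returned objects look like.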
How to filter repositories?
Listing repositories is great, but now you might want to filter your search. The list helpers accept several parameters, such as:
filter
author
search
…
Two of these parameters are intuitive (author and search), but what about that filter? filter takes as input a ModelFilter object (or DatasetFilter). You can instantiate it by specifying which models you want to filter.
Let's see an example that gets all models on the Hub that do image classification, have been trained on the imagenet dataset, and run with PyTorch. That can be done with a single ModelFilter. Attributes are combined as a "logical AND".
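A minimal sketch of such a query is shown here. The tag strings ("image-classification", "imagenet", "pytorch") are assumptions about how these criteria are spelled on the Hub, and find_image_classifiers is a hypothetical helper name; the import is done inside the function so the criteria can be inspected without network access.

```python
# Criteria written as raw strings; the exact tag values are assumptions
# and should be checked against the Hub.
IMAGE_CLASSIFIER_CRITERIA = {
    "task": "image-classification",
    "trained_dataset": "imagenet",
    "library": "pytorch",
}


def find_image_classifiers():
    """List models matching all three criteria (requires network access)."""
    from boincai_hub import HfApi, ModelFilter

    # All attributes of a single ModelFilter are combined as a logical AND.
    model_filter = ModelFilter(**IMAGE_CLASSIFIER_CRITERIA)
    return HfApi().list_models(filter=model_filter)
```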
While filtering, you can also sort the results and take only the top ones. For example, you can fetch the top 5 most downloaded datasets on the Hub.
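One way to write this is sketched here, assuming list_datasets() accepts the sort, direction and limit parameters. top_downloaded_datasets is a hypothetical helper name; it takes the client as an argument so it can be tried with any object exposing list_datasets().

```python
def top_downloaded_datasets(api, n=5):
    """Return the n most downloaded datasets, most downloaded first."""
    # sort="downloads" orders by download count; direction=-1 means descending.
    return list(api.list_datasets(sort="downloads", direction=-1, limit=n))
```

With the real client, this would be called as top_downloaded_datasets(HfApi()) after importing HfApi from boincai_hub.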
How to explore filter options?
Now you know how to filter your list of models/datasets/spaces. The problem you might have is that you don't know exactly what you are looking for. No worries! We also provide some helpers that allow you to discover what arguments can be passed in your query.
ModelSearchArguments and DatasetSearchArguments are nested namespace objects that contain every option available on the Hub and return the value that should be passed to filter. Best of all, they support tab completion.
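Creating the two helpers might look like the sketch here, assuming both classes can be imported from the top-level boincai_hub package and take no arguments. load_search_arguments is a hypothetical wrapper name; the names model_args and dataset_args are the ones used in the rest of this section.

```python
def load_search_arguments():
    """Build both exploration helpers (requires network access)."""
    from boincai_hub import DatasetSearchArguments, ModelSearchArguments

    # Each constructor queries the Hub, so this call can take a while.
    model_args = ModelSearchArguments()
    dataset_args = DatasetSearchArguments()
    return model_args, dataset_args
```

In an interactive session, model_args, dataset_args = load_search_arguments() makes both objects available for tab completion.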
Before continuing, please be aware that ModelSearchArguments and DatasetSearchArguments are legacy helpers meant for exploratory purposes only. Their initialization requires listing all models and datasets on the Hub, which makes them increasingly slow as the number of repos on the Hub grows. For production-ready code, consider passing raw strings when making a filtered search on the Hub.
Now, let's check what is available in model_args by inspecting its output.
It has a variety of attributes or keys available to you. This is because it is both an object and a dictionary, so you can either do model_args["author"] or model_args.author.
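To illustrate what "both an object and a dictionary" means, here is a small self-contained sketch. AttributeDict is a hypothetical stand-in written for this example, not the library's actual class, and the values stored in it are assumed.

```python
class AttributeDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Assumed example values; the real helper is populated from the Hub.
demo_args = AttributeDict(
    library=AttributeDict(PyTorch="pytorch", TensorFlow="tensorflow"),
)

# Both access styles reach the same entry.
same_entry = demo_args["library"] is demo_args.library
```

Attribute access is what makes tab completion possible in an interactive shell, while key access remains handy when the name is held in a variable.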
The first criterion is getting all PyTorch models. This would be found under the library attribute, so let's check whether model_args.library contains it.
It is! The PyTorch name is there, so you'll need to use model_args.library.PyTorch.
Below is an animation repeating the process for finding both the Text Classification and glue requirements:


Now that all the pieces are there, the last step is to combine them into something the API can use by passing the values found above (plain strings) to the ModelFilter and DatasetFilter classes.
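The combination step can be sketched as follows. The attribute paths pipeline_tag.TextClassification and dataset_name.glue are assumptions about where the Text Classification and glue entries live in the helpers, and glue_text_classifier_kwargs is a hypothetical function name.

```python
def glue_text_classifier_kwargs(model_args, dataset_args):
    """Collect the three criteria as keyword arguments for ModelFilter."""
    return {
        # Assumed attribute paths; explore the helpers to confirm them.
        "task": model_args.pipeline_tag.TextClassification,
        "trained_dataset": dataset_args.dataset_name.glue,
        "library": model_args.library.PyTorch,
    }
```

The result can then be passed on as ModelFilter(**glue_text_classifier_kwargs(model_args, dataset_args)) and given to api.list_models(filter=...).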
As you can see, it found the models that fit all the criteria. You can even take it further by passing a list for each of the parameters from before. For example, let's look at the same configuration, but also include TensorFlow in the filter.
This query is strictly equivalent to one that passes the same values as raw strings.
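The sketch here puts the two forms side by side. The string values and attribute paths are assumptions, and the SimpleNamespace objects are stand-ins for the real helpers so that the comparison can be run without network access; helper_kwargs and RAW_KWARGS are hypothetical names.

```python
from types import SimpleNamespace

# Raw-string form, suitable for production code (values are assumed).
RAW_KWARGS = {
    "task": "text-classification",
    "trained_dataset": "glue",
    "library": ["pytorch", "tensorflow"],
}


def helper_kwargs(model_args, dataset_args):
    """Same query built from the exploration helpers."""
    return {
        "task": model_args.pipeline_tag.TextClassification,
        "trained_dataset": dataset_args.dataset_name.glue,
        # A list matches models tagged with PyTorch as well as TensorFlow.
        "library": [model_args.library.PyTorch, model_args.library.TensorFlow],
    }


# Stand-ins for ModelSearchArguments / DatasetSearchArguments.
fake_model_args = SimpleNamespace(
    pipeline_tag=SimpleNamespace(TextClassification="text-classification"),
    library=SimpleNamespace(PyTorch="pytorch", TensorFlow="tensorflow"),
)
fake_dataset_args = SimpleNamespace(dataset_name=SimpleNamespace(glue="glue"))
```

Either dictionary can be expanded into ModelFilter(**kwargs); the raw-string form avoids the cost of initializing the helpers.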
Here, ModelSearchArguments served as a helper to explore the options available on the Hub. However, it is not required to make a search. Another way to find valid values is to visit the models and datasets pages in your browser, search for some parameters, and look at the values in the URL.