
BaFileSystem


Last updated 1 year ago

Interact with the Hub through the Filesystem API

In addition to the BaApi, the boincai_hub library provides BaFileSystem, a pythonic, fsspec-compatible file interface to the BOINC AI Hub. BaFileSystem builds on top of BaApi and offers typical filesystem-style operations like cp, mv, ls, du, glob, get_file, and put_file.

Usage


>>> from boincai_hub import BaFileSystem
>>> fs = BaFileSystem()

>>> # List all files in a directory
>>> fs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> fs.glob("datasets/my-username/my-dataset-repo/**.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read a remote file 
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     train_data = f.readlines()

>>> # Read the content of a remote file as a string
>>> train_data = fs.read_text("datasets/my-username/my-dataset-repo/data/train.csv", revision="dev")

>>> # Write a remote file, ending each row with a newline
>>> with fs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
...     f.write("text,label\n")
...     f.write("Fantastic movie!,good\n")

The optional revision argument can be passed to run an operation against a specific revision, such as a branch, a tag name, or a commit hash.

Unlike Python’s built-in open, fsspec’s open defaults to binary mode, "rb". This means you must explicitly set mode as "r" for reading and "w" for writing in text mode. Appending to a file (modes "a" and "ab") is not supported yet.
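The practical difference between the two modes can be seen without touching the Hub. The sketch below is an illustration only: it uses an in-memory buffer from Python's standard library as a stand-in for a remote file, and all variable names are invented for this example. It shows that a binary-mode read returns bytes that the caller must decode, while a text-mode read returns str directly.

```python
import io

# Stand-in for a remote file: an in-memory byte buffer (no Hub access needed).
payload = "text,label\nFantastic movie!,good\n".encode("utf-8")

# Binary mode ("rb"): reads return bytes, and decoding is left to the caller.
binary_file = io.BytesIO(payload)
first_line_bytes = binary_file.readline()
header_from_bytes = first_line_bytes.decode("utf-8").rstrip("\n")

# Text mode ("r"): a text wrapper decodes on the fly, so reads return str.
text_file = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
header_from_text = text_file.readline().rstrip("\n")

print(type(first_line_bytes).__name__, header_from_bytes)
print(type(header_from_text).__name__, header_from_text)
```

With a real BaFileSystem instance, the same distinction would apply to the object returned by fs.open, depending on whether "rb" or "r" is passed as the mode.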

Integrations

The BaFileSystem can be used with any library that integrates fsspec, provided the URL follows the scheme:

hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>

The repo_type_prefix is datasets/ for datasets and spaces/ for Spaces; models need no prefix in the URL.

Some interesting integrations where BaFileSystem simplifies interacting with the Hub are listed below:

  • Reading/writing a Pandas DataFrame from/to a Hub repository:

    >>> import pandas as pd
    
    >>> # Read a remote CSV file into a dataframe
    >>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")
    
    >>> # Write a dataframe to a remote CSV file
    >>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")

    The same workflow can also be used for Dask and Polars DataFrames.

  • Querying (remote) Hub files with DuckDB:

    >>> from boincai_hub import BaFileSystem
    >>> import duckdb
    
    >>> fs = BaFileSystem()
    >>> duckdb.register_filesystem(fs)
    >>> # Query a remote file and get the result back as a dataframe
    >>> fs_query_file = "hf://datasets/my-username/my-dataset-repo/data_dir/data.parquet"
    >>> df = duckdb.query(f"SELECT * FROM '{fs_query_file}' LIMIT 10").df()

  • Using the Hub as an array store with Zarr:

    >>> import numpy as np
    >>> import zarr
    
    >>> embeddings = np.random.randn(50000, 1000).astype("float32")
    
    >>> # Write an array to a repo
    >>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
    ...    foo = root.create_group("embeddings")
    ...    foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
    ...    foobar[:] = embeddings
    
    >>> # Read an array from a repo
    >>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
    ...    first_row = root["embeddings/experiment_0"][0]

Authentication

In many cases, you must be logged in with a BOINC AI account to interact with the Hub. Refer to the Login section of the documentation to learn more about authentication methods on the Hub.

It is also possible to log in programmatically by passing your token as an argument to BaFileSystem:

>>> from boincai_hub import BaFileSystem
>>> fs = BaFileSystem(token=token)

If you log in this way, be careful not to accidentally leak the token when sharing your source code!
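One way to keep a token out of shared source code is to read it from the environment at run time instead of writing it into the script. The following is a minimal sketch under stated assumptions: the variable name BA_HUB_TOKEN and the helper load_token are hypothetical and chosen for this example only; they are not part of boincai_hub.

```python
import os

# Hypothetical environment variable name, chosen for this sketch only.
TOKEN_ENV_VAR = "BA_HUB_TOKEN"


def load_token(env_var: str = TOKEN_ENV_VAR) -> str:
    """Return the token stored in the environment, or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return token


# Intended use (not executed here, since it needs the library and a real token):
#   from boincai_hub import BaFileSystem
#   fs = BaFileSystem(token=load_token())
```

Because the token never appears in the file itself, the script can be committed or shared without exposing it.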
