Managing local and online repositories
The Repository class is a helper class that wraps git and git-lfs commands. It provides tooling adapted for managing repositories, which can be very large.
It is the recommended tool as soon as any git operation is involved, or when collaboration will be a point of focus with the repository itself.
( local_dir: typing.Union[str, pathlib.Path], clone_from: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, token: typing.Union[bool, str] = True, git_user: typing.Optional[str] = None, git_email: typing.Optional[str] = None, revision: typing.Optional[str] = None, skip_lfs_files: bool = False, client: typing.Optional[boincai_hub.hf_api.HfApi] = None )
Helper class to wrap the git and git-lfs commands.
The aim is to facilitate interacting with boincai.com hosted model or dataset repos, though not a lot here (if any) is actually specific to boincai.com.
__init__
( local_dir: typing.Union[str, pathlib.Path], clone_from: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, token: typing.Union[bool, str] = True, git_user: typing.Optional[str] = None, git_email: typing.Optional[str] = None, revision: typing.Optional[str] = None, skip_lfs_files: bool = False, client: typing.Optional[boincai_hub.hf_api.HfApi] = None )
Parameters
local_dir (str or Path): Path (e.g. 'my_trained_model/') to the local directory where the Repository will be initialized.
clone_from (str, optional): Either a repository URL or a repo_id. Examples:
"https://boincai.com/philschmid/playground-tests"
"philschmid/playground-tests"
repo_type (str, optional): To set when cloning a repo from a repo_id. Default is model.
token (bool or str, optional): A valid authentication token. If None or True and the machine is logged in (for example through boincai-cli login), the token will be retrieved from the cache. If False, the token is not sent in the request header.
git_user (str, optional): Overrides the git config user.name for committing and pushing files to the hub.
git_email (str, optional): Overrides the git config user.email for committing and pushing files to the hub.
revision (str, optional): Revision to check out after initializing the repository. If the revision doesn't exist, a branch will be created with that revision name from the default branch's current HEAD.
skip_lfs_files (bool, optional, defaults to False): Whether to skip git-LFS files or not.
client (HfApi, optional): Instance of HfApi to use when calling the HF Hub API. A new instance will be created if this is left as None.
Raises
Instantiate a local clone of a git repo.
Repository uses the local git credentials by default. If explicitly set, the token or the git_user/git_email pair will be used instead.
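The rules given for the token parameter can be summarized as a small decision function. This is a minimal sketch of the documented behavior, not the library's implementation: resolve_token and cached_token are hypothetical names, and the cached token is passed in explicitly instead of being read from disk.

```python
from typing import Optional, Union


def resolve_token(token: Union[bool, str, None], cached_token: Optional[str]) -> Optional[str]:
    """Return the token to send, or None when no token should be sent.

    Hypothetical helper that only mirrors the documented rules.
    """
    if isinstance(token, str):
        # An explicit token string is used as-is.
        return token
    if token is False:
        # False: the token is not sent in the request header.
        return None
    # None or True: fall back to the token cached at login time (may be None).
    return cached_token
```

Passing the cached value as an argument keeps the sketch free of any assumption about where the library stores credentials.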
current_branch
( ) -> str
Returns
str
Current checked out branch.
Returns the current checked out branch.
add_tag
( tag_name: str, message: typing.Optional[str] = None, remote: typing.Optional[str] = None )
Parameters
tag_name (str): The name of the tag to be added.
message (str, optional): The message that accompanies the tag. The tag will turn into an annotated tag if a message is passed.
remote (str, optional): The remote on which to add the tag.
Add a tag at the current head and push it.
If remote is None, the tag will only be added locally.
If no message is provided, the tag will be lightweight. If a message is provided, the tag will be annotated.
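To make the lightweight versus annotated distinction concrete, the sketch below builds the plain git command lines that correspond to each case. It is an illustration under assumptions, not the library's code: tag_commands is a hypothetical helper, and it only returns argument lists without running git.

```python
from typing import List, Optional


def tag_commands(tag_name: str, message: Optional[str] = None,
                 remote: Optional[str] = None) -> List[List[str]]:
    """Build the git command lines for creating (and optionally pushing) a tag."""
    if message is None:
        # No message: a lightweight tag.
        create = ["git", "tag", tag_name]
    else:
        # A message turns the tag into an annotated tag.
        create = ["git", "tag", "-a", tag_name, "-m", message]
    commands = [create]
    if remote is not None:
        # Only push when a remote is given; otherwise the tag stays local.
        commands.append(["git", "push", remote, tag_name])
    return commands
```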
auto_track_binary_files
( pattern: str = '.' ) -> List[str]
Parameters
pattern (str, optional, defaults to "."): The pattern with which to track files that are binary.
Returns
List[str]
List of filenames that are now tracked due to being binary files.
Automatically track binary files with git-lfs.
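The documentation does not say how a file is judged to be binary. One common heuristic, shown here purely as an assumption, is to look for a NUL byte in the first few kilobytes. The names looks_binary and find_binary_files are hypothetical, and the library may use a different test.

```python
from pathlib import Path
from typing import List


def looks_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(sample_size)


def find_binary_files(folder: Path) -> List[str]:
    """Return the sorted relative paths of files under folder that look binary."""
    return sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and looks_binary(p)
    )
```

Note that this sketch walks every file under the folder, including anything inside a .git directory, so a real tool would restrict the search to candidate files.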
auto_track_large_files
( pattern: str = '.' ) -> List[str]
Parameters
pattern (str, optional, defaults to "."): The pattern with which to track files that are larger than 10MB.
Returns
List[str]
List of filenames that are now tracked due to their size.
Automatically track large files (files larger than 10MB) with git-lfs.
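The size check itself is easy to picture. The following sketch lists files above a threshold using only the standard library. It is not the library's implementation: find_large_files is a hypothetical name, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption, since the exact cutoff is not stated.

```python
from pathlib import Path
from typing import List

# Assumed interpretation of "10MB"; the exact cutoff is not documented here.
TEN_MB = 10 * 1024 * 1024


def find_large_files(folder: Path, threshold_bytes: int = TEN_MB) -> List[str]:
    """Return the sorted relative paths of files strictly larger than the threshold."""
    return sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and p.stat().st_size > threshold_bytes
    )
```

Keeping the threshold as a parameter makes the behavior easy to try on small files.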
check_git_versions
( )
Raises
Checks that git and git-lfs can be run.
clone_from
( repo_url: str, token: typing.Union[bool, str, NoneType] = None )
Parameters
repo_url (str): The URL from which to clone the repository.
token (Union[str, bool], optional): Whether to use the authentication token. It can be:
a string, which is the token itself
False, which would not use the authentication token
True, which would fetch the authentication token from the local folder and use it (you should be logged in for this to work)
None, which would retrieve the value of self.boincai_token
Clone from a remote. If the folder already exists, will try to clone the repository within it.
If this folder is a git repository with linked history, will try to update the repository.
Raises the following error:
commit
( commit_message: str, branch: typing.Optional[str] = None, track_large_files: bool = True, blocking: bool = True, auto_lfs_prune: bool = False )
Parameters
commit_message (str): Message to use for the commit.
branch (str, optional): The branch on which the commit will appear. This branch will be checked out before any operation.
track_large_files (bool, optional, defaults to True): Whether to automatically track large files or not. Will do so by default.
blocking (bool, optional, defaults to True): Whether the function should return only when the git push has finished.
auto_lfs_prune (bool, optional, defaults to False): Whether to automatically prune files once they have been pushed to the remote.
Context manager utility to handle committing to a repository. This automatically tracks large files (>10MB) with git-lfs. Set the track_large_files argument to False if you wish to disable that behavior.
Examples:
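A minimal sketch of how the commit context manager might be used, assuming repo is a Repository created for local_dir. The helper commit_json_file is hypothetical, only the documented commit(commit_message=...) call is relied on, and the target path is resolved up front so the sketch does not depend on the current working directory.

```python
import json
from pathlib import Path
from typing import Any, Dict


def commit_json_file(repo: Any, local_dir: str, filename: str,
                     payload: Dict[str, Any]) -> Path:
    """Write payload as JSON inside local_dir while the commit context is active.

    repo is expected to expose the documented commit() context manager.
    """
    # Resolve the path before entering the context manager.
    target = Path(local_dir).resolve() / filename
    with repo.commit(commit_message=f"Add {filename}"):
        # Files written inside the block are the ones the commit should pick up.
        target.write_text(json.dumps(payload))
    return target
```

With a real repository this could be called as commit_json_file(repo, "text-files", "file.json", {"hey": 8}), where the folder name and payload are placeholders.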
delete_tag
( tag_name: str, remote: typing.Optional[str] = None ) -> bool
Parameters
tag_name (str): The tag name to delete.
remote (str, optional): The remote on which to delete the tag.
Returns
bool
True if deleted, False if the tag didn't exist. If remote is not passed, the tag will only be deleted locally.
Delete a tag, both local and remote, if it exists.
git_add
( pattern: str = '.', auto_lfs_track: bool = False )
Parameters
pattern (str, optional, defaults to "."): The pattern with which to add files to staging.
auto_lfs_track (bool, optional, defaults to False): Whether to automatically track large and binary files with git-lfs. Any file over 10MB in size, or in binary format, will be automatically tracked.
git add
Setting the auto_lfs_track parameter to True will automatically track files that are larger than 10MB, or in binary format, with git-lfs.
git_checkout
( revision: str, create_branch_ok: bool = False )
Parameters
revision (str): The revision to check out.
create_branch_ok (bool, optional, defaults to False): Whether to allow creating a branch named after revision, at the currently checked-out reference, if revision isn't an existing revision.
git checkout a given revision.
Setting create_branch_ok to True will create a branch named after the given revision if that revision doesn't exist.
git_commit
( commit_message: str = 'commit files to HF hub' )
Parameters
commit_message (str, optional, defaults to "commit files to HF hub"): The message attributed to the commit.
git commit
git_config_username_and_email
( git_user: typing.Optional[str] = None, git_email: typing.Optional[str] = None )
Parameters
git_user (str, optional): The username to register through git.
git_email (str, optional): The email to register through git.
Sets git username and email (only in the current repo).
git_credential_helper_store
( )
Sets the git credential helper to store
git_head_commit_url
( ) -> str
Returns
str
The URL to the current checked-out commit.
Get URL to last commit on HEAD. We assume it's been pushed, and the URL scheme is the same one as for GitHub or BOINC AI.
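As an illustration of the GitHub-style scheme mentioned above, a commit URL can be composed from the remote URL and the commit SHA. This is a sketch under that assumption only: head_commit_url is a hypothetical helper, and the example URL in the comment is a placeholder.

```python
def head_commit_url(remote_url: str, commit_sha: str) -> str:
    """Compose a GitHub-style commit URL: <remote>/commit/<sha>."""
    base = remote_url.rstrip("/")
    if base.endswith(".git"):
        # Drop the optional ".git" suffix of clone URLs.
        base = base[: -len(".git")]
    return f"{base}/commit/{commit_sha}"


# Example with placeholder values:
# head_commit_url("https://example.com/user/repo.git", "abc123")
```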
git_head_hash
( ) -> str
Returns
str
The current checked out commit SHA.
Get commit sha on top of HEAD.
git_pull
( rebase: bool = False, lfs: bool = False )
Parameters
rebase (bool, optional, defaults to False): Whether to rebase the current branch on top of the upstream branch after fetching.
lfs (bool, optional, defaults to False): Whether to fetch the LFS files too. This option only changes the behavior when a repository was cloned without fetching the LFS files; calling repo.git_pull(lfs=True) will then fetch the LFS files from the remote repository.
git pull
git_push
( upstream: typing.Optional[str] = None, blocking: bool = True, auto_lfs_prune: bool = False )
Parameters
upstream (str, optional): Upstream to which this should push. If not specified, will push to the last defined upstream or to the default one (origin main).
blocking (bool, optional, defaults to True): Whether the function should return only when the push has finished. Setting this to False will return a CommandInProgress object which has an is_done property. This property will be set to True when the push is finished.
auto_lfs_prune (bool, optional, defaults to False): Whether to automatically prune files once they have been pushed to the remote.
git push
If used without setting blocking, will return the URL to the commit on the remote repo. If used with blocking=False, will return a tuple containing the URL to the commit and the command object to follow for information about the process.
git_remote_url
( ) -> str
Returns
str
The URL of the origin remote.
Get URL to origin remote.
is_repo_clean
( ) -> bool
Returns
bool
True if the git status is clean, False otherwise.
Returns whether the git status is clean.
lfs_enable_largefiles
( )
HF-specific. This enables upload support of files >5GB.
lfs_prune
( recent = False )
Parameters
git lfs prune
lfs_track
( patterns: typing.Union[str, typing.List[str]], filename: bool = False )
Parameters
patterns (Union[str, List[str]]): The pattern, or list of patterns, to track with git-lfs.
filename (bool, optional, defaults to False): Whether to use the patterns as literal filenames.
Tell git-lfs to track files according to a pattern.
Setting the filename argument to True will treat the arguments as literal filenames, not as patterns. Any special glob characters in the filename will be escaped when writing to the .gitattributes file.
lfs_untrack
( patterns: typing.Union[str, typing.List[str]] )
Parameters
patterns (Union[str, List[str]]): The pattern, or list of patterns, to untrack with git-lfs.
Tell git-lfs to untrack those files.
list_deleted_files
( ) -> List[str]
Returns
List[str]
A list of files that have been deleted in the working directory or index.
Returns a list of the files that are deleted in the working directory or index.
push_to_hub
( commit_message: str = 'commit files to HF hub', blocking: bool = True, clean_ok: bool = True, auto_lfs_prune: bool = False )
Parameters
commit_message (str, optional, defaults to "commit files to HF hub"): Message to use for the commit.
blocking (bool, optional, defaults to True): Whether the function should return only when the git push has finished.
clean_ok (bool, optional, defaults to True): If True, this function returns None when the repo is untouched. Otherwise, the call fails because the underlying git command fails.
auto_lfs_prune (bool, optional, defaults to False): Whether to automatically prune files once they have been pushed to the remote.
Helper to add, commit, and push files to a remote repository on the BOINC AI Hub. Will automatically track large files (>10MB).
tag_exists
( tag_name: str, remote: typing.Optional[str] = None ) -> bool
Parameters
tag_name (str): The name of the tag to check.
remote (str, optional): Whether to check if the tag exists on a remote. This parameter should be the identifier of the remote.
Returns
bool
Whether the tag exists.
Check if a tag exists or not.
wait_for_commands
( )
Blocking method: blocks all subsequent execution until all commands have been processed.
boincai_hub.repository.is_git_repo
( folder: typing.Union[str, pathlib.Path] ) -> bool
Parameters
folder (str or Path): The folder in which to run the command.
Returns
bool
True if the folder is part of a git repository, False otherwise.
Check if the folder is the root or part of a git repository.
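The phrase "the root or part of a git repository" can be pictured as a walk up the directory tree looking for a .git entry. The sketch below is a filesystem-only approximation under that assumption. The name is_inside_git_repo is hypothetical, and the library may instead ask git itself, which also handles cases this heuristic misses.

```python
from pathlib import Path
from typing import Union


def is_inside_git_repo(folder: Union[str, Path]) -> bool:
    """Return True if folder or one of its parents contains a .git entry."""
    start = Path(folder).resolve()
    # .git may be a directory (regular clone) or a file (worktree, submodule),
    # so only its existence is checked.
    return any((candidate / ".git").exists() for candidate in (start, *start.parents))
```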
boincai_hub.repository.is_local_clone
( folder: typing.Union[str, pathlib.Path], remote_url: str ) -> bool
Parameters
folder (str or Path): The folder in which to run the command.
remote_url (str): The URL of a git repository.
Returns
bool
True if the repository is a local clone of the remote repository specified, False otherwise.
Check if the folder is a local clone of the remote_url.
boincai_hub.repository.is_tracked_with_lfs
( filename: typing.Union[str, pathlib.Path] ) -> bool
Parameters
filename (str or Path): The filename to check.
Returns
bool
True if the file passed is tracked with git-lfs, False otherwise.
Check if the file passed is tracked with git-lfs.
boincai_hub.repository.is_git_ignored
( filename: typing.Union[str, pathlib.Path] ) -> bool
Parameters
filename (str or Path): The filename to check.
Returns
bool
True if the file passed is ignored by git, False otherwise.
Check if file is git-ignored. Supports nested .gitignore files.
boincai_hub.repository.files_to_be_staged
( pattern: str = '.', folder: typing.Union[str, pathlib.Path, NoneType] = None ) -> List[str]
Parameters
pattern (str or Path): The pattern of filenames to check. Use "." to get all files.
folder (str or Path): The folder in which to run the command.
Returns
List[str]
List of files that are to be staged.
Returns a list of filenames that are to be staged.
boincai_hub.repository.is_tracked_upstream
( folder: typing.Union[str, pathlib.Path] ) -> bool
Parameters
folder (str or Path): The folder in which to run the command.
Returns
bool
True if the current checked-out branch is tracked upstream, False otherwise.
Check if the current checked-out branch is tracked upstream.
boincai_hub.repository.commits_to_push
( folder: typing.Union[str, pathlib.Path], upstream: typing.Optional[str] = None ) -> int
Parameters
folder (str or Path): The folder in which to run the command.
upstream (str, optional): The name of the upstream repository with which the comparison should be made.
Returns
int
Number of commits that would be pushed upstream were a git push to proceed.
Check the number of commits that would be pushed upstream.
The Repository utility offers several methods which can be launched asynchronously:
git_push
git_pull
push_to_hub
The commit context manager
See below for utilities to manage such asynchronous methods.
commands_failed
( )
Returns the asynchronous commands that failed.
commands_in_progress
( )
Returns the asynchronous commands that are currently in progress.
wait_for_commands
( )
Blocking method: blocks all subsequent execution until all commands have been processed.
CommandInProgress
( title: str, is_done_method: typing.Callable, status_method: typing.Callable, process: Popen, post_method: typing.Optional[typing.Callable] = None )
Utility to follow commands launched asynchronously.
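To show what following an asynchronous command can look like, here is a minimal sketch built on subprocess.Popen. It is not the library's class: TrackedCommand and wait_for are hypothetical names, and only the idea of an is_done property is taken from the description of git_push.

```python
import subprocess
import sys
import time
from typing import Optional


class TrackedCommand:
    """Hypothetical stand-in that follows a subprocess and exposes is_done."""

    def __init__(self, title: str, process: subprocess.Popen):
        self.title = title
        self._process = process

    @property
    def is_done(self) -> bool:
        # poll() returns None while the process is still running.
        return self._process.poll() is not None

    @property
    def failed(self) -> bool:
        # A finished process with a non-zero exit code counts as failed.
        return self.is_done and self._process.returncode != 0


def wait_for(command: TrackedCommand, poll_interval: float = 0.05,
             timeout: Optional[float] = 30.0) -> bool:
    """Block until the command finishes; return False if the timeout expires first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not command.is_done:
        if deadline is not None and time.monotonic() > deadline:
            return False
        time.sleep(poll_interval)
    return True


if __name__ == "__main__":
    # Launch a trivial child process in place of a long-running git push.
    demo = TrackedCommand("demo", subprocess.Popen([sys.executable, "-c", "pass"]))
    wait_for(demo)
    print(demo.title, "done:", demo.is_done, "failed:", demo.failed)
```

Polling with a timeout keeps the caller in control, which is the main reason to launch a push with blocking=False in the first place.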
Raises (Repository.__init__)
EnvironmentError: if the remote repository set in clone_from does not exist.
If clone_from is set, the repo will be cloned from an existing remote repository. If the remote repo does not exist, an EnvironmentError exception will be thrown. Please create the remote repo first.
Raises (check_git_versions)
An error is raised if git or git-lfs are not installed.
Raises (clone_from)
An error is raised if an organization token (starts with "api_org") is passed. You must use your own personal access token.
An error is raised if you are trying to clone the repository in a non-empty folder, or if the git operations raise errors.
Parameters (lfs_prune)
recent (bool, optional, defaults to False): Whether to prune files even if they were referenced by recent commits.