Feature Extractor
Last updated
A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences, e.g., pre-processing audio files into log-Mel spectrogram features, and feature extraction from images, e.g., cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
( **kwargs )
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
from_pretrained
( pretrained_model_name_or_path: typing.Union[str, os.PathLike], cache_dir: typing.Union[str, os.PathLike, NoneType] = None, force_download: bool = False, local_files_only: bool = False, token: typing.Union[bool, str, NoneType] = None, revision: str = 'main', **kwargs )
Parameters
pretrained_model_name_or_path (str or os.PathLike) - This can be one of:
a string, the model id of a pretrained feature_extractor hosted inside a model repo on boincai.com. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
a path to a directory containing a previously saved feature extractor file, e.g., ./my_model_directory/.
a path or URL to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) - Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) - Whether or not to force the (re-)download of the feature extractor files, overriding the cached versions if they exist.
resume_download (bool, optional, defaults to False) - Whether or not to resume the download of an incompletely received file, if such a file exists, instead of deleting it and starting over.
proxies (Dict[str, str], optional) - A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) - The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running boincai-cli login (stored in ~/.boincai).
revision (str, optional, defaults to "main") - The specific model version to use. It can be a branch name, a tag name, or a commit id: since models and other artifacts on boincai.co are stored in a git-based system, revision can be any identifier allowed by git.
Examples:
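The three accepted forms of pretrained_model_name_or_path can be told apart with simple filesystem checks. The sketch below is a minimal illustration under that assumption, not the library's actual resolution logic: classify_source and local_config_path are hypothetical helpers, and caching, tokens, proxies and revisions are ignored entirely.

```python
import os
import tempfile

# File name borrowed from the example path in the parameter list above.
CONFIG_NAME = "preprocessor_config.json"


def classify_source(path_or_id: str) -> str:
    """Return 'file', 'directory' or 'hub' for a pretrained_model_name_or_path value."""
    if path_or_id.startswith(("http://", "https://")) or os.path.isfile(path_or_id):
        return "file"  # a URL or local path pointing straight at a saved JSON file
    if os.path.isdir(path_or_id):
        return "directory"  # the JSON file is expected inside this directory
    # Anything else is treated as a model id, e.g. "bert-base-uncased"
    # or the namespaced "dbmdz/bert-base-german-cased".
    return "hub"


def local_config_path(path_or_id: str):
    """Return the local JSON path to read, or None when a download would be needed."""
    kind = classify_source(path_or_id)
    if kind == "directory":
        return os.path.join(path_or_id, CONFIG_NAME)
    if kind == "file" and os.path.isfile(path_or_id):
        return path_or_id
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        print(classify_source(model_dir), local_config_path(model_dir))
    print(classify_source("dbmdz/bert-base-german-cased"))
```

A URL is classified as a file but has no local path, so local_config_path returns None for it, just as it does for a hub model id.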
save_pretrained
( save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )
Parameters
save_directory (str or os.PathLike) - Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) - Whether or not to push the feature extractor to the BOINC AI model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
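A save/load round trip amounts to writing the extractor's attributes to a JSON file and reading them back. The sketch below assumes exactly that and nothing more: ToyFeatureExtractor is a hypothetical class (not part of any library), the file name preprocessor_config.json is borrowed from the example path in the from_pretrained parameters, and nothing is pushed to a hub.

```python
import json
import os
import tempfile

CONFIG_NAME = "preprocessor_config.json"


class ToyFeatureExtractor:
    """Hypothetical stand-in for a feature extractor with a few JSON-serializable attributes."""

    def __init__(self, feature_size=1, sampling_rate=16000, padding_value=0.0):
        self.feature_size = feature_size
        self.sampling_rate = sampling_rate
        self.padding_value = padding_value

    def save_pretrained(self, save_directory):
        # Create the directory if it does not exist, then dump the attributes as JSON.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, directory):
        # Read the JSON file back and pass its keys to the constructor.
        with open(os.path.join(directory, CONFIG_NAME), encoding="utf-8") as f:
            return cls(**json.load(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "my_model_directory")
        ToyFeatureExtractor(feature_size=80).save_pretrained(target)
        print(vars(ToyFeatureExtractor.from_pretrained(target)))
```

Keeping every attribute JSON-serializable is what makes this simple round trip possible.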
( feature_size: int, sampling_rate: int, padding_value: float, **kwargs )
Parameters
feature_size (int) - The feature dimension of the extracted features.
sampling_rate (int) - The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
padding_value (float) - The value used to fill the padding values / vectors.
This is a general feature extraction class for speech recognition.
pad
( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, typing.List[transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, typing.List[transformers.feature_extraction_utils.BatchFeature]], typing.List[typing.Dict[str, transformers.feature_extraction_utils.BatchFeature]]], padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True, max_length: typing.Optional[int] = None, truncation: bool = False, pad_to_multiple_of: typing.Optional[int] = None, return_attention_mask: typing.Optional[bool] = None, return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
Instead of List[float], you can pass tensors (NumPy arrays, PyTorch tensors or TensorFlow tensors); see the note below for the return type.
True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).
max_length (int, optional) - Maximum length of the returned list and optionally padding length (see above).
truncation (bool) - Activates truncation to cut input sequences longer than max_length to max_length.
pad_to_multiple_of (int, optional) - If set, will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta), or on TPUs, which benefit from sequence lengths that are a multiple of 128.
return_attention_mask (bool, optional) - Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature extractor's default.
'tf': Return TensorFlow tf.constant objects.
'pt': Return PyTorch torch.Tensor objects.
'np': Return NumPy np.ndarray objects.
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the maximum sequence length in the batch.
The padding side (left/right) and the padding values are defined at the feature extractor level (with self.padding_side and self.padding_value).
If the processed_features passed are a dictionary of NumPy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.
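The padding strategies, truncation, and pad_to_multiple_of rounding can be illustrated on plain Python lists. This is a minimal sketch under stated assumptions, not the library's pad: pad_batch is a hypothetical function that only accepts the string strategies 'longest', 'max_length' and 'do_not_pad' (not True/False or a PaddingStrategy), raises instead of falling back to the model's maximum length when max_length is missing, and assumes an attention mask of 1 for real values and 0 for padding.

```python
import math
from typing import Dict, List, Optional


def pad_batch(
    sequences: List[List[float]],
    padding: str = "longest",
    max_length: Optional[int] = None,
    truncation: bool = False,
    pad_to_multiple_of: Optional[int] = None,
    padding_value: float = 0.0,
    padding_side: str = "right",
) -> Dict[str, list]:
    """Hypothetical sketch of the padding options described above."""
    if truncation and max_length is not None:
        # Cut sequences longer than max_length down to max_length.
        sequences = [seq[:max_length] for seq in sequences]

    if padding == "longest":
        target = max(len(seq) for seq in sequences)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("this sketch needs max_length for padding='max_length'")
        target = max_length
    else:  # "do_not_pad"
        target = None

    if target is not None and pad_to_multiple_of:
        # Round the target length up to the next multiple of pad_to_multiple_of.
        target = math.ceil(target / pad_to_multiple_of) * pad_to_multiple_of

    input_values, attention_mask = [], []
    for seq in sequences:
        n_pad = 0 if target is None else max(target - len(seq), 0)
        pad = [padding_value] * n_pad
        mask = [1] * len(seq)
        if padding_side == "right":
            input_values.append(list(seq) + pad)
            attention_mask.append(mask + [0] * n_pad)
        else:  # left padding
            input_values.append(pad + list(seq))
            attention_mask.append([0] * n_pad + mask)
    return {"input_values": input_values, "attention_mask": attention_mask}


if __name__ == "__main__":
    print(pad_batch([[0.1, 0.2, 0.3], [0.5]], pad_to_multiple_of=4))
```

With pad_to_multiple_of=4, a batch whose longest sequence has 3 values is padded to length 4 rather than 3.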
( data: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )
Parameters
data (dict) - Dictionary of lists/arrays/tensors returned by the call/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], optional) - You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
This class is derived from a Python dictionary and can be used as a dictionary.
convert_to_tensors
( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
Convert the inner content to tensors.
to
Parameters
args (Tuple) - Will be passed to the to(...) function of the tensors.
kwargs (Dict, optional) - Will be passed to the to(...) function of the tensors.
Returns
The same instance after modification.
Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.
( )
Mixin that contains utilities for preparing image features.
center_crop
( image, size ) -> new_image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width) or (height, width, n_channels)) - The image to crop.
size (int or Tuple[int, int]) - The size to which to crop the image.
Returns
new_image
A center-cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width).
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so that the returned result has the requested size).
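The center-crop-with-padding behavior can be sketched on a single-channel image stored as nested lists (height rows of width values). center_crop_2d is a hypothetical helper that accepts only an int size and pads with a fill value before cropping; exactly where the library places the padding is an assumption here.

```python
from typing import List

# Single-channel image: a list of `height` rows, each holding `width` values.
Image2D = List[List[float]]


def center_crop_2d(image: Image2D, size: int, fill: float = 0.0) -> Image2D:
    """Hypothetical center crop of a height x width image to size x size."""
    height, width = len(image), len(image[0])

    # Pad symmetrically first if the image is smaller than the crop in either dimension.
    pad_h, pad_w = max(size - height, 0), max(size - width, 0)
    top_pad, left_pad = pad_h // 2, pad_w // 2
    padded_width = width + pad_w
    padded = [[fill] * padded_width for _ in range(top_pad)]
    for row in image:
        padded.append([fill] * left_pad + list(row) + [fill] * (pad_w - left_pad))
    padded += [[fill] * padded_width for _ in range(pad_h - top_pad)]

    # Then keep the central size x size window.
    top = (len(padded) - size) // 2
    left = (padded_width - size) // 2
    return [row[left:left + size] for row in padded[top:top + size]]


if __name__ == "__main__":
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(center_crop_2d(grid, 2))
    print(center_crop_2d([[1.0]], 3))
```

Padding before cropping guarantees that the output always has the requested size, which matches the note above.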
convert_rgb
( image )
Parameters
image (PIL.Image.Image) - The image to convert.
Converts a PIL.Image.Image to RGB format.
expand_dims
( image )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image to expand.
Expands a 2-dimensional image to 3 dimensions.
flip_channel_order
( image )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image whose color channels to flip. If np.ndarray or torch.Tensor, the channel dimension should be first.
Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of image to a NumPy array if it's a PIL Image.
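For a channel-first image, flipping the channel order just means reversing the first axis. flip_channels is a hypothetical helper that works on nested lists (channels x height x width) rather than on arrays or PIL images; applying it twice returns the original image.

```python
from typing import List

# Channel-first image stored as nested lists: channels x height x width.
ChannelFirstImage = List[List[List[float]]]


def flip_channels(image: ChannelFirstImage) -> ChannelFirstImage:
    """Reverse the channel axis: RGB becomes BGR and BGR becomes RGB."""
    # image[::-1] reverses only the outer (channel) axis; rows are copied
    # so that the result does not share lists with the input.
    return [[list(row) for row in channel] for channel in image[::-1]]


if __name__ == "__main__":
    rgb_pixel = [[[255]], [[128]], [[0]]]  # one pixel: R=255, G=128, B=0
    print(flip_channels(rgb_pixel))
```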
normalize
( image, mean, std, rescale = False )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image to normalize.
mean (List[float] or np.ndarray or torch.Tensor) - The mean (per channel) to use for normalization.
std (List[float] or np.ndarray or torch.Tensor) - The standard deviation (per channel) to use for normalization.
rescale (bool, optional, defaults to False) - Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling will happen automatically.
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it's a PIL Image.
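Per-channel normalization computes (x - mean) / std for every value of a channel, optionally after dividing by 255 when rescale is set. normalize_image is a hypothetical nested-list sketch that assumes a channel-first layout and, when rescaling, 0-255 pixel values.

```python
from typing import List, Sequence

# Channel-first image stored as nested lists: channels x height x width.
ChannelFirstImage = List[List[List[float]]]


def normalize_image(
    image: ChannelFirstImage,
    mean: Sequence[float],
    std: Sequence[float],
    rescale: bool = False,
) -> ChannelFirstImage:
    """Apply (x - mean[c]) / std[c] to every value of channel c."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("need one mean and one std value per channel")
    out = []
    for channel, m, s in zip(image, mean, std):
        new_channel = []
        for row in channel:
            # Optionally map 0-255 pixel values into [0, 1] before normalizing.
            values = [v / 255.0 for v in row] if rescale else list(row)
            new_channel.append([(v - m) / s for v in values])
        out.append(new_channel)
    return out


if __name__ == "__main__":
    image = [[[0, 255]], [[51, 102]]]  # 2 channels, 1 x 2 pixels each
    print(normalize_image(image, mean=[0.5, 0.2], std=[0.5, 0.2], rescale=True))
```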
rescale
( image: ndarray, scale: typing.Union[float, int] )
Rescales a NumPy image by multiplying its values by scale.
resize
( image, size, resample = None, default_to_square = True, max_size = None ) -> image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image to resize.
size (int or Tuple[int, int]) - The size to use for resizing the image. If size is a sequence like (h, w), the output size will be matched to it. If size is an int and default_to_square is True, the image will be resized to (size, size). If size is an int and default_to_square is False, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
resample (int, optional, defaults to PILImageResampling.BILINEAR) - The filter to use for resampling.
max_size (int, optional, defaults to None) - The maximum allowed length for the longer edge of the resized image: if the longer edge of the image is greater than max_size after being resized according to size, the image is resized again so that the longer edge is equal to max_size. As a result, size might be overruled, i.e., the smaller edge may be shorter than size. Only used if default_to_square is False.
Returns
image
A resized PIL.Image.Image.
Resizes image. Enforces conversion of the input to PIL.Image.
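The rules for size, default_to_square and max_size boil down to a small size computation. resize_output_size is a hypothetical helper that returns the target (height, width): it performs no resampling, and truncating fractional sizes with int() is an assumption about how rounding is handled.

```python
from typing import Optional, Sequence, Tuple, Union


def resize_output_size(
    height: int,
    width: int,
    size: Union[int, Sequence[int]],
    default_to_square: bool = True,
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: compute the (height, width) a resize would target."""
    if not isinstance(size, int):
        # A (h, w) sequence is used as-is.
        return int(size[0]), int(size[1])
    if default_to_square:
        return size, size

    # Match the shorter edge to `size` and keep the aspect ratio.
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)

    if max_size is not None and new_long > max_size:
        # Shrink again so the longer edge equals max_size; the shorter
        # edge may then end up below `size`, as described above.
        new_short, new_long = int(max_size * new_short / new_long), max_size

    return (new_long, new_short) if width <= height else (new_short, new_long)


if __name__ == "__main__":
    print(resize_output_size(640, 480, 224, default_to_square=False))
    print(resize_output_size(480, 640, 224, default_to_square=False, max_size=256))
```

For a 640 x 480 (height x width) image and size=224, this gives (298, 224), i.e. (size * height / width, size) truncated to an integer.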
rotate
( image, angle, resample = None, expand = 0, center = None, translate = None, fillcolor = None ) -> image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image to rotate. If np.ndarray or torch.Tensor, it will be converted to PIL.Image.Image before rotating.
Returns
image
A rotated PIL.Image.Image.
Returns a copy of image, rotated the given number of degrees counterclockwise around its center.
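Counterclockwise rotation by a multiple of 90 degrees can be done without interpolation, as a transpose followed by reversing the row order. rotate_quarter_turns_ccw is a hypothetical helper limited to multiples of 90 degrees. It ignores resample, center, translate and fillcolor, and for an odd number of quarter turns the output swaps height and width instead of keeping the original canvas size.

```python
from typing import List

# Single-channel image: a list of rows.
Grid = List[List[int]]


def rotate_quarter_turns_ccw(image: Grid, angle: int) -> Grid:
    """Rotate a height x width grid counterclockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    result = [list(row) for row in image]  # work on a copy, leave the input untouched
    # Negative angles become clockwise turns thanks to Python's modulo.
    for _ in range((angle // 90) % 4):
        # One 90-degree counterclockwise turn: transpose, then reverse the row order.
        result = [list(col) for col in zip(*result)][::-1]
    return result


if __name__ == "__main__":
    print(rotate_quarter_turns_ccw([[1, 2, 3], [4, 5, 6]], 90))
```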
to_numpy_array
( image, rescale = None, channel_first = True )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) - The image to convert to a NumPy array.
rescale (bool, optional) - Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
channel_first (bool, optional, defaults to True) - Whether or not to permute the dimensions of the image to put the channel dimension first.
Converts image to a NumPy array. Optionally rescales it and puts the channel dimension as the first dimension.
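The channel permutation and the default rescaling rule can be mimicked on nested lists. to_channel_first is a hypothetical helper that expects height x width x channels input, divides by 255 when rescale is True, and, when rescale is None, rescales only if every value is an integer, mirroring the documented default for integer arrays. It does not handle PIL images.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def to_channel_first(
    image: List[List[List[Number]]],
    rescale: Optional[bool] = None,
    channel_first: bool = True,
) -> List[List[List[Number]]]:
    """Convert height x width x channels nested lists, optionally rescaling to [0, 1]."""
    if rescale is None:
        # Documented default: rescale integer data, leave floats unchanged.
        rescale = all(isinstance(v, int) for row in image for pixel in row for v in pixel)
    scaled = [[[v / 255.0 if rescale else v for v in pixel] for pixel in row] for row in image]
    if not channel_first:
        return scaled
    n_channels = len(scaled[0][0])
    # Gather channel c from every pixel to build a channels x height x width result.
    return [[[pixel[c] for pixel in row] for row in scaled] for c in range(n_channels)]


if __name__ == "__main__":
    hwc = [[[0, 255, 51]], [[255, 0, 102]]]  # height 2, width 1, 3 channels
    print(to_channel_first(hwc))
```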
to_pil_image
( image, rescale = None )
Parameters
image (PIL.Image.Image or numpy.ndarray or torch.Tensor) - The image to convert to the PIL Image format.
rescale (bool, optional) - Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
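The reverse direction, from channel-first floats in [0, 1] to channel-last integers in [0, 255], can be sketched the same way. to_channel_last_uint8 is a hypothetical helper that always assumes channel-first input and rounds to the nearest integer. The text above does not say whether the library rounds or truncates, so that choice is an assumption.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def to_channel_last_uint8(
    image: List[List[List[Number]]], rescale: Optional[bool] = None
) -> List[List[List[Number]]]:
    """Hypothetical sketch: channels x height x width -> height x width x channels."""
    if rescale is None:
        # Documented default: rescale only floating-point data.
        rescale = any(isinstance(v, float) for channel in image for row in channel for v in row)
    n_channels, height, width = len(image), len(image[0]), len(image[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = [image[c][y][x] for c in range(n_channels)]
            if rescale:
                # Map [0, 1] floats to [0, 255] integers (rounding is an assumption).
                pixel = [int(round(v * 255)) for v in pixel]
            row.append(pixel)
        out.append(row)
    return out


if __name__ == "__main__":
    chw = [[[0.0, 1.0]], [[0.5, 0.2]], [[1.0, 0.0]]]  # 3 channels, 1 x 2 pixels
    print(to_channel_last_uint8(chw))
```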
Instantiate a feature extractor type, e.g. a derived feature extractor class, from a pretrained feature extractor.
kwargs (Dict[str, Any], optional) - Additional keyword arguments passed along to the method.
Save a feature extractor object to the directory save_directory, so that it can be reloaded using the class method.
processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) - Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]), so you can use this method during preprocessing as well as in a PyTorch DataLoader collate function.
padding (bool, str or PaddingStrategy, optional, defaults to True) - Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among:
return_tensors (str or TensorType, optional) - If set, will return tensors instead of a list of Python integers. Acceptable values are:
Holds the output of the pad and feature extractor specific __call__ methods.
tensor_type (str or TensorType, optional) - The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.
( *args, **kwargs )
default_to_square (bool, optional, defaults to True) - How to convert size when it is a single int. If set to True, size will be converted to a square (size, size). If set to False, only the smallest edge is resized to size, and an optional max_size can bound the longer edge.