Feature Extractor
A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences, e.g., pre-processing audio files to Log-Mel Spectrogram features, feature extraction from images, e.g., cropping image files, as well as padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
FeatureExtractionMixin
class transformers.FeatureExtractionMixin
( **kwargs )
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
from_pretrained
( pretrained_model_name_or_path: typing.Union[str, os.PathLike], cache_dir: typing.Union[str, os.PathLike, NoneType] = None, force_download: bool = False, local_files_only: bool = False, token: typing.Union[bool, str, NoneType] = None, revision: str = 'main', **kwargs )
Parameters
pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on boincai.com. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
  - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the feature extractor files and override the cached versions if they exist.
resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Attempts to resume the download if such a file exists.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running boincai-cli login (stored in ~/.boincai).
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since models and other artifacts on boincai.co are stored in a git-based system, revision can be any identifier allowed by git.
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.
save_pretrained
( save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )
Parameters
save_directory (str or os.PathLike) — Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the BOINC AI model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
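The save and reload round trip can be sketched with the standard library alone. The ToyFeatureExtractor class below is hypothetical and is not the library's implementation; it only assumes, as stated above, that the configuration is written as a JSON file named preprocessor_config.json inside the target directory.

```python
import json
import os
import tempfile


class ToyFeatureExtractor:
    """Hypothetical stand-in illustrating a JSON-backed save/load round trip."""

    CONFIG_NAME = "preprocessor_config.json"  # file name taken from the text above

    def __init__(self, **kwargs):
        # Every keyword argument becomes a plain attribute of the extractor.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def save_pretrained(self, save_directory):
        # Create the directory if needed, then dump the attributes as JSON.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(vars(self), handle, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, path):
        # Accept either a directory or a direct path to the JSON file.
        if os.path.isdir(path):
            path = os.path.join(path, cls.CONFIG_NAME)
        with open(path, encoding="utf-8") as handle:
            return cls(**json.load(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    original = ToyFeatureExtractor(feature_size=1, sampling_rate=16000, padding_value=0.0)
    saved_path = original.save_pretrained(os.path.join(tmp_dir, "my_model_directory"))
    reloaded = ToyFeatureExtractor.from_pretrained(os.path.dirname(saved_path))
    round_trip_ok = vars(reloaded) == vars(original)
```

The sketch covers only local paths; resolving a model id, caching, and push_to_hub are left out.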
SequenceFeatureExtractor
class transformers.SequenceFeatureExtractor
( feature_size: int, sampling_rate: int, padding_value: float, **kwargs )
Parameters
feature_size (int) — The feature dimension of the extracted features.
sampling_rate (int) — The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
padding_value (float) — The value that is used to fill the padding values / vectors.
This is a general feature extraction class for speech recognition.
pad
( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, typing.List[transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, typing.List[transformers.feature_extraction_utils.BatchFeature]], typing.List[typing.Dict[str, transformers.feature_extraction_utils.BatchFeature]]], padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True, max_length: typing.Optional[int] = None, truncation: bool = False, pad_to_multiple_of: typing.Optional[int] = None, return_attention_mask: typing.Optional[bool] = None, return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) — Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) so you can use this method during preprocessing as well as in a PyTorch Dataloader collate function. Instead of List[float] you can have tensors (numpy arrays, PyTorch tensors or TensorFlow tensors); see the note below for the return type.
padding (bool, str or PaddingStrategy, optional, defaults to True) — Select a strategy to pad the returned sequences (according to the model’s padding side and padding index) among:
  - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
  - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
  - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).
max_length (int, optional) — Maximum length of the returned list and optionally padding length (see above).
truncation (bool) — Activates truncation to cut input sequences longer than max_length to max_length.
pad_to_multiple_of (int, optional) — If set, will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
return_attention_mask (bool, optional) — Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor’s default.
return_tensors (str or TensorType, optional) — If set, will return tensors instead of lists of Python integers. Acceptable values are:
  - 'tf': Return TensorFlow tf.constant objects.
  - 'pt': Return PyTorch torch.Tensor objects.
  - 'np': Return Numpy np.ndarray objects.
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the max sequence length in the batch.
The padding side (left/right) and padding values are defined at the feature extractor level (with self.padding_side, self.padding_value).
If the processed_features passed are a dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.
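The padding strategies above can be illustrated with a short pure-Python sketch. The pad_sequences function is a hypothetical helper, not the library's pad() method. It works on plain lists of floats, and it assumes that max_length must be given explicitly for 'max_length' padding, since a standalone function has no model default to fall back on.

```python
import math


def pad_sequences(sequences, padding="longest", max_length=None, truncation=False,
                  pad_to_multiple_of=None, padding_value=0.0, padding_side="right"):
    """Hypothetical helper: pad a batch of 1-D float lists and build attention masks."""
    if truncation and max_length is not None:
        # Cut sequences longer than max_length down to max_length.
        sequences = [seq[:max_length] for seq in sequences]

    if padding in (True, "longest"):
        target = max(len(seq) for seq in sequences)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("this sketch needs max_length for padding='max_length'")
        target = max_length
    elif padding in (False, "do_not_pad"):
        target = None
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")

    if target is not None and pad_to_multiple_of:
        # Round the target length up to the next multiple.
        target = math.ceil(target / pad_to_multiple_of) * pad_to_multiple_of

    input_values, attention_mask = [], []
    for seq in sequences:
        missing = 0 if target is None else max(target - len(seq), 0)
        filler, mask_filler = [padding_value] * missing, [0] * missing
        real_mask = [1] * len(seq)
        if padding_side == "right":
            input_values.append(list(seq) + filler)
            attention_mask.append(real_mask + mask_filler)
        else:
            input_values.append(filler + list(seq))
            attention_mask.append(mask_filler + real_mask)
    return {"input_values": input_values, "attention_mask": attention_mask}


batch = [[0.1, 0.2, 0.3], [0.4]]
longest = pad_sequences(batch)
multiple = pad_sequences(batch, pad_to_multiple_of=4)
fixed = pad_sequences(batch, padding="max_length", max_length=2, truncation=True)
```

In the attention mask, 1 marks a real value and 0 marks a padded position.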
BatchFeature
class transformers.BatchFeature
( data: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )
Parameters
data (dict) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods (‘input_values’, ‘attention_mask’, etc.).
tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/Numpy tensors at initialization.
Holds the output of the pad() and feature extractor specific __call__ methods.
This class is derived from a python dictionary and can be used as a dictionary.
convert_to_tensors
( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
tensor_type (str or TensorType, optional) — The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.
Convert the inner content to tensors.
to
( *args, **kwargs ) → BatchFeature
Parameters
args (Tuple) — Will be passed to the to(...) function of the tensors.
kwargs (Dict, optional) — Will be passed to the to(...) function of the tensors.
Returns
The same instance after modification.
Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.
ImageFeatureExtractionMixin
class transformers.ImageFeatureExtractionMixin
( )
Mixin that contains utilities for preparing image features.
center_crop
( image, size ) → new_image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width) or (height, width, n_channels)) — The image to crop.
size (int or Tuple[int, int]) — The size to which to crop the image.
Returns
new_image
A center cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape: (n_channels, height, width).
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result has the size asked).
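The crop-or-pad behaviour can be sketched on a single-channel image stored as a nested list. The center_crop_2d function is a hypothetical helper, not the mixin's method. Filling the padding with zeros, and the side that receives the extra pixel when the size difference is odd, are assumptions of this sketch.

```python
def center_crop_2d(image, size, fill=0):
    """Hypothetical helper: center-crop a height x width nested list, padding if needed."""
    if isinstance(size, int):
        size = (size, size)
    out_h, out_w = size
    in_h, in_w = len(image), len(image[0])
    # Offsets of the crop window in input coordinates; negative when the input is too small.
    top = (in_h - out_h) // 2
    left = (in_w - out_w) // 2

    cropped = []
    for row in range(top, top + out_h):
        new_row = []
        for col in range(left, left + out_w):
            inside = 0 <= row < in_h and 0 <= col < in_w
            # Positions outside the input are padded with the fill value (assumed 0).
            new_row.append(image[row][col] if inside else fill)
        cropped.append(new_row)
    return cropped


large = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 image with values 0..15
small = [[1, 2], [3, 4]]                                   # 2 x 2 image

cropped = center_crop_2d(large, 2)   # plain crop of the central 2 x 2 window
padded = center_crop_2d(small, 4)    # input too small, so the result is padded to 4 x 4
```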
convert_rgb
( image )
Parameters
image (PIL.Image.Image) — The image to convert.
Converts PIL.Image.Image to RGB format.
expand_dims
( image )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to expand.
Expands 2-dimensional image to 3 dimensions.
flip_channel_order
( image )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image whose color channels to flip. If np.ndarray or torch.Tensor, the channel dimension should be first.
Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.
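For a channel-first image, flipping the channel order amounts to reversing the first axis. The flip_channels function below is a hypothetical helper that works on nested lists rather than arrays or tensors.

```python
def flip_channels(image):
    """Hypothetical helper: reverse the channel axis of a channel-first nested list."""
    return image[::-1]


rgb = [[[255]], [[128]], [[0]]]  # 3 channels (R, G, B), each a 1 x 1 plane
bgr = flip_channels(rgb)         # channels are now ordered B, G, R
```

Applying the flip twice restores the original order, which is why the same operation converts in both directions.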
normalize
( image, mean, std, rescale = False )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to normalize.
mean (List[float] or np.ndarray or torch.Tensor) — The mean (per channel) to use for normalization.
std (List[float] or np.ndarray or torch.Tensor) — The standard deviation (per channel) to use for normalization.
rescale (bool, optional, defaults to False) — Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling will happen automatically.
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.
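Per-channel normalization computes (value - mean) / std for every pixel. A minimal sketch on a channel-first nested list follows. The normalize_image function is hypothetical, and the rescale step assumes 8-bit pixel values, so it divides by 255.

```python
def normalize_image(image, mean, std, rescale=False):
    """Hypothetical helper: normalize a channel-first nested list, channel by channel."""
    normalized = []
    for channel, channel_mean, channel_std in zip(image, mean, std):
        rows = []
        for row in channel:
            new_row = []
            for value in row:
                if rescale:
                    # Assumption: 8-bit input, mapped to the range [0, 1] first.
                    value = value / 255
                new_row.append((value - channel_mean) / channel_std)
            rows.append(new_row)
        normalized.append(rows)
    return normalized


# Two channels, each a 1 x 2 plane of floats already in [0, 1].
floats = [[[0.5, 1.0]], [[0.0, 0.25]]]
result = normalize_image(floats, mean=[0.5, 0.25], std=[0.5, 0.25])

# One channel of 8-bit values, rescaled before normalization.
result_8bit = normalize_image([[[255, 0]]], mean=[0.5], std=[0.5], rescale=True)
```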
rescale
( image: ndarray, scale: typing.Union[float, int] )
Rescale a numpy image by scale amount.
resize
( image, size, resample = None, default_to_square = True, max_size = None ) → image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to resize.
size (int or Tuple[int, int]) — The size to use for resizing the image. If size is a sequence like (h, w), output size will be matched to this. If size is an int and default_to_square is True, then image will be resized to (size, size). If size is an int and default_to_square is False, then the smaller edge of the image will be matched to this number, i.e., if height > width, then image will be rescaled to (size * height / width, size).
resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
default_to_square (bool, optional, defaults to True) — How to convert size when it is a single int. If set to True, the size will be converted to a square (size, size). If set to False, will replicate torchvision.transforms.Resize with support for resizing only the smallest edge and providing an optional max_size.
max_size (int, optional, defaults to None) — The maximum allowed for the longer edge of the resized image: if the longer edge of the image is greater than max_size after being resized according to size, then the image is resized again so that the longer edge is equal to max_size. As a result, size might be overruled, i.e., the smaller edge may be shorter than size. Only used if default_to_square is False.
Returns
image
A resized PIL.Image.Image.
Resizes image. Enforces conversion of input to PIL.Image.
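How size, default_to_square and max_size interact is easiest to see by computing the target dimensions alone. The resolve_output_size function below is a hypothetical helper that follows the rules described above and returns (height, width). It does not resize anything, and truncating fractional sizes with int() is an assumption of this sketch.

```python
def resolve_output_size(height, width, size, default_to_square=True, max_size=None):
    """Hypothetical helper: return the (height, width) that a resize would target."""
    if not isinstance(size, int):
        return tuple(size)  # an explicit (h, w) pair is used as is
    if default_to_square:
        return (size, size)

    # Match the shorter edge to `size` and scale the longer edge by the same ratio.
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    if max_size is not None and new_long > max_size:
        # The longer edge would exceed max_size: shrink again so that it equals max_size.
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if width <= height else (new_short, new_long)


square = resolve_output_size(400, 200, 100)
explicit = resolve_output_size(400, 200, (50, 60))
portrait = resolve_output_size(400, 200, 100, default_to_square=False)
capped = resolve_output_size(400, 200, 100, default_to_square=False, max_size=150)
landscape = resolve_output_size(200, 400, 100, default_to_square=False)
```

In the capped case the shorter edge ends up at 75, below the requested size of 100, which is the "size might be overruled" situation mentioned for max_size.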
rotate
( image, angle, resample = None, expand = 0, center = None, translate = None, fillcolor = None ) → image
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to rotate. If np.ndarray or torch.Tensor, will be converted to PIL.Image.Image before rotating.
Returns
image
A rotated PIL.Image.Image.
Returns a copy of image, rotated the given number of degrees counterclockwise around its centre.
to_numpy_array
( image, rescale = None, channel_first = True )
Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to convert to a NumPy array.
rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
channel_first (bool, optional, defaults to True) — Whether or not to permute the dimensions of the image to put the channel dimension first.
Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first dimension.
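The two optional steps, rescaling and moving the channel dimension first, can be sketched on nested lists. The to_channel_first function is a hypothetical helper, not the mixin's method. It assumes a (height, width, channels) input and a scaling factor of 1/255 for integer pixel values.

```python
def to_channel_first(image, rescale=None):
    """Hypothetical helper: turn a height x width x channels nested list into channels x height x width."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    if rescale is None:
        # Default described above: rescale integer pixel data, leave floats untouched.
        rescale = all(isinstance(v, int) for row in image for pixel in row for v in pixel)

    def convert(value):
        # Assumption: integer pixels are 8-bit, so dividing by 255 maps them to [0, 1].
        return value / 255 if rescale else value

    return [
        [[convert(image[r][c][ch]) for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


pixels_int = [[[255, 0, 51]]]          # 1 x 1 image with 3 integer channels
pixels_float = [[[0.5, 0.25, 1.0]]]    # same layout, float values

scaled = to_channel_first(pixels_int)       # integers are rescaled by default
unchanged = to_channel_first(pixels_float)  # floats are only permuted
```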
to_pil_image
( image, rescale = None )
Parameters
image (PIL.Image.Image or numpy.ndarray or torch.Tensor) — The image to convert to the PIL Image format.
rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.