Feature Extractor

A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences, e.g., pre-processing audio files into Log-Mel spectrogram features, and feature extraction from images, e.g., cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
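
For example, here is a minimal sketch of typical usage (the shapes and random values are illustrative), preparing two raw audio clips of different lengths for a Wav2Vec2 model:

import numpy as np
from transformers import AutoFeatureExtractor

# Load a pretrained audio feature extractor from the Hub.
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# Two raw audio clips of different lengths, both sampled at 16 kHz.
raw_audio = [np.random.randn(16000).astype(np.float32), np.random.randn(24000).astype(np.float32)]

# Pad to the longest clip and return PyTorch tensors ready for the model.
inputs = feature_extractor(raw_audio, sampling_rate=16000, padding=True, return_tensors="pt")
print(inputs["input_values"].shape)  # torch.Size([2, 24000])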

FeatureExtractionMixin

class transformers.FeatureExtractionMixin

( **kwargs )

This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike], cache_dir: typing.Union[str, os.PathLike, NoneType] = None, force_download: bool = False, local_files_only: bool = False, token: typing.Union[bool, str, NoneType] = None, revision: str = 'main', **kwargs )

Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — This can be either:

    • a string, the model id of a pretrained feature_extractor hosted inside a model repo on boincai.com. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.

    • a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.

    • a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.

  • force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the feature extractor files and override the cached versions if they exist.

  • resume_download (bool, optional, defaults to False) — Whether or not to delete an incompletely received file. Attempts to resume the download if such a file exists.

  • proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.

  • token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running boincai-cli login (stored in ~/.boincai).

  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on boincai.co, revision can be any identifier allowed by git.

Examples:


from transformers import Wav2Vec2FeatureExtractor

# We can't instantiate directly the base class *FeatureExtractionMixin* nor *SequenceFeatureExtractor* so let's show the examples on a
# derived class: *Wav2Vec2FeatureExtractor*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)  # Download feature_extraction_config from boincai.com and cache.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "./test/saved_model/"
)  # E.g. feature_extractor (or model) was saved using *save_pretrained('./test/saved_model/')*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}

save_pretrained

( save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )

Parameters

  • save_directory (str or os.PathLike) — Directory where the feature extractor JSON file will be saved (will be created if it does not exist).

  • push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the BOINC AI model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).

  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
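
A short sketch of the save/load round trip (the local directory path is just an example):

from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# Save the feature extractor configuration to a local directory...
feature_extractor.save_pretrained("./my_feature_extractor/")

# ...and reload it later with the from_pretrained() class method.
reloaded = Wav2Vec2FeatureExtractor.from_pretrained("./my_feature_extractor/")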

SequenceFeatureExtractor

class transformers.SequenceFeatureExtractor

( feature_size: int, sampling_rate: int, padding_value: float, **kwargs )

Parameters

  • feature_size (int) — The feature dimension of the extracted features.

  • sampling_rate (int) — The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).

  • padding_value (float) — The value that is used to fill the padding values / vectors.

This is a general feature extraction class for speech recognition.

pad

( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, typing.List[transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, transformers.feature_extraction_utils.BatchFeature], typing.Dict[str, typing.List[transformers.feature_extraction_utils.BatchFeature]], typing.List[typing.Dict[str, transformers.feature_extraction_utils.BatchFeature]]], padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True, max_length: typing.Optional[int] = None, truncation: bool = False, pad_to_multiple_of: typing.Optional[int] = None, return_attention_mask: typing.Optional[bool] = None, return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Parameters

  • processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) — Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]), so you can use this method during preprocessing as well as in a PyTorch Dataloader collate function.

    Instead of List[float] you can have tensors (numpy arrays, PyTorch tensors or TensorFlow tensors), see the note above for the return type.

  • padding (bool, str or PaddingStrategy, optional, defaults to True) — Select a strategy to pad the returned sequences (according to the model’s padding side and padding index) among:

    • True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).

    • 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.

    • False or 'do_not_pad' (default): No padding (i.e., can output a batch with sequences of different lengths).

  • max_length (int, optional) — Maximum length of the returned list and optionally padding length (see above).

  • truncation (bool) — Activates truncation to cut input sequences longer than max_length to max_length.

  • pad_to_multiple_of (int, optional) — If set, will pad the sequence to a multiple of the provided value.

    This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.

  • return_attention_mask (bool, optional) — Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor’s default.

  • return_tensors (str or TensorType, optional) — If set, will return tensors instead of list of python integers. Acceptable values are:

    • 'tf': Return TensorFlow tf.constant objects.

    • 'pt': Return PyTorch torch.Tensor objects.

    • 'np': Return NumPy np.ndarray objects.

Pad input values / input vectors or a batch of input values / input vectors up to predefined length or to the max sequence length in the batch.

Padding side (left/right) and padding values are defined at the feature extractor level (with self.padding_side, self.padding_value).

If the processed_features passed are a dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.
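
For instance, a minimal sketch of padding already-extracted features of different lengths (values and shapes are illustrative):

import numpy as np
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# Two sequences of raw input values with different lengths.
features = {"input_values": [np.random.randn(16000).astype(np.float32), np.random.randn(12000).astype(np.float32)]}

# Pad to the longest sequence in the batch and return PyTorch tensors.
batch = feature_extractor.pad(features, padding="longest", return_attention_mask=True, return_tensors="pt")
print(batch["input_values"].shape)    # torch.Size([2, 16000])
print(batch["attention_mask"].shape)  # torch.Size([2, 16000])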

BatchFeature

class transformers.BatchFeature

( data: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

Parameters

  • data (dict) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).

  • tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers to PyTorch/TensorFlow/NumPy tensors at initialization.

Holds the output of the pad() and feature extractor specific __call__ methods. This class is derived from a Python dictionary and can be used as a dictionary.

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Parameters

  • tensor_type (str or TensorType, optional) — The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.

Convert the inner content to tensors.

to

( *args, **kwargs ) → BatchFeature

Parameters

  • args (Tuple) — Will be passed to the to(...) function of the tensors.

  • kwargs (Dict, optional) — Will be passed to the to(...) function of the tensors.

Returns

BatchFeature

The same instance after modification.

Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.
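
A small sketch of converting a BatchFeature to framework tensors and moving it to a device (PyTorch shown; the array values are illustrative):

import numpy as np
import torch
from transformers import BatchFeature

# A BatchFeature behaves like a dictionary of model inputs.
batch = BatchFeature({"input_values": np.random.randn(2, 16000).astype(np.float32)})

# Convert the inner NumPy arrays to PyTorch tensors...
batch = batch.convert_to_tensors("pt")

# ...and send every tensor to the chosen device in one call (PyTorch only).
device = "cuda" if torch.cuda.is_available() else "cpu"
batch = batch.to(device)
print(type(batch["input_values"]), batch["input_values"].device)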

ImageFeatureExtractionMixin

class transformers.ImageFeatureExtractionMixin

( )

Mixin that contains utilities for preparing image features.

center_crop

( image, size ) → new_image

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width) or (height, width, n_channels)) — The image to crop.

  • size (int or Tuple[int, int]) — The size to which the image will be cropped.

Returns

new_image

A center cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape: (n_channels, height, width).

Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result has the size asked).
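
A minimal sketch (assuming ImageFeatureExtractionMixin is importable from transformers.image_utils; the input size is arbitrary, and a PIL input that is large enough stays a PIL image):

from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

utils = ImageFeatureExtractionMixin()

# A dummy 640x480 RGB image (PIL sizes are (width, height)).
image = Image.new("RGB", (640, 480), color="white")

# Crop the 224x224 region around the image center.
cropped = utils.center_crop(image, size=(224, 224))
print(cropped.size)  # (224, 224)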

convert_rgb

( image )

Parameters

  • image (PIL.Image.Image) — The image to convert.

Converts PIL.Image.Image to RGB format.

expand_dims

( image )

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to expand.

Expands 2-dimensional image to 3 dimensions.

flip_channel_order

( image )

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image whose color channels to flip. If np.ndarray or torch.Tensor, the channel dimension should be first.

Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.

normalize

( image, mean, std, rescale = False )

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to normalize.

  • mean (List[float] or np.ndarray or torch.Tensor) — The mean (per channel) to use for normalization.

  • std (List[float] or np.ndarray or torch.Tensor) — The standard deviation (per channel) to use for normalization.

  • rescale (bool, optional, defaults to False) — Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling will happen automatically.

Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.
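
For example, a small sketch normalizing a channel-first float image with the usual ImageNet statistics (the random values are illustrative, and the import path is assumed to be transformers.image_utils):

import numpy as np
from transformers.image_utils import ImageFeatureExtractionMixin

utils = ImageFeatureExtractionMixin()

# A dummy channel-first image with pixel values already scaled to [0, 1].
image = np.random.rand(3, 224, 224).astype(np.float32)

# Normalize each channel with its mean and standard deviation.
normalized = utils.normalize(image, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
print(normalized.shape)  # (3, 224, 224)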

rescale

( image: ndarray, scale: typing.Union[float, int] )

Rescales a NumPy image by the scale amount.

resize

( image, size, resample = None, default_to_square = True, max_size = None ) → image

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to resize.

  • size (int or Tuple[int, int]) — The size to use for resizing the image. If size is a sequence like (h, w), the output size will be matched to this.

    If size is an int and default_to_square is True, then the image will be resized to (size, size). If size is an int and default_to_square is False, then the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).

  • resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.

  • default_to_square (bool, optional, defaults to True) — How to convert size when it is a single int. If set to True, the size will be converted to a square (size, size). If set to False, will replicate torchvision.transforms.Resize with support for resizing only the smallest edge and providing an optional max_size.

  • max_size (int, optional, defaults to None) — The maximum allowed for the longer edge of the resized image: if the longer edge of the image is greater than max_size after being resized according to size, then the image is resized again so that the longer edge is equal to max_size. As a result, size might be overruled, i.e., the smaller edge may be shorter than size. Only used if default_to_square is False.

Returns

image

A resized PIL.Image.Image.

Resizes image. Enforces conversion of input to PIL.Image.
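
A short sketch of the int-size behaviour with default_to_square=False (the input size is arbitrary, and the import path is assumed to be transformers.image_utils):

from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

utils = ImageFeatureExtractionMixin()

# A dummy 640x480 RGB image; the shorter edge (height) is matched to 256.
image = Image.new("RGB", (640, 480), color="white")
resized = utils.resize(image, size=256, default_to_square=False)
print(resized.size)  # (341, 256) -- aspect ratio preserved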

rotate

( image, angle, resample = None, expand = 0, center = None, translate = None, fillcolor = None ) → image

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to rotate. If np.ndarray or torch.Tensor, will be converted to PIL.Image.Image before rotating.

Returns

image

A rotated PIL.Image.Image.

Returns a rotated copy of image. This method returns a copy of image, rotated the given number of degrees counter clockwise around its centre.

to_numpy_array

( image, rescale = None, channel_first = True )

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to convert to a NumPy array.

  • rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.

  • channel_first (bool, optional, defaults to True) — Whether or not to permute the dimensions of the image to put the channel dimension first.

Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first dimension.

to_pil_image

( image, rescale = None )

Parameters

  • image (PIL.Image.Image or numpy.ndarray or torch.Tensor) — The image to convert to the PIL Image format.

  • rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.

