
Utilities for Image Processors


Last updated 1 year ago


This page lists all the utility functions used by the image processors, mainly the functional transformations used to process the images.

Most of these are only useful if you are studying the code of the image processors in the library.

Image Transformations

transformers.image_transforms.center_crop

( image: ndarray, size: typing.Tuple[int, int], data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None, return_numpy: typing.Optional[bool] = None ) → np.ndarray

Parameters

  • image (np.ndarray): The image to crop.

  • size (Tuple[int, int]): The target size for the cropped image.

  • data_format (str or ChannelDimension, optional): The channel dimension format for the output image. Can be one of:

    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.

    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, will use the inferred format of the input image.

  • input_data_format (str or ChannelDimension, optional): The channel dimension format for the input image. Can be one of:

    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.

    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, will use the inferred format of the input image.

  • return_numpy (bool, optional): Whether or not to return the cropped image as a numpy array. Used for backwards compatibility with the previous ImageFeatureExtractionMixin method.

    • Unset: will return the same type as the input image.

    • True: will return a numpy array.

    • False: will return a PIL.Image.Image object.

Returns

np.ndarray

The cropped image.

Crops the image to the specified size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result will always be of size size).
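The crop-or-pad behaviour described above can be sketched in plain NumPy. This is an illustration, not the library's implementation: the helper name center_crop_sketch is hypothetical, the input is assumed to be channels-last, and zero padding and the rounding of the offsets are assumptions.

```python
import numpy as np


def center_crop_sketch(image: np.ndarray, size: tuple) -> np.ndarray:
    """Center-crop a channels-last (H, W, C) array, zero-padding if it is too small."""
    crop_h, crop_w = size
    h, w = image.shape[:2]

    # Assumption: pad symmetrically with zeros when the image is smaller than the target.
    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
    padded = np.pad(
        image,
        ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2), (0, 0)),
        mode="constant",
    )

    # Take the central window of the (possibly padded) image.
    top = (padded.shape[0] - crop_h) // 2
    left = (padded.shape[1] - crop_w) // 2
    return padded[top : top + crop_h, left : left + crop_w]


big = np.arange(6 * 8 * 3).reshape(6, 8, 3)
small = np.ones((2, 2, 3))
cropped_big = center_crop_sketch(big, (4, 4))      # cropped down to (4, 4, 3)
cropped_small = center_crop_sketch(small, (4, 4))  # padded up to (4, 4, 3)
```

In both cases the result has the requested height and width, which matches the guarantee that the returned image is always of size size.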

transformers.image_transforms.center_to_corners_format

( bboxes_center: TensorType )

Converts bounding boxes from center format to corners format.

center format: contains the coordinates for the center of the box and its width and height dimensions (center_x, center_y, width, height)

corners format: contains the coordinates for the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
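As a worked example of the arithmetic, here is a minimal sketch for a single box held in a plain tuple. The helper name center_to_corners is hypothetical; the library function operates on tensors or arrays of boxes rather than on one tuple.

```python
def center_to_corners(box):
    """Convert one (center_x, center_y, width, height) box to corner coordinates."""
    center_x, center_y, width, height = box
    return (
        center_x - width / 2,   # top_left_x
        center_y - height / 2,  # top_left_y
        center_x + width / 2,   # bottom_right_x
        center_y + height / 2,  # bottom_right_y
    )


corners = center_to_corners((50.0, 40.0, 20.0, 10.0))
print(corners)  # (40.0, 35.0, 60.0, 45.0)
```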

transformers.image_transforms.corners_to_center_format

( bboxes_corners: TensorType )

Converts bounding boxes from corners format to center format.

corners format: contains the coordinates for the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y)

center format: contains the coordinates for the center of the box and its width and height dimensions (center_x, center_y, width, height)
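The conversion amounts to averaging the corners and taking their differences. A minimal sketch for a single box in a plain tuple follows; the helper name corners_to_center is hypothetical, and the library function works on tensors or arrays of boxes.

```python
def corners_to_center(box):
    """Convert one (top_left_x, top_left_y, bottom_right_x, bottom_right_y) box."""
    top_left_x, top_left_y, bottom_right_x, bottom_right_y = box
    return (
        (top_left_x + bottom_right_x) / 2,  # center_x
        (top_left_y + bottom_right_y) / 2,  # center_y
        bottom_right_x - top_left_x,        # width
        bottom_right_y - top_left_y,        # height
    )


center = corners_to_center((40.0, 35.0, 60.0, 45.0))
print(center)  # (50.0, 40.0, 20.0, 10.0)
```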

transformers.image_transforms.id_to_rgb

( id_map )

Converts unique ID to RGB color.

transformers.image_transforms.normalize

( image: ndarray, mean: typing.Union[float, typing.Iterable[float]], std: typing.Union[float, typing.Iterable[float]], data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None )

Parameters

  • image (np.ndarray): The image to normalize.

  • mean (float or Iterable[float]): The mean to use for normalization.

  • std (float or Iterable[float]): The standard deviation to use for normalization.

  • data_format (ChannelDimension, optional): The channel dimension format of the output image. If unset, will use the inferred format from the input.

  • input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If unset, will use the inferred format from the input.

Normalizes image using the mean and standard deviation specified by mean and std.

image = (image - mean) / std
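To make the formula concrete, the sketch below applies it per channel with NumPy broadcasting. The image and the mean and std values are made up for illustration, and a channels-last layout is assumed.

```python
import numpy as np

# Hypothetical 2x2 RGB image in channels-last layout, values already in [0, 1].
image = np.full((2, 2, 3), 0.5, dtype=np.float32)
mean = np.array([0.5, 0.4, 0.3], dtype=np.float32)  # illustrative per-channel mean
std = np.array([0.5, 0.2, 0.1], dtype=np.float32)   # illustrative per-channel std

# Broadcasting applies (image - mean) / std to each channel independently,
# because the channel axis is last and has the same length as mean and std.
normalized = (image - mean) / std
```

Every pixel here becomes approximately (0.0, 0.5, 2.0). For a channels-first array, mean and std would need to be reshaped to (num_channels, 1, 1) before the same expression broadcasts correctly.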

transformers.image_transforms.pad

( image: ndarray, padding: typing.Union[int, typing.Tuple[int, int], typing.Iterable[typing.Tuple[int, int]]], mode: PaddingMode = <PaddingMode.CONSTANT: 'constant'>, constant_values: typing.Union[float, typing.Iterable[float]] = 0.0, data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

Parameters

  • image (np.ndarray): The image to pad.

  • padding (int or Tuple[int, int] or Iterable[Tuple[int, int]]): Padding to apply to the edges of the height, width axes. Can be one of three formats:

    • ((before_height, after_height), (before_width, after_width)) unique pad widths for each axis.

    • ((before, after),) yields same before and after pad for height and width.

    • (pad,) or int is a shortcut for before = after = pad width for all axes.

  • mode (PaddingMode): The padding mode to use. Can be one of:

    • "constant": pads with a constant value.

    • "reflect": pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.

    • "replicate": pads with the replication of the last value on the edge of the array along each axis.

    • "symmetric": pads with the reflection of the vector mirrored along the edge of the array.

  • constant_values (float or Iterable[float], optional): The value to use for the padding if mode is "constant".

  • data_format (str or ChannelDimension, optional): The channel dimension format for the output image. Can be one of:

    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.

    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, will use same as the input image.

  • input_data_format (str or ChannelDimension, optional): The channel dimension format for the input image. Can be one of:

    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.

    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, will use the inferred format of the input image.

Returns

np.ndarray

The padded image.

Pads the image with the specified (height, width) padding and mode.
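The padding formats and modes follow the conventions of numpy.pad, so they can be explored with NumPy directly. The sketch below uses a single-channel array and a 1-D row; treating "replicate" as NumPy's "edge" mode is an assumption based on the description above.

```python
import numpy as np

image = np.array([[1, 2], [3, 4]])  # one channel, (height, width)

# ((before_height, after_height), (before_width, after_width))
per_axis = np.pad(image, ((1, 0), (0, 2)), mode="constant", constant_values=0)
# [[0 0 0 0]
#  [1 2 0 0]
#  [3 4 0 0]]

row = np.array([1, 2, 3])
reflect = np.pad(row, 2, mode="reflect")      # [3 2 1 2 3 2 1], edge value not repeated
replicate = np.pad(row, 2, mode="edge")       # [1 1 1 2 3 3 3], edge value repeated
symmetric = np.pad(row, 2, mode="symmetric")  # [2 1 1 2 3 3 2], mirrored including the edge
```

The difference between "reflect" and "symmetric" only shows up once the pad width exceeds one element, which is why the 1-D row is padded by 2.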

transformers.image_transforms.rgb_to_id

( color )

Converts RGB color to unique ID.
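This page does not state how IDs and colors are mapped to each other. The sketch below assumes the base-256 packing commonly used for panoptic segmentation maps, with red as the least significant byte. The function names are hypothetical and work on a single color or ID rather than a whole map.

```python
def rgb_to_id_sketch(color):
    """Pack one (r, g, b) triple into an integer, red as the lowest byte (assumed)."""
    r, g, b = color
    return r + 256 * g + 256 * 256 * b


def id_to_rgb_sketch(segment_id):
    """Unpack an integer ID into an (r, g, b) triple by repeated division by 256."""
    color = []
    for _ in range(3):
        color.append(segment_id % 256)
        segment_id //= 256
    return tuple(color)


segment_id = rgb_to_id_sketch((10, 2, 1))
print(segment_id)                    # 66058
print(id_to_rgb_sketch(segment_id))  # (10, 2, 1)
```

Under this assumption the two functions are inverses of each other for IDs below 256 ** 3.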

transformers.image_transforms.rescale

( image: ndarray, scale: float, data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None, dtype: dtype = <class 'numpy.float32'>, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

Parameters

  • image (np.ndarray): The image to rescale.

  • scale (float): The scale to use for rescaling the image.

  • data_format (ChannelDimension, optional): The channel dimension format of the image. If not provided, it will be the same as the input image.

  • dtype (np.dtype, optional, defaults to np.float32): The dtype of the output image. Used for backwards compatibility with feature extractors.

  • input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If not provided, it will be inferred from the input image.

Returns

np.ndarray

The rescaled image.

Rescales image by scale.
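A common use is mapping 8-bit pixel values into the [0, 1] range with scale = 1 / 255. The NumPy sketch below shows only the arithmetic (multiply, then cast to the output dtype). It is an illustration with made-up pixel values, not the library's code.

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # hypothetical 8-bit values

# Multiply by the scale factor, then cast to the requested dtype (float32 by default).
rescaled = (pixels * (1 / 255)).astype(np.float32)
```

The result is a float32 array whose values run from 0.0 to 1.0, with 128 mapping to roughly 0.502.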

transformers.image_transforms.resize

( image, size: typing.Tuple[int, int], resample: PILImageResampling = None, reducing_gap: typing.Optional[int] = None, data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None, return_numpy: bool = True, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

Parameters

  • image (PIL.Image.Image or np.ndarray or torch.Tensor): The image to resize.

  • size (Tuple[int, int]): The size to use for resizing the image.

  • resample (int, optional, defaults to PILImageResampling.BILINEAR): The filter to use for resampling.

  • reducing_gap (int, optional): Apply optimization by resizing the image in two steps. The bigger reducing_gap, the closer the result to the fair resampling. See corresponding Pillow documentation for more details.

  • data_format (ChannelDimension, optional): The channel dimension format of the output image. If unset, will use the inferred format from the input.

  • return_numpy (bool, optional, defaults to True): Whether or not to return the resized image as a numpy array. If False, a PIL.Image.Image object is returned.

  • input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If unset, will use the inferred format from the input.

Returns

np.ndarray

The resized image.

Resizes image to (height, width) specified by size using the PIL library.

transformers.image_transforms.to_pil_image

( image: typing.Union[numpy.ndarray, ForwardRef('PIL.Image.Image'), ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor'), ForwardRef('jnp.ndarray')], do_rescale: typing.Optional[bool] = None, input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → PIL.Image.Image

Parameters

  • image (PIL.Image.Image or numpy.ndarray or torch.Tensor or tf.Tensor): The image to convert to the PIL.Image format.

  • do_rescale (bool, optional): Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type and casting to int would result in a loss of precision, and False otherwise.

  • input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If unset, will use the inferred format from the input.

Returns

PIL.Image.Image

The converted image.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.

ImageProcessingMixin

class transformers.ImageProcessingMixin

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.

fetch_images

( image_url_or_urls: typing.Union[str, typing.List[str]] )

Convert a single URL or a list of URLs into the corresponding PIL.Image objects.

If a single URL is passed, the return value will be a single object. If a list is passed, a list of objects is returned.

from_dict

Parameters

  • kwargs (Dict[str, Any]): Additional parameters from which to initialize the image processor object.

Returns

The image processor object instantiated from those parameters.

from_json_file

Parameters

  • json_file (str or os.PathLike): Path to the JSON file containing the parameters.

Returns

The image_processor object instantiated from that JSON file.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike], cache_dir: typing.Union[str, os.PathLike, NoneType] = None, force_download: bool = False, local_files_only: bool = False, token: typing.Union[bool, str, NoneType] = None, revision: str = 'main', **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike): This can be either:

    • a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.

    • a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.

  • cache_dir (str or os.PathLike, optional): Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.

  • force_download (bool, optional, defaults to False): Whether or not to force to (re-)download the image processor files and override the cached versions if they exist.

  • resume_download (bool, optional, defaults to False): Whether or not to delete incompletely received file. Attempts to resume the download if such a file exists.

  • proxies (Dict[str, str], optional): A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.

  • token (str or bool, optional): The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).

  • revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

Examples:

from transformers import CLIPImageProcessor

# We can't instantiate directly the base class *ImageProcessingMixin* so let's show the examples on a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}

get_image_processor_dict

( pretrained_model_name_or_path: typing.Union[str, os.PathLike], **kwargs ) → Tuple[Dict, Dict]

Parameters

  • pretrained_model_name_or_path (str or os.PathLike): The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.

  • subfolder (str, optional, defaults to ""): In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.

Returns

Tuple[Dict, Dict]

The dictionary(ies) that will be used to instantiate the image processor object.

From a pretrained_model_name_or_path, resolve to a dictionary of parameters, to be used for instantiating an image processor of type ImageProcessingMixin using from_dict.

push_to_hub

( repo_id: str, use_temp_dir: typing.Optional[bool] = None, commit_message: typing.Optional[str] = None, private: typing.Optional[bool] = None, token: typing.Union[bool, str, NoneType] = None, max_shard_size: typing.Union[int, str, NoneType] = '10GB', create_pr: bool = False, safe_serialization: bool = False, revision: str = None, **deprecated_kwargs )

Parameters

  • repo_id (str): The name of the repository you want to push your image processor to. It should contain your organization name when pushing to a given organization.

  • use_temp_dir (bool, optional): Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.

  • commit_message (str, optional): Message to commit while pushing. Will default to "Upload image processor".

  • private (bool, optional): Whether or not the repository created should be private.

  • token (bool or str, optional): The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.

  • max_shard_size (int or str, optional, defaults to "10GB"): Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").

  • create_pr (bool, optional, defaults to False): Whether or not to create a PR with the uploaded files or directly commit.

  • safe_serialization (bool, optional, defaults to False): Whether or not to convert the model weights in safetensors format for safer serialization.

  • revision (str, optional): Branch to push the uploaded files to.

Upload the image processor file to the BOINC AI Model Hub.

Examples:

from transformers import AutoImageProcessor

# Load an image processor from a vision checkpoint (a text-only checkpoint has no image processor).
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Push the image processor to your namespace with the name "my-finetuned-vit".
image_processor.push_to_hub("my-finetuned-vit")

# Push the image processor to an organization with the name "my-finetuned-vit".
image_processor.push_to_hub("huggingface/my-finetuned-vit")

register_for_auto_class

( auto_class = 'AutoImageProcessor' )

Parameters

  • auto_class (str or type, optional, defaults to "AutoImageProcessor"): The auto class to register this new image processor with.

Register this class with a given auto class. This should only be used for custom image processors as the ones in the library are already mapped with AutoImageProcessor.

This API is experimental and may have some slight breaking changes in the next releases.

save_pretrained

( save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )

Parameters

  • save_directory (str or os.PathLike): Directory where the image processor JSON file will be saved (will be created if it does not exist).

  • push_to_hub (bool, optional, defaults to False): Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).

to_dict

( ) → Dict[str, Any]

Returns

Dict[str, Any]

Dictionary of all the attributes that make up this image processor instance.

Serializes this instance to a Python dictionary.

to_json_file

( json_file_path: typing.Union[str, os.PathLike] )

Parameters

  • json_file_path (str or os.PathLike): Path to the JSON file in which this image_processor instance's parameters will be saved.

Save this instance to a JSON file.

to_json_string

( ) → str

Returns

str

String containing all the attributes that make up this image processor instance in JSON format.

Serializes this instance to a JSON string.
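The serialization methods above form a round trip: attributes to dictionary, dictionary to JSON, and back. The toy class below mimics that pattern using only the standard library. ToyImageProcessor and its attributes are hypothetical stand-ins, not the real ImageProcessingMixin, and details such as key ordering or how extra keyword arguments are merged are assumptions.

```python
import json
import os
import tempfile


class ToyImageProcessor:
    """Hypothetical stand-in that mimics the serialization pattern, not the real mixin."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute, e.g. do_normalize=True.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        # Copy the attributes so callers cannot mutate the instance through the result.
        return dict(self.__dict__)

    def to_json_string(self):
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

    def to_json_file(self, json_file_path):
        with open(json_file_path, "w", encoding="utf-8") as writer:
            writer.write(self.to_json_string())

    @classmethod
    def from_dict(cls, image_processor_dict, **kwargs):
        # Assumption: extra keyword arguments override values from the dictionary.
        return cls(**{**image_processor_dict, **kwargs})

    @classmethod
    def from_json_file(cls, json_file):
        with open(json_file, encoding="utf-8") as reader:
            return cls.from_dict(json.load(reader))


processor = ToyImageProcessor(do_normalize=True, image_mean=[0.5, 0.5, 0.5])
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor_config.json")
    processor.to_json_file(path)
    restored = ToyImageProcessor.from_json_file(path)
```

After the round trip, restored.to_dict() equals processor.to_dict(), which is the property that lets a saved configuration file recreate an equivalent processor.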

from_dict

( image_processor_dict: typing.Dict[str, typing.Any], **kwargs ) → ImageProcessingMixin

Parameters

  • image_processor_dict (Dict[str, Any]): Dictionary that will be used to instantiate the image processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.

Instantiates a type of ImageProcessingMixin from a Python dictionary of parameters.

from_json_file

( json_file: typing.Union[str, os.PathLike] ) → An image processor of type ImageProcessingMixin

Instantiates an image processor of type ImageProcessingMixin from the path to a JSON file of parameters.

from_pretrained

  • pretrained_model_name_or_path can also be a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.

Instantiate a type of ImageProcessingMixin from an image processor.

save_pretrained

  • kwargs (Dict[str, Any], optional): Additional keyword arguments passed along to the push_to_hub() method.

Save an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.