Processors

processors

Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.

Example: Using a WhisperProcessor to prepare an audio input for a model.

import { AutoProcessor, read_audio } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
let audio = await read_audio('https://boincai.com/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
let { input_features } = await processor(audio);
// Tensor {
//   data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
//   dims: [1, 80, 3000],
//   type: 'float32',
//   size: 240000,
// }

processors
- static
  - .FeatureExtractor ⇐ Callable
    new FeatureExtractor(config)
  - .ImageFeatureExtractor ⇐ FeatureExtractor
    new ImageFeatureExtractor(config)
    .thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>
    .preprocess(image) ⇒ Promise.<PreprocessedImage>
    ._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>
  - .DetrFeatureExtractor ⇐ ImageFeatureExtractor
    ._call(urls) ⇒ Promise.<DetrFeatureExtractorResult>
    .post_process_object_detection() : post_process_object_detection
    .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
    .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
    .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
    .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>
  - .Processor ⇐ Callable
    new Processor(feature_extractor)
    ._call(input, ...args) ⇒ Promise.<any>
  - .WhisperProcessor ⇐ Processor
    ._call(audio) ⇒ Promise.<any>
  - .AutoProcessor
    .from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>
- inner
  - ~center_to_corners_format(arr) ⇒ Array.<number>
  - ~post_process_object_detection(outputs) ⇒ Array.<Object>
    ~box : Array.<number>
  - ~HeightWidth : *
  - ~ImageFeatureExtractorResult : object
  - ~PreprocessedImage : object
  - ~DetrFeatureExtractorResult : object
  - ~SamImageProcessorResult : object

processors.FeatureExtractor ⇐ Callable

Base class for feature extractors.

Kind: static class of processors

Extends: Callable

new FeatureExtractor(config)

Constructs a new FeatureExtractor instance.

| Param | Type | Description |
| --- | --- | --- |
| config | Object | The configuration for the feature extractor. |

processors.ImageFeatureExtractor ⇐ FeatureExtractor

Feature extractor for image models.

Kind: static class of processors

Extends: FeatureExtractor

.ImageFeatureExtractor ⇐ FeatureExtractor
- new ImageFeatureExtractor(config)
- .thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>
- .preprocess(image) ⇒ Promise.<PreprocessedImage>
- ._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>

new ImageFeatureExtractor(config)

Constructs a new ImageFeatureExtractor instance.

| Param | Type | Description |
| --- | --- | --- |
| config | Object | The configuration for the feature extractor. |
| config.image_mean | Array.<number> | The mean values for image normalization. |
| config.image_std | Array.<number> | The standard deviation values for image normalization. |
| config.do_rescale | boolean | Whether to rescale the image pixel values to the [0,1] range. |
| config.rescale_factor | number | The factor to use for rescaling the image pixel values. |
| config.do_normalize | boolean | Whether to normalize the image pixel values. |
| config.do_resize | boolean | Whether to resize the image. |
| config.resample | number | What method to use for resampling. |
| config.size | number | The size to resize the image to. |
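To make the rescaling and normalization options more concrete, here is a minimal sketch of what do_rescale, rescale_factor, do_normalize, image_mean, and image_std describe. It works on a plain array of interleaved channel values. The helper name rescaleAndNormalize, the interleaved layout, and the default values are assumptions made for illustration; this is not the library's internal implementation.

```javascript
// Hypothetical helper (not part of the library): applies the rescale and
// normalize steps described by the config to interleaved channel values.
// The default values below are assumptions chosen for the example.
function rescaleAndNormalize(pixels, config) {
  const {
    do_rescale = true,
    rescale_factor = 1 / 255,
    do_normalize = true,
    image_mean = [0.5, 0.5, 0.5],
    image_std = [0.5, 0.5, 0.5],
  } = config;

  const channels = image_mean.length;
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; ++i) {
    let value = pixels[i];
    // Rescale, e.g. from the [0, 255] range to the [0, 1] range.
    if (do_rescale) value *= rescale_factor;
    // Normalize each channel with its own mean and standard deviation.
    if (do_normalize) {
      const c = i % channels;
      value = (value - image_mean[c]) / image_std[c];
    }
    out[i] = value;
  }
  return out;
}

// One white RGB pixel followed by one black RGB pixel.
const normalized = rescaleAndNormalize([255, 255, 255, 0, 0, 0], {
  image_mean: [0.5, 0.5, 0.5],
  image_std: [0.5, 0.5, 0.5],
});
console.log(Array.from(normalized)); // values close to [1, 1, 1, -1, -1, -1]
```

With a mean and standard deviation of 0.5, values in the [0,1] range end up in roughly the [-1,1] range, which is a common input range for vision models.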

imageFeatureExtractor.thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>

Resize the image to make a thumbnail. The image is resized so that no dimension is larger than any corresponding dimension of the specified size.

Kind: instance method of ImageFeatureExtractor

Returns: Promise.<RawImage> - The resized image.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| image | RawImage | | The image to be resized. |
| size | Object | | The size {"height": h, "width": w} to resize the image to. |
| [resample] | string \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 | 2 | The resampling filter to use. |
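The sizing rule above can be sketched as a small function that only computes the output dimensions. This is a hedged illustration: the helper name thumbnailSize is hypothetical, and the sketch assumes that the aspect ratio is preserved and that images are never upscaled.

```javascript
// Hypothetical helper: computes the output size for a thumbnail, assuming
// the aspect ratio of the input image is preserved.
function thumbnailSize(input, size) {
  // Never upscale: start from the smaller of the input and requested size.
  let height = Math.min(input.height, size.height);
  let width = Math.min(input.width, size.width);

  // The image already fits, so its size is left unchanged.
  if (height === input.height && width === input.width) {
    return { height, width };
  }

  // Recompute the shorter side from the longer one to keep the aspect ratio.
  if (input.height > input.width) {
    width = Math.floor((input.width * height) / input.height);
  } else if (input.width > input.height) {
    height = Math.floor((input.height * width) / input.width);
  }
  return { height, width };
}

console.log(thumbnailSize({ height: 800, width: 400 }, { height: 200, width: 200 }));
// { height: 200, width: 100 }
console.log(thumbnailSize({ height: 100, width: 50 }, { height: 200, width: 200 }));
// { height: 100, width: 50 }
```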

imageFeatureExtractor.preprocess(image) ⇒ Promise.<PreprocessedImage>

Preprocesses the given image.

Kind: instance method of ImageFeatureExtractor

Returns: Promise.<PreprocessedImage> - The preprocessed image.

| Param | Type | Description |
| --- | --- | --- |
| image | RawImage | The image to preprocess. |

imageFeatureExtractor._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>

Calls the feature extraction process on an array of image URLs, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of ImageFeatureExtractor

Returns: Promise.<ImageFeatureExtractorResult> - An object containing the concatenated pixel values (and other metadata) of the preprocessed images.

| Param | Type | Description |
| --- | --- | --- |
| images | Array.<any> | The URL(s) of the image(s) to extract features from. |
| ...args | any | Additional arguments. |
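The concatenation step can be pictured as stacking equally shaped per-image buffers behind a new leading batch dimension. The sketch below uses plain objects in place of Tensor instances; the helper name batchPixelValues and the [channels, height, width] layout are assumptions for illustration only.

```javascript
// Hypothetical sketch: stack per-image pixel data into one batched buffer.
// Each item mimics a preprocessed image with a flat Float32Array of pixel
// values and its [channels, height, width] dims.
function batchPixelValues(items) {
  if (items.length === 0) throw new Error('Expected at least one image.');
  const [channels, height, width] = items[0].dims;
  const perImage = channels * height * width;

  const data = new Float32Array(items.length * perImage);
  items.forEach((item, index) => {
    // All images must share the same shape to be stacked together.
    if (item.data.length !== perImage) {
      throw new Error('All images must have the same dimensions.');
    }
    data.set(item.data, index * perImage);
  });

  // A leading batch dimension is added: [batch, channels, height, width].
  return { data, dims: [items.length, channels, height, width] };
}

const batch = batchPixelValues([
  { data: new Float32Array([0, 1, 2, 3]), dims: [1, 2, 2] },
  { data: new Float32Array([4, 5, 6, 7]), dims: [1, 2, 2] },
]);
console.log(batch.dims); // [ 2, 1, 2, 2 ]
```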

processors.DetrFeatureExtractor ⇐ ImageFeatureExtractor

Feature extractor for DETR models.

Kind: static class of processors

Extends: ImageFeatureExtractor

.DetrFeatureExtractor ⇐ ImageFeatureExtractor
- ._call(urls) ⇒ Promise.<DetrFeatureExtractorResult>
- .post_process_object_detection() : post_process_object_detection
- .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
- .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
- .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
- .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

detrFeatureExtractor._call(urls) ⇒ Promise.<DetrFeatureExtractorResult>

Calls the feature extraction process on an array of image URLs, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of DetrFeatureExtractor

Returns: Promise.<DetrFeatureExtractorResult> - An object containing the concatenated pixel values of the preprocessed images.

| Param | Type | Description |
| --- | --- | --- |
| urls | Array.<any> | The URL(s) of the image(s) to extract features from. |

detrFeatureExtractor.post_process_object_detection() : post_process_object_detection

Kind: instance method of DetrFeatureExtractor

detrFeatureExtractor.remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *

Binarizes the given masks using object_mask_threshold and returns the associated masks, scores, and labels.

Kind: instance method of DetrFeatureExtractor

Returns: * - The binarized masks, the scores, and the labels.

| Param | Type | Description |
| --- | --- | --- |
| class_logits | Tensor | The class logits. |
| mask_logits | Tensor | The mask logits. |
| object_mask_threshold | number | A number between 0 and 1 used to binarize the masks. |
| num_labels | number | The number of labels. |
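One plausible reading of this step, sketched below with plain arrays instead of Tensor objects, is a filter over the model's queries: a query is kept only if its most likely class is a real label and its score exceeds object_mask_threshold. The helper names, and the assumption that class index num_labels stands for "no object", are illustrative and not confirmed by this page.

```javascript
// Numerically stable softmax over a plain array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((x) => x / sum);
}

// Hypothetical sketch: classLogits holds one array of logits per query and
// masks holds one mask per query. Assumes index numLabels means "no object".
function removeLowAndNoObjects(classLogits, masks, objectMaskThreshold, numLabels) {
  const keptMasks = [];
  const scores = [];
  const labels = [];
  classLogits.forEach((logits, j) => {
    const probs = softmax(logits);
    const label = probs.indexOf(Math.max(...probs));
    if (label === numLabels) return; // background, so the query is ignored
    if (probs[label] > objectMaskThreshold) {
      keptMasks.push(masks[j]);
      scores.push(probs[label]);
      labels.push(label);
    }
  });
  return [keptMasks, scores, labels];
}

// Two labels (0 and 1) plus the assumed "no object" class at index 2.
const [keptMasks, keptScores, keptLabels] = removeLowAndNoObjects(
  [
    [5, 0, 0], // confidently label 0: kept
    [0, 0, 5], // "no object": dropped
    [1, 1.1, 1], // label 1, but with low confidence: dropped
  ],
  ['mask-a', 'mask-b', 'mask-c'],
  0.5,
  2,
);
console.log(keptLabels); // [ 0 ]
console.log(keptMasks); // [ 'mask-a' ]
```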

detrFeatureExtractor.check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *

Checks whether the segment is valid or not.

Kind: instance method of DetrFeatureExtractor

Returns: * - Whether the segment is valid or not, and the indices of the valid labels.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| mask_labels | Int32Array | | Labels for each pixel in the mask. |
| mask_probs | Array.<Tensor> | | Probabilities for each pixel in the masks. |
| k | number | | The class id of the segment. |
| mask_threshold | number | 0.5 | The mask threshold. |
| overlap_mask_area_threshold | number | 0.8 | The overlap mask area threshold. |
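As a rough illustration of how the two thresholds could interact, the sketch below treats a segment as valid when the pixels finally assigned to k cover a large enough share of the pixels where k's own probability passes mask_threshold. It uses plain arrays rather than Tensor objects, and the helper name and exact comparison rules are assumptions.

```javascript
// Hypothetical sketch using plain arrays instead of Tensors.
// maskLabels[i] is the label assigned to pixel i; maskProbs[k][i] is the
// probability that pixel i belongs to segment k.
function checkSegmentValidity(maskLabels, maskProbs, k, maskThreshold = 0.5, overlapMaskAreaThreshold = 0.8) {
  const maskK = []; // indices of the pixels assigned to segment k
  let originalArea = 0; // pixels where segment k's own probability passes the threshold
  for (let i = 0; i < maskLabels.length; ++i) {
    if (maskLabels[i] === k) maskK.push(i);
    if (maskProbs[k][i] >= maskThreshold) ++originalArea;
  }

  let maskExists = maskK.length > 0 && originalArea > 0;
  if (maskExists) {
    // Reject segments that lost too much of their area to other segments.
    maskExists = maskK.length / originalArea > overlapMaskAreaThreshold;
  }
  return [maskExists, maskK];
}

const exampleProbs = [
  [0.9, 0.9, 0.9, 0.9, 0.1],
  [0.1, 0.1, 0.1, 0.95, 0.9],
];
const exampleLabels = new Int32Array([0, 0, 0, 1, 1]);
console.log(checkSegmentValidity(exampleLabels, exampleProbs, 0)); // [ false, [ 0, 1, 2 ] ]
console.log(checkSegmentValidity(exampleLabels, exampleProbs, 1)); // [ true, [ 3, 4 ] ]
```

Segment 0 is rejected here because only 3 of its 4 above-threshold pixels were assigned to it (a ratio of 0.75, below the 0.8 default), while segment 1 keeps all of its pixels.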

detrFeatureExtractor.compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *

Computes the segments.

Kind: instance method of DetrFeatureExtractor

Returns: * - The computed segments.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| mask_probs | Array.<Tensor> | | The mask probabilities. |
| pred_scores | Array.<number> | | The predicted scores. |
| pred_labels | Array.<number> | | The predicted labels. |
| mask_threshold | number | | The mask threshold. |
| overlap_mask_area_threshold | number | | The overlap mask area threshold. |
| label_ids_to_fuse | Set.<number> | | The label ids to fuse. |
| target_size | Array.<number> | | The target size of the image. |

detrFeatureExtractor.post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

Post-process the model output to generate the final panoptic segmentation.

Kind: instance method of DetrFeatureExtractor

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| outputs | * | | The model output to post-process. |
| [threshold] | number | 0.5 | The probability score threshold to keep predicted instance masks. |
| [mask_threshold] | number | 0.5 | Threshold to use when turning the predicted masks into binary values. |
| [overlap_mask_area_threshold] | number | 0.8 | The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask. |
| [label_ids_to_fuse] | Set.<number> | | The labels in this set will have all their instances fused together. |
| [target_sizes] | Array.<Array<number>> | | The target sizes to resize the masks to. |

processors.Processor ⇐ Callable

Represents a Processor that extracts features from an input.

Kind: static class of processors

Extends: Callable

.Processor ⇐ Callable
- new Processor(feature_extractor)
- ._call(input, ...args) ⇒ Promise.<any>

new Processor(feature_extractor)

Creates a new Processor with the given feature extractor.

| Param | Type | Description |
| --- | --- | --- |
| feature_extractor | FeatureExtractor | The function used to extract features from the input. |

processor._call(input, ...args) ⇒ Promise.<any>

Calls the feature_extractor function with the given input.

Kind: instance method of Processor

Returns: Promise.<any> - A Promise that resolves with the extracted features.

| Param | Type | Description |
| --- | --- | --- |
| input | any | The input to extract features from. |
| ...args | any | Additional arguments. |

processors.WhisperProcessor ⇐ Processor

Represents a WhisperProcessor that extracts features from an audio input.

Kind: static class of processors

Extends: Processor

whisperProcessor._call(audio) ⇒ Promise.<any>

Calls the feature_extractor function with the given audio input.

Kind: instance method of WhisperProcessor

Returns: Promise.<any> - A Promise that resolves with the extracted features.

| Param | Type | Description |
| --- | --- | --- |
| audio | any | The audio input to extract features from. |

processors.AutoProcessor

Helper class which is used to instantiate pretrained processors with the from_pretrained function. The chosen processor class is determined by the type specified in the processor config.

Example: Load a processor using from_pretrained.

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');

Example: Run an image through a processor.

import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://boincai.com/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
//   "pixel_values": {
//     "dims": [ 1, 3, 224, 224 ],
//     "type": "float32",
//     "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
//     "size": 150528
//   },
//   "original_sizes": [
//     [ 533, 800 ]
//   ],
//   "reshaped_input_sizes": [
//     [ 224, 224 ]
//   ]
// }

Kind: static class of processors

AutoProcessor.from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>

Instantiate one of the processor classes of the library from a pretrained model.

The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible).

Kind: static method of AutoProcessor

Returns: Promise.<Processor> - A new instance of the Processor class.

| Param | Type | Description |
| --- | --- | --- |
| pretrained_model_name_or_path | string | The name or path of the pretrained model. Can be either a string, the model id of a pretrained processor hosted inside a model repo on boincai.com (valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased), or a path to a directory containing processor files, e.g., ./my_model_directory/. |
| options | * | Additional options for loading the processor. |
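Selecting a class from a type name stored in a config is a common registry pattern, and a minimal sketch of it is shown below. The classes, the registry contents, the createProcessor helper, and the fallback to a base class are all hypothetical; the sketch does not load any files and is not the library's actual lookup logic.

```javascript
// Hypothetical stand-ins for processor classes; not the library's own.
class BaseProcessor {
  constructor(config) {
    this.config = config;
  }
}
class WhisperLikeProcessor extends BaseProcessor {}

// Maps an example feature_extractor_type value to the class that handles it.
const PROCESSOR_REGISTRY = new Map([
  ['WhisperFeatureExtractor', WhisperLikeProcessor],
]);

// Picks a class from the registry, falling back to the base class when the
// type is missing or unknown (the fallback behavior is an assumption).
function createProcessor(config) {
  const ProcessorClass = PROCESSOR_REGISTRY.get(config.feature_extractor_type) ?? BaseProcessor;
  return new ProcessorClass(config);
}

const whisperLike = createProcessor({ feature_extractor_type: 'WhisperFeatureExtractor' });
console.log(whisperLike.constructor.name); // WhisperLikeProcessor
const fallback = createProcessor({ feature_extractor_type: 'SomethingElse' });
console.log(fallback.constructor.name); // BaseProcessor
```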

processors~center_to_corners_format(arr) ⇒ Array.<number>

Converts bounding boxes from center format to corners format.

Kind: inner method of processors

Returns: Array.<number> - The coordinates for the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).

| Param | Type | Description |
| --- | --- | --- |
| arr | Array.<number> | The coordinates for the center of the box and its width and height dimensions (center_x, center_y, width, height). |
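The conversion itself is simple arithmetic: each corner lies half the width and half the height away from the center. The standalone function below is an illustrative sketch with a hypothetical name, written from the description above rather than copied from the library.

```javascript
// Illustrative re-implementation of the center-to-corners conversion.
function centerToCornersFormat([centerX, centerY, width, height]) {
  return [
    centerX - width / 2, // top_left_x
    centerY - height / 2, // top_left_y
    centerX + width / 2, // bottom_right_x
    centerY + height / 2, // bottom_right_y
  ];
}

console.log(centerToCornersFormat([50, 40, 20, 10])); // [ 40, 35, 60, 45 ]
```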

processors~post_process_object_detection(outputs) ⇒ Array.<Object>

Post-processes the outputs of the model (for object detection).

Kind: inner method of processors

Returns: Array.<Object> - An array of objects containing the post-processed outputs.

| Param | Type | Description |
| --- | --- | --- |
| outputs | Object | The outputs of the model that must be post-processed. |
| outputs.logits | Tensor | The logits. |
| outputs.pred_boxes | Tensor | The predicted boxes. |

post_process_object_detection~box : Array.<number>

Kind: inner property of post_process_object_detection

processors~HeightWidth : *

Named tuple to indicate that the order we are using is (height x width), even though the graphics industry standard is (width x height).

Kind: inner typedef of processors

processors~ImageFeatureExtractorResult : object

Kind: inner typedef of processors

Properties:

| Name | Type | Description |
| --- | --- | --- |
| pixel_values | Tensor | The pixel values of the batched preprocessed images. |
| original_sizes | Array.<HeightWidth> | Array of two-dimensional tuples like [[480, 640]]. |
| reshaped_input_sizes | Array.<HeightWidth> | Array of two-dimensional tuples like [[1000, 1330]]. |

processors~PreprocessedImage : object

Kind: inner typedef of processors

Properties:

| Name | Type | Description |
| --- | --- | --- |
| original_size | HeightWidth | The original size of the image. |
| reshaped_input_size | HeightWidth | The reshaped input size of the image. |
| pixel_values | Tensor | The pixel values of the preprocessed image. |

processors~DetrFeatureExtractorResult : object

Kind: inner typedef of processors

Properties:

| Name | Type |
| --- | --- |
| pixel_mask | Tensor |

processors~SamImageProcessorResult : object

Kind: inner typedef of processors

Properties:

| Name | Type |
| --- | --- |
| pixel_values | Tensor |
| original_sizes | Array.<HeightWidth> |
| reshaped_input_sizes | Array.<HeightWidth> |
| input_points | Tensor |
