VitEncoder Objects

class VitEncoder(DenseEncoder)

Encoder for Vision Transformer models.

This class encodes images into dense embeddings using a Vision Transformer (ViT) model loaded through Hugging Face. Image processing and model initialization options can be configured.

__init__

def __init__(**data)

Initialize the VitEncoder.

Arguments:

  • **data (dict): Additional keyword arguments for the encoder.

__call__

def __call__(imgs: List[Any], batch_size: int = 32) -> List[List[float]]

Encode a list of images into embeddings using the Vision Transformer model.

Arguments:

  • imgs (List[Any]): The images to encode.
  • batch_size (int): The number of images to encode per batch. Defaults to 32.

Returns:

List[List[float]]: The embeddings for the images, one list of floats per image.
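
The batching behaviour implied by batch_size can be illustrated with a minimal, library-independent sketch. It assumes the encoder splits imgs into consecutive chunks of at most batch_size items and concatenates the per-chunk embeddings; the actual internals of VitEncoder may differ. The names encode_in_batches, fake_embed, and recording_embed are hypothetical and are not part of the library.

```python
from typing import Any, Callable, List, Sequence


def encode_in_batches(
    imgs: Sequence[Any],
    embed_batch: Callable[[Sequence[Any]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Hypothetical helper: embed items in chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    embeddings: List[List[float]] = []
    for start in range(0, len(imgs), batch_size):
        batch = imgs[start:start + batch_size]
        # Each chunk is embedded separately; results are appended in order.
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed(batch: Sequence[Any]) -> List[List[float]]:
    # Stand-in for a model forward pass (assumption): each "image" is an
    # integer here and is mapped to a 2-dimensional vector.
    return [[float(x), float(x) * 2.0] for x in batch]


calls: List[int] = []


def recording_embed(batch: Sequence[Any]) -> List[List[float]]:
    # Records the size of every chunk so the batching can be inspected.
    calls.append(len(batch))
    return fake_embed(batch)


result = encode_in_batches(list(range(5)), recording_embed, batch_size=2)
print(calls)        # chunk sizes seen by the embedding function
print(len(result))  # number of embeddings returned
```

In this sketch, a smaller batch_size means more calls to the embedding function with less memory needed per call; the output has the same length and order as the input regardless of the batch size.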