VitEncoder Objects

class VitEncoder(DenseEncoder)
Encoder for Vision Transformer (ViT) models. This class encodes images into dense embeddings using a Vision Transformer model loaded through Hugging Face. It accepts options for image processing and model initialization.

__init__

def __init__(**data)
Initialize the VitEncoder.
Arguments:
  • **data (dict): Additional keyword arguments for the encoder.

__call__

def __call__(imgs: List[Any], batch_size: int = 32) -> List[List[float]]
Encode a list of images into embeddings using the Vision Transformer model.
Arguments:
  • imgs (List[Any]): The images to encode.
  • batch_size (int): The number of images to encode per batch. Defaults to 32.
Returns:
  • List[List[float]]: The embeddings for the images.
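
The `__call__` signature suggests that images are processed in chunks of `batch_size` and that the results are collected into a single list. The sketch below illustrates that batching pattern only; it is not the library's implementation. `ToyEncoder` and its `_embed_batch` method are hypothetical stand-ins, and the "embedding" is a fake two-number vector, so the example runs without downloading a model. That the output holds one embedding per input, in input order, is an assumption of the sketch.

```python
from typing import Any, List


class ToyEncoder:
    """Hypothetical stand-in that mimics the VitEncoder call signature."""

    def _embed_batch(self, batch: List[Any]) -> List[List[float]]:
        # Assumption: a real encoder would preprocess the images and run the
        # model here. We fake a 2-dimensional vector per item instead:
        # [length of the item's string form, size of the current batch].
        return [[float(len(str(img))), float(len(batch))] for img in batch]

    def __call__(self, imgs: List[Any], batch_size: int = 32) -> List[List[float]]:
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        embeddings: List[List[float]] = []
        # Walk the input in slices of at most batch_size items, keeping order.
        for start in range(0, len(imgs), batch_size):
            batch = imgs[start:start + batch_size]
            embeddings.extend(self._embed_batch(batch))
        return embeddings


encoder = ToyEncoder()
# Three inputs with batch_size=2 are processed as one batch of 2, then one of 1.
vectors = encoder(["a.png", "bb.png", "ccc.png"], batch_size=2)
```

With the real class, the corresponding call would presumably take already-loaded images rather than file names; the accepted image types are not stated in this reference, so check them before relying on this pattern.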