Encoder for Vision Transformer models.
This class provides functionality to encode images using a Vision Transformer model via Hugging Face. It supports various image processing and model initialization options.
Initialize the VitEncoder.
Arguments:
**data (dict): Additional keyword arguments for the encoder.

Encode a list of images into embeddings using the Vision Transformer model.
Arguments:
imgs (List[Any]): The images to encode.
batch_size (int): The batch size for encoding.
Returns:
List[List[float]]: The embeddings for the images.
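
The role of batch_size can be illustrated with a minimal sketch of batched encoding. This is not the VitEncoder implementation, only an illustration of the documented contract (a list of images in, a List[List[float]] out). The names encode_in_batches and toy_embed_batch are hypothetical, and the toy function stands in for the actual Vision Transformer forward pass so that the example runs without downloading a model.

```python
from typing import Any, Callable, List, Sequence


def encode_in_batches(
    imgs: Sequence[Any],
    batch_size: int,
    embed_batch: Callable[[Sequence[Any]], List[List[float]]],
) -> List[List[float]]:
    """Encode imgs in chunks of at most batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    embeddings: List[List[float]] = []
    for start in range(0, len(imgs), batch_size):
        # The last slice may be shorter than batch_size.
        batch = imgs[start:start + batch_size]
        embeddings.extend(embed_batch(batch))
    return embeddings


def toy_embed_batch(batch: Sequence[Sequence[float]]) -> List[List[float]]:
    # Hypothetical stand-in for the model: each "image" is a list of numbers
    # and its "embedding" is the 2-dimensional vector [mean, max].
    return [[sum(img) / len(img), float(max(img))] for img in batch]


imgs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
vectors = encode_in_batches(imgs, batch_size=2, embed_batch=toy_embed_batch)
print(vectors)
```

In this sketch a larger batch_size means fewer calls to embed_batch at the cost of more memory per call. The result does not depend on the batch size chosen, only on the images and the embedding function.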