int: Vocabulary size of the tokenizer
dict: Dictionary of the tokenizer config
path (str, pathlib.Path): Path to save the tokenizer to
bm25_engine.tokenizer.BaseTokenizer object from saved configuration
Requires these files:
path (str, pathlib.Path): Path to load the tokenizer from
BaseTokenizer: Configured BaseTokenizer
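The save and load entries above describe a round trip through a saved configuration. A minimal sketch of that pattern follows, assuming a single JSON config file. `ToyTokenizer`, its `save`/`load`/`vocab_size`/`config` names, and the `tokenizer.json` file name are hypothetical stand-ins for illustration, not the library's actual API or required files.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


class ToyTokenizer:
    """Hypothetical stand-in for a BaseTokenizer-like class."""

    def __init__(self, vocab: dict, lowercase: bool = True):
        self.vocab = vocab
        self.lowercase = lowercase

    @property
    def vocab_size(self) -> int:
        # Number of entries in the vocabulary.
        return len(self.vocab)

    @property
    def config(self) -> dict:
        # Everything needed to rebuild this tokenizer.
        return {"vocab": self.vocab, "lowercase": self.lowercase}

    def save(self, path: Union[str, Path]) -> None:
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        # Assumed file name; the real required files are not listed here.
        (path / "tokenizer.json").write_text(json.dumps(self.config))

    @classmethod
    def load(cls, path: Union[str, Path]) -> "ToyTokenizer":
        config = json.loads((Path(path) / "tokenizer.json").read_text())
        return cls(**config)


# Round trip: save to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    original = ToyTokenizer({"[PAD]": 0, "hello": 1, "world": 2})
    original.save(tmp)
    restored = ToyTokenizer.load(tmp)
```

Accepting both `str` and `pathlib.Path` costs one `Path(path)` call at the top of each method, which mirrors the `(str, pathlib.Path)` type given for `path`.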
semantic_router.tokenizers.BaseTokenizer class.
Arguments:
tokenizer (tokenizers.Tokenizer): Binding for HuggingFace Rust tokenizers
add_special_tokens (bool): Whether to accept special tokens from the tokenizer (e.g. [PAD])
pad (bool): Whether to pad the input to a consistent length (using [PAD] tokens)
tokenizer0 (tokenizer1): HuggingFace ID of the model (i.e. tokenizer2)
int: Vocabulary size of the tokenizer
dict: Dictionary of the tokenizer config
numpy.ndarray of token ids
Arguments:
texts (str, list): Texts to be tokenized
pad (bool): Unused here (configured in the constructor)
numpy.ndarray: 2D numpy array representing token ids
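To illustrate why padding yields a rectangular, 2D result, here is a minimal standard-library sketch. The whitespace splitter, the `toy_tokenize` and `pad_token_ids` names, the toy vocabulary, and the choice of 0 as the [PAD] id are all assumptions made for illustration; the real class delegates to a HuggingFace tokenizer instead.

```python
from typing import Dict, List, Union

PAD_ID = 0  # assumed id of the [PAD] token in this toy vocabulary


def pad_token_ids(rows: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Right-pad every row with pad_id to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [row + [pad_id] * (width - len(row)) for row in rows]


def toy_tokenize(texts: Union[str, List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Map whitespace-separated words to ids, then pad to equal length."""
    if isinstance(texts, str):
        texts = [texts]  # accept a single string as a batch of one
    unk_id = len(vocab)  # hypothetical id for out-of-vocabulary words
    rows = [[vocab.get(word, unk_id) for word in text.lower().split()] for text in texts]
    return pad_token_ids(rows)


vocab = {"[PAD]": 0, "hello": 1, "world": 2}
batch = toy_tokenize(["hello world", "hello"], vocab)
```

Because every row has the same length after padding, passing `batch` to `numpy.asarray` would give a 2D array of shape (number of texts, longest sequence length).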
bm25_engine.tokenizer.BaseTokenizer
Arguments:
type_ (str): Tokenizer type to instantiate
**kwargs: kwargs to be passed to the Tokenizer constructor
bm25_engine.tokenizer.BaseTokenizer: Tokenizer
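The `type_` plus `**kwargs` signature above is a common factory pattern: a string selects the class, and the remaining keyword arguments are forwarded to its constructor. A minimal sketch, assuming a simple registry dictionary; `WordTokenizer`, `CharTokenizer`, the `"word"`/`"char"` keys, and `get_tokenizer` are hypothetical names, not the library's own.

```python
class WordTokenizer:
    """Hypothetical tokenizer type, selected by the key "word"."""

    def __init__(self, lowercase: bool = True):
        self.lowercase = lowercase


class CharTokenizer:
    """Hypothetical tokenizer type, selected by the key "char"."""

    def __init__(self, max_length: int = 16):
        self.max_length = max_length


# Maps the type_ string to the class to instantiate.
_REGISTRY = {"word": WordTokenizer, "char": CharTokenizer}


def get_tokenizer(type_: str, **kwargs):
    """Instantiate the tokenizer registered under type_, forwarding kwargs."""
    try:
        cls = _REGISTRY[type_]
    except KeyError:
        raise ValueError(f"Unknown tokenizer type: {type_!r}") from None
    return cls(**kwargs)


tok = get_tokenizer("word", lowercase=False)
```

Failing fast with a `ValueError` on an unknown `type_` keeps a typo in the type string from surfacing later as a confusing attribute error.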