encode_queries and encode_documents, with the latter to be stored in a VDB. The dot product of encode_queries(q) and encode_documents([D_0, D_1, ...]) is the BM25 score of the documents [D_0, D_1, ...] for the given query q.
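The split into a query-side and a document-side encoding can be illustrated with a minimal sketch. It is not the library's implementation: the values of K1 and B, the corpus statistics, the Lucene-style IDF smoothing, and every function name below are assumptions chosen for illustration. Sparse vectors are represented as plain dicts.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # assumed values; the library's defaults may differ

# Assumed corpus statistics, as if an encoder had been fitted on three documents.
N_DOCS = 3
AVGDL = 4.0
DOC_FREQ = {"sparse": 2, "vectors": 1, "dense": 1, "search": 3}


def idf(term):
    """Lucene-style smoothed IDF (an assumption; other variants exist)."""
    n = DOC_FREQ[term]
    return math.log((N_DOCS - n + 0.5) / (n + 0.5) + 1.0)


def encode_query(tokens):
    """Query side: one IDF weight per distinct known query term."""
    return {t: idf(t) for t in set(tokens) if t in DOC_FREQ}


def encode_document(tokens):
    """Document side: saturated, length-normalized term frequency."""
    norm = K1 * (1.0 - B + B * len(tokens) / AVGDL)
    return {t: tf * (K1 + 1.0) / (tf + norm) for t, tf in Counter(tokens).items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def bm25_direct(query, doc):
    """Textbook BM25 computed in a single pass, for comparison."""
    tf = Counter(doc)
    norm = K1 * (1.0 - B + B * len(doc) / AVGDL)
    score = 0.0
    for t in set(query):
        if t in DOC_FREQ and tf[t]:
            score += idf(t) * tf[t] * (K1 + 1.0) / (tf[t] + norm)
    return score


query = ["sparse", "search"]
doc = ["sparse", "sparse", "vectors", "search"]
# The dot product of the two encodings matches the directly computed score.
assert math.isclose(dot(encode_query(query), encode_document(doc)), bm25_direct(query, doc))
```

Because the document vector does not depend on the query, it can be computed once and stored, while the query vector is computed at search time.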
k1 (float): normalizer parameter that limits how much a single query term q_i ∈ q can affect the score for document D_n
b (float): normalizer parameter that balances the effect of a single document's length against the average document length
number of documents in the trained corpus
average document length in the trained corpus (float)
numpy.ndarray of shape (1, tokenizer.vocab_size) denoting how many documents contain each token in the tokenizer vocabulary

routes (List[Route]): List of routes to train the encoder on.

queries (list): List of queries to encode

Returns:
list[SparseEmbedding]: BM25 scores for each query against the corpus
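The three corpus statistics listed earlier (document count, average document length, and per-token document frequency) can be gathered in one pass over a tokenized corpus. The sketch below is only an illustration: fit_statistics, its arguments, and the nested-list stand-in for the (1, tokenizer.vocab_size) array are hypothetical and not part of the documented API.

```python
from collections import Counter


def fit_statistics(tokenized_docs, vocab):
    """Return (n_docs, avg_doc_len, doc_freq) for a tokenized corpus.

    doc_freq is a 1 x len(vocab) nested list that mirrors the
    (1, tokenizer.vocab_size) array shape; a real encoder would
    likely use a numpy.ndarray instead.
    """
    n_docs = len(tokenized_docs)
    avg_doc_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Count each token at most once per document.
    counts = Counter(tok for doc in tokenized_docs for tok in set(doc))
    doc_freq = [[counts[tok] for tok in vocab]]
    return n_docs, avg_doc_len, doc_freq


corpus = [
    ["sparse", "vectors"],
    ["dense", "vectors", "vectors"],
    ["sparse", "search", "index"],
]
vocab = ["dense", "index", "search", "sparse", "vectors"]
n_docs, avg_doc_len, doc_freq = fit_statistics(corpus, vocab)
print(n_docs, avg_doc_len, doc_freq)
```

Note that a token repeated within one document ("vectors" in the second document) still contributes only 1 to its document frequency.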
d_i ∈ D is a term of document D
|D| is the document length
avgdl is the average document length in the trained corpus
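For reference, these symbols appear in the textbook Okapi BM25 score shown below. This is the standard formulation, not a transcription of this encoder's code; the exact variant used here (for example, its IDF smoothing) is an assumption. f(q_i, D) denotes the number of times term q_i occurs in D, k_1 is the term-saturation parameter, and b is the length-normalization parameter.

```latex
\mathrm{score}(q, D) = \sum_{q_i \in q} \mathrm{IDF}(q_i)\,
\frac{f(q_i, D)\,(k_1 + 1)}
     {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

The IDF factor depends only on the query term and the corpus, and the fraction depends only on the document, which is what allows the two sides to be encoded separately.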
Arguments:
documents (list): List of documents to encode

Returns:
list[SparseEmbedding]: Encoded documents (as either sparse or dict)
docs: List of documents to encode
is_query: If True, use query encoding, else use document encoding