Embeddings
This guide provides detailed technical information about embedding capabilities in the Aurelio SDK. Embeddings are vector representations of text that capture semantic meaning and are essential for building text retrieval and search systems.
Embedding Flow
Embedding Options
The SDK provides a focused embedding API with the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | `Union[str, List[str]]` | Required | Text or list of texts to embed |
| `input_type` | `str` | Required | Either `"queries"` or `"documents"`, depending on use case |
| `model` | `str` | `"bm25"` | Embedding model to use (currently only `"bm25"` is available) |
| `timeout` | `int` | `30` | Maximum seconds to wait for an API response |
| `retries` | `int` | `3` | Number of retry attempts for failed requests |
Sparse Embeddings
The Aurelio SDK uses sparse BM25-style embeddings, which differ from traditional dense embeddings:
Aurelio Sparse Implementation
The SDK’s BM25 embedding model uses a single set of pretrained, BM25-style weights learned from a web-scale dataset, giving it a general-purpose “world model” of term importance. These weights are transformed into sparse vector embeddings with the following characteristics:
- Structure: Each embedding contains index-value pairs, where indices represent specific terms/tokens and values represent their importance
- Sparse Representation: Only non-zero values are stored, making them memory-efficient
- Exact Term Matching: Excellent for capturing exact terminology for specialized domains
- Domain-Specific Performance: Well-suited for finance, medical, legal, and technical domains where specific terminology matters
Input Types
The `input_type` parameter accepts two possible values:
| Input Type | Use Case | Description |
|---|---|---|
| `"documents"` | Creating a searchable knowledge base | Optimizes embeddings for document representation in a vector database |
| `"queries"` | Querying a knowledge base | Optimizes embeddings for query representation when searching against embedded documents |
Sparse Embedding Structure
The `indices` correspond to token positions in the vocabulary, while the `values` represent the importance of each token for the given text.
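The index-value structure can be illustrated with plain Python (the token IDs and weights below are made up for illustration, not real model output). Because only non-zero entries are kept, scoring a query against a document reduces to a dot product over shared indices:

```python
# A sparse embedding as parallel index/value lists (illustrative values only).
indices = [101, 2057, 4821]   # vocabulary token IDs
values = [0.82, 0.41, 1.17]   # BM25-style importance weights

# The sparse form keeps only the non-zero entries; a dense vector
# would store every vocabulary dimension, almost all of them zero.
document = dict(zip(indices, values))

def sparse_dot(a: dict, b: dict) -> float:
    """Dot product over shared indices -- the core of sparse retrieval scoring."""
    return sum(v * b[i] for i, v in a.items() if i in b)

query = {101: 0.9, 4821: 0.5}  # query embedding, same index space
score = sparse_dot(document, query)  # only overlapping terms contribute
```

Note how terms absent from either side contribute nothing, which is what makes exact term matching cheap and interpretable.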
Usage Examples
Basic Embedding Generation
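A minimal sketch of a single-query call. The client class name `AurelioClient`, its import path, and the `AURELIO_API_KEY` environment variable are assumptions; the `embedding` parameters follow the table above:

```python
import os

from aurelio_sdk import AurelioClient  # assumed import path

client = AurelioClient(api_key=os.environ["AURELIO_API_KEY"])

# Embed a single piece of text as a query.
response = client.embedding(
    input="what is bm25?",
    input_type="queries",
    model="bm25",
)
```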
Batch Embedding Generation
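Passing a list to `input` embeds several texts in one request. A sketch under the same assumed `AurelioClient` setup; one sparse embedding is returned per input text, in order:

```python
import os

from aurelio_sdk import AurelioClient  # assumed import path

client = AurelioClient(api_key=os.environ["AURELIO_API_KEY"])

documents = [
    "BM25 weights terms by frequency and document length.",
    "Sparse embeddings store only non-zero index-value pairs.",
]

# Use input_type="documents" when building a searchable knowledge base.
response = client.embedding(input=documents, input_type="documents")
```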
Async Embedding Generation
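For concurrent workloads, a sketch assuming the SDK ships an async counterpart here named `AsyncAurelioClient`, whose `embedding` method takes the same parameters and is awaited:

```python
import asyncio
import os

from aurelio_sdk import AsyncAurelioClient  # assumed async client name

async def main() -> None:
    client = AsyncAurelioClient(api_key=os.environ["AURELIO_API_KEY"])
    response = await client.embedding(
        input="what is bm25?",
        input_type="queries",
    )

asyncio.run(main())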
Complete Workflow: Chunk and Embed
A common pattern is to chunk documents and then embed each chunk:
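A sketch of that pattern, assuming the `AurelioClient` client from the examples above plus a `chunk` method and `document.chunks[*].content` response fields as described in the SDK's chunking guide (all assumptions):

```python
import os

from aurelio_sdk import AurelioClient  # assumed import path

client = AurelioClient(api_key=os.environ["AURELIO_API_KEY"])

long_document = "..."  # full document text

# 1. Chunk the document (method and response fields are assumptions).
chunk_response = client.chunk(content=long_document)
texts = [chunk.content for chunk in chunk_response.document.chunks]

# 2. Embed every chunk for storage in a vector database.
embed_response = client.embedding(input=texts, input_type="documents")
```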
Response Structure
The embedding response contains detailed information:
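As a rough illustration, the response can be pictured as the following shape. The field names and example numbers here are assumptions inferred from this guide, not a verbatim API schema:

```python
# Illustrative response shape only; field names are assumptions.
response = {
    "model": "bm25",
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
    "data": [
        {
            "object": "embedding",
            "index": 0,  # position of this text in the input list
            "embedding": {"indices": [101, 2057], "values": [0.82, 0.41]},
        }
    ],
}
```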
The `EmbeddingUsage` object provides token consumption metrics:
Each embedding is contained in an `EmbeddingDataObject`:
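The stand-in classes below mimic how such an object might be traversed; they are illustration only (the real classes live in the SDK), but they show the parallel `indices`/`values` access pattern:

```python
from dataclasses import dataclass
from typing import List

# Minimal stand-ins for the SDK's response objects, for illustration only.
@dataclass
class SparseEmbedding:
    indices: List[int]   # vocabulary token IDs
    values: List[float]  # importance weight per token

@dataclass
class EmbeddingDataObject:
    object: str          # e.g. "embedding"
    index: int           # position of the source text in the input list
    embedding: SparseEmbedding

item = EmbeddingDataObject(
    object="embedding",
    index=0,
    embedding=SparseEmbedding(indices=[101, 2057], values=[0.82, 0.41]),
)

# Pair each token index with its weight.
pairs = list(zip(item.embedding.indices, item.embedding.values))
```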
Advantages of Sparse Embeddings
Sparse vs. Dense Embeddings
| Characteristic | Sparse BM25 Embeddings | Dense Embeddings |
|---|---|---|
| Representation | Index-value pairs for non-zero elements | Fixed-dimension vectors of continuous values |
| Storage Efficiency | High (only non-zero values are stored) | Low (all dimensions are stored) |
| Term Matching | Excellent for exact term/keyword matching | May miss exact terminology |
| Domain Adaptation | Strong for specialized-vocabulary domains | May require fine-tuning per domain |
| Interpretability | Higher (indices correspond to vocabulary terms) | Lower (dimensions are not directly interpretable) |
When to Use Sparse
Sparse BM25 embeddings excel in scenarios where:
- You need to capture domain-specific terminology (medical, finance, legal, technical)
- Exact keyword matching is important
- You want higher interpretability of search results
- You’re building systems where precision on terminology matters more than general semantic similarity
Error Handling
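Requests can still fail after the configured retries are exhausted, so calls should be wrapped in a handler. A sketch assuming the same `AurelioClient` setup and a dedicated SDK exception type, here named `ApiError` as an assumption:

```python
import os

from aurelio_sdk import AurelioClient
from aurelio_sdk import ApiError  # assumed exception name

client = AurelioClient(api_key=os.environ["AURELIO_API_KEY"])

try:
    response = client.embedding(
        input="what is bm25?",
        input_type="queries",
        timeout=30,   # seconds before giving up on the response
        retries=3,    # retry attempts for transient failures
    )
except ApiError as e:
    # The SDK's retries have already been exhausted by the time
    # this handler runs, so log and surface the failure.
    print(f"Embedding request failed: {e}")
```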
Future Plans
The Aurelio SDK plans to enhance embedding capabilities with:
- Additional sparse embedding models
- User-trainable models for specific domains
- Advanced embedding customization options
Stay tuned for updates to the embedding API as these features become available.