This guide provides detailed technical information about embedding capabilities in the Aurelio SDK. Embeddings are vector representations of text that capture semantic meaning and are essential for building text retrieval and search systems.

Embedding Flow

Input Text → Preprocessing → BM25 Embedding Model → Sparse Vector Generation → Final Embeddings

Embedding Options

The SDK provides a focused embedding API with the following parameters:

def embedding(
    self,
    input: Union[str, List[str]],
    input_type: Annotated[str, Literal["queries", "documents"]],
    model: Annotated[str, Literal["bm25"]],
    timeout: int = 30,
    retries: int = 3,
) -> EmbeddingResponse:
    """Generate embeddings for the given input using the specified model."""
Parameter  | Type                  | Default  | Description
input      | Union[str, List[str]] | Required | Text or list of texts to embed
input_type | str                   | Required | Either "queries" or "documents", depending on use case
model      | str                   | "bm25"   | Embedding model to use (currently only "bm25" is available)
timeout    | int                   | 30       | Maximum seconds to wait for an API response
retries    | int                   | 3        | Number of retry attempts for failed requests
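
For slow networks or very large batches, the timeout and retries parameters can be raised from their defaults. A minimal sketch, assuming a configured client like the one in the usage examples below (large_document_list is a placeholder):

# Hypothetical tuning: allow a longer wait and more retries for a big batch.
response = client.embedding(
    input=large_document_list,  # e.g. hundreds of chunk strings
    input_type="documents",
    model="bm25",
    timeout=120,  # wait up to 120 seconds instead of the default 30
    retries=5,    # retry failed requests up to 5 times instead of 3
)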

Sparse Embeddings

The Aurelio SDK uses sparse BM25-style embeddings, which differ from traditional dense embeddings:

Sparse BM25 Embedding: Input Text → BM25 Model → Sparse Vector with Indices & Values

Dense Embedding: Input Text → Dense Model → Fixed-Dimension Dense Vector

Aurelio Sparse Implementation

The SDK’s BM25 embedding model uses a single set of pretrained weights, trained on a web-scale dataset, that acts as a “world model” of BM25-like term weights. These weights are transformed into sparse vector embeddings with the following characteristics:

  • Structure: Each embedding contains index-value pairs, where indices represent specific terms/tokens and values represent their importance
  • Sparse Representation: Only non-zero values are stored, making them memory-efficient
  • Exact Term Matching: Excellent for capturing exact terminology for specialized domains
  • Domain-Specific Performance: Well-suited for finance, medical, legal, and technical domains where specific terminology matters
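
To make the index-value structure concrete, here is an illustrative sketch with hand-written indices and values (invented for demonstration; real model output will differ):

# Illustrative only: invented token IDs and weights, not real BM25 output.
sparse = {
    1042: 1.83,  # hypothetical token ID for a rare domain term
    557: 0.91,   # hypothetical token ID for a moderately informative term
    88: 0.12,    # hypothetical token ID for a common, stopword-like term
}
# Only these three entries are stored; every other vocabulary position is
# implicitly zero, which is what makes the representation sparse.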

Input Types

The input_type parameter accepts two possible values:

Input Type  | Use Case                             | Description
"documents" | Creating a searchable knowledge base | Optimizes embeddings for document representation in a vector database
"queries"   | Querying a knowledge base            | Optimizes embeddings for query representation when searching against embedded documents

Sparse Embedding Structure

class SparseEmbedding(BaseModel):
    indices: list[int]   # vocabulary token IDs with non-zero weight
    values: list[float]  # importance weight for each corresponding index

The indices correspond to token IDs in the model’s vocabulary, while the values represent the importance of each token for the given text.
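
Sparse vectors in this form are typically scored against each other with a dot product over their shared indices. The helper below is a generic sketch of that computation, not an SDK utility:

def sparse_dot(a_indices, a_values, b_indices, b_values):
    # Build a lookup for one vector, then sum products over shared indices.
    b = dict(zip(b_indices, b_values))
    return sum(v * b[i] for i, v in zip(a_indices, a_values) if i in b)

# Hand-made example: only index 3 overlaps, so the score is 0.5 * 2.0 = 1.0.
print(sparse_dot([1, 3], [1.0, 0.5], [3, 7], [2.0, 0.9]))  # 1.0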

Usage Examples

Basic Embedding Generation

from aurelio_sdk import AurelioClient

client = AurelioClient(api_key="your_api_key")

# Embedding a single text
response = client.embedding(
    input="What is the capital of France?", 
    input_type="queries",
    model="bm25"
)

# Accessing the embedding
embedding = response.data[0].embedding
print(f"Indices: {embedding.indices[:5]}...")
print(f"Values: {embedding.values[:5]}...")

Batch Embedding Generation

# Embedding multiple documents at once
documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Rome is the capital of Italy."
]

response = client.embedding(
    input=documents,
    input_type="documents",
    model="bm25"
)

# Process each embedding
for i, item in enumerate(response.data):
    embedding = item.embedding
    print(f"Document {i}: {len(embedding.indices)} non-zero elements")

Async Embedding Generation

from aurelio_sdk import AsyncAurelioClient
import asyncio

async def generate_embeddings():
    client = AsyncAurelioClient(api_key="your_api_key")
    
    response = await client.embedding(
        input="Async embedding generation", 
        input_type="documents",
        model="bm25"
    )
    
    return response

embeddings = asyncio.run(generate_embeddings())
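
Because the async client returns awaitables, several embedding requests can also run concurrently with asyncio.gather. A sketch (for texts known up front, a single batched call as shown earlier is usually simpler):

async def embed_concurrently(texts):
    client = AsyncAurelioClient(api_key="your_api_key")
    # Issue one request per text and wait for all of them together.
    tasks = [
        client.embedding(input=text, input_type="documents", model="bm25")
        for text in texts
    ]
    return await asyncio.gather(*tasks)

responses = asyncio.run(embed_concurrently(["First text", "Second text"]))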

Complete Workflow: Chunk and Embed

A common pattern is to chunk documents and then embed each chunk:

# 1. Extract and chunk a document
extract_response = client.extract_file(
    file_path="document.pdf", 
    quality="high",
    chunk=True
)

# 2. Get chunks from the document
chunks = [chunk.content for chunk in extract_response.document.chunks]

# 3. Generate embeddings for all chunks
embedding_response = client.embedding(
    input=chunks,
    input_type="documents",
    model="bm25"
)

# Now you can store these embeddings in a vector database
for i, chunk in enumerate(extract_response.document.chunks):
    embedding = embedding_response.data[i].embedding
    # Store chunk ID, content, and embedding in your vector store
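
As a stand-in for a real vector database, the sketch below keeps everything in a plain Python list and ranks chunks by a sparse dot product against an embedded query. The chunk id field and the scoring scheme are assumptions for illustration:

# Naive in-memory index: an illustration, not an SDK feature.
index = []
for chunk, item in zip(extract_response.document.chunks, embedding_response.data):
    emb = item.embedding
    index.append({
        "id": getattr(chunk, "id", None),  # assumes chunks carry an ID field
        "content": chunk.content,
        "vector": dict(zip(emb.indices, emb.values)),
    })

# Embed the query, then score each stored chunk on shared indices.
query = client.embedding(
    input="What is the capital of France?",
    input_type="queries",
    model="bm25",
).data[0].embedding
q = dict(zip(query.indices, query.values))

best = max(index, key=lambda d: sum(w * d["vector"].get(i, 0.0) for i, w in q.items()))
print(best["content"])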

Response Structure

The embedding response contains detailed information:

class EmbeddingResponse(BaseModel):
    message: Optional[str]
    model: str      # The model used (e.g., "bm25")
    object: str     # Always "list"
    usage: EmbeddingUsage
    data: list[EmbeddingDataObject]

The EmbeddingUsage provides token consumption metrics:

class EmbeddingUsage(BaseModel):
    prompt_tokens: int
    total_tokens: int

Each embedding is contained in an EmbeddingDataObject:

class EmbeddingDataObject(BaseModel):
    object: str     # Always "embedding"
    index: int      # Position in the input array
    embedding: SparseEmbedding
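
For instance, the fields above can be read directly from a response to log token consumption alongside the embeddings:

response = client.embedding(
    input="Sample text",
    input_type="documents",
    model="bm25",
)
print(response.model)               # "bm25"
print(response.usage.total_tokens)  # tokens consumed by this request
print(response.data[0].index)       # 0 -> position in the input array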

Advantages of Sparse Embeddings

Sparse vs. Dense Embeddings

Characteristic     | Sparse BM25 Embeddings                     | Dense Embeddings
Representation     | Index-value pairs for non-zero elements    | Fixed-dimension vectors of continuous values
Storage Efficiency | High (only non-zero values stored)         | Low (all dimensions stored)
Term Matching      | Excellent for exact term/keyword matching  | May miss exact terminology
Domain Adaptation  | Strong for specialized vocabulary domains  | May require fine-tuning for new domains
Interpretability   | Higher (indices map to vocabulary terms)   | Lower (dimensions not directly interpretable)
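
A back-of-envelope comparison makes the storage row concrete. The dimension and non-zero counts below are illustrative assumptions, not measured SDK figures:

DENSE_DIMS = 1024  # hypothetical dense model dimension
NONZERO = 60       # hypothetical non-zero terms in one sparse vector

dense_bytes = DENSE_DIMS * 4      # one float32 per dimension
sparse_bytes = NONZERO * (4 + 4)  # int32 index + float32 value per entry

print(dense_bytes)   # 4096
print(sparse_bytes)  # 480 -> roughly 8x smaller in this scenario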

When to Use Sparse

Choose Embedding Type:

  • Need exact term matching? Yes → use Sparse BM25; No → consider dense embeddings.
  • Application domain: medical, legal, or technical → Sparse BM25 recommended; general text → either may work.

Sparse BM25 embeddings excel in scenarios where:

  • You need to capture domain-specific terminology (medical, finance, legal, technical)
  • Exact keyword matching is important
  • You want higher interpretability of search results
  • You’re building systems where precision on terminology matters more than general semantic similarity

Error Handling

from aurelio_sdk import AurelioClient, ApiError, ApiTimeoutError

client = AurelioClient(api_key="your_api_key")

try:
    response = client.embedding(
        input="Sample text", 
        input_type="documents",
        model="bm25"
    )
except ApiTimeoutError:
    print("Request timed out, try increasing the timeout parameter")
except ApiError as e:
    print(f"Error: {e.message}")

Future Plans

The Aurelio SDK plans to enhance embedding capabilities with:

  • Additional sparse embedding models
  • User-trainable models for specific domains
  • Advanced embedding customization options

Stay tuned for updates to the embedding API as these features become available.