Overview
The Aurelio SDK provides a streamlined interface to the Aurelio Platform’s document processing capabilities. It enables developers to extract, chunk, and embed textual content from various sources with minimal effort.
What is Aurelio SDK?
Aurelio SDK is a Python library that abstracts the complexity of document processing pipelines. It offers both synchronous and asynchronous clients to interact with the Aurelio Platform.
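The difference between the two client styles matters most when many documents are processed at once. The following is a minimal sketch of that pattern using only the standard library; `process_document` is a hypothetical stand-in for an awaitable SDK call, not part of the Aurelio SDK.

```python
import asyncio


async def process_document(name: str) -> str:
    # Hypothetical stand-in for an awaitable SDK call.
    # It sleeps briefly instead of performing network I/O.
    await asyncio.sleep(0.01)
    return f"processed:{name}"


async def process_all(names: list[str]) -> list[str]:
    # gather schedules the coroutines concurrently and
    # returns results in the same order as the inputs.
    return await asyncio.gather(*(process_document(n) for n in names))


results = asyncio.run(process_all(["a.pdf", "b.pdf", "c.pdf"]))
print(results)
```

A synchronous client would handle the same documents one after another, which is simpler to reason about and usually sufficient for scripts and notebooks.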
Core Capabilities
Document Extraction
Extract text from multiple sources including:
- PDF documents (local files or URLs)
- Video files with automatic transcription
- Web-based content
Intelligent Chunking
Break down documents into meaningful segments using:
- Semantic chunking that respects content boundaries
- Configurable parameters for chunk size and overlap
- Window-based processing for context preservation
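To make the chunk size and overlap parameters concrete, here is a minimal sketch of fixed-size, word-based chunking with overlap. It illustrates the general idea only; the function name and parameters are assumptions for this example and do not reflect how the Aurelio Platform implements chunking.

```python
def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words("a b c d e f g h", chunk_size=4, overlap=2)
print(chunks)
```

Each chunk repeats the last `overlap` words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough. Semantic chunking goes further by placing boundaries where the content changes rather than at fixed word counts.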
Embeddings Generation
Transform text into vector representations using:
- Multiple embedding models including BM25
- Batch processing for efficiency
- Consistent vector formats for downstream applications
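For readers unfamiliar with BM25, the sketch below computes sparse term weights for a small batch of documents using one common formulation (Okapi BM25 with a non-negative IDF). The function name, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration; the platform's own model may differ.

```python
import math
from collections import Counter


def bm25_vectors(docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[dict[str, float]]:
    """Return one sparse vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = {}
        for term, tf in counts.items():
            # Rare terms receive a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization dampens term frequency in long documents.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            vector[term] = idf * tf * (k1 + 1) / norm
        vectors.append(vector)
    return vectors


vectors = bm25_vectors(["cat sat mat", "cat ran"])
print(vectors[0])
```

Unlike dense embeddings, these vectors only contain entries for terms that occur in the document, which is why they are usually stored as term-to-weight mappings.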
When to Use Aurelio SDK
Aurelio SDK is particularly useful when:
- Building document processing pipelines that require extraction and structuring of content
- Implementing semantic search capabilities across large document collections
- Preparing text data for large language model applications
- Creating NLP workflows that need consistent text chunking and embedding
Architecture
The SDK follows a client-based architecture. This structure provides a clean separation of concerns: the SDK handles authentication, request formatting, and response parsing, so you can focus on your application logic.
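The separation of concerns described above can be sketched as a small client class. Everything here is hypothetical: `SketchClient`, the bearer-token header, the base URL, and the endpoint path are illustrative assumptions, and a fake transport replaces real network calls so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# A transport takes (url, headers, body) and returns the raw response text.
Transport = Callable[[str, dict, str], str]


@dataclass
class SketchClient:
    api_key: str
    base_url: str  # hypothetical; not the real platform URL
    transport: Transport

    def _headers(self) -> dict:
        # Authentication: assumed here to be a bearer token.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

    def call(self, path: str, payload: dict) -> dict:
        body = json.dumps(payload)  # request formatting
        raw = self.transport(f"{self.base_url}{path}", self._headers(), body)
        return json.loads(raw)  # response parsing


def fake_transport(url: str, headers: dict, body: str) -> str:
    # Echoes the request back so the example needs no network access.
    return json.dumps({
        "url": url,
        "authorized": headers["Authorization"].startswith("Bearer "),
        "echo": json.loads(body),
    })


client = SketchClient(
    api_key="demo-key",
    base_url="https://api.example.invalid",
    transport=fake_transport,
)
result = client.call("/v1/hypothetical", {"text": "hello"})
print(result)
```

Application code only sees `call` and plain dictionaries; headers, serialization, and parsing stay inside the client, which is the property the architecture is meant to provide.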
Getting Started
To start using the SDK, continue to the Quickstart Guide for installation instructions and basic usage examples.