What is Aurelio SDK?
Aurelio SDK is a Python library that abstracts the complexity of document processing pipelines. It offers both synchronous and asynchronous clients to interact with the Aurelio Platform.
Core Capabilities
Document Extraction
Extract text from multiple sources, including:
- PDF documents (local files or URLs)
- Video files with automatic transcription
- Web-based content
Intelligent Chunking
Break down documents into meaningful segments using:
- Semantic chunking that respects content boundaries
- Configurable parameters for chunk size and overlap
- Window-based processing for context preservation
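
To make the chunk size and overlap parameters concrete, here is a minimal sketch of fixed-size, overlapping window chunking in plain Python. It does not use the Aurelio SDK and is not how the Aurelio Platform chunks documents; the function name `chunk_words` and its parameters are hypothetical, chosen only for illustration. Semantic chunking would additionally move the boundaries to follow the content.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words so that context near a
    boundary appears in both neighbouring chunks.
    NOTE: illustrative helper only, not part of the Aurelio SDK.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text.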
Embeddings Generation
Transform text into vector representations using:
- Multiple embedding models including BM25
- Batch processing for efficiency
- Consistent vector formats for downstream applications
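
For readers unfamiliar with BM25, the sketch below shows one common form of the BM25 term-weighting formula, applied to a small batch of pre-tokenized documents. It is a standalone illustration, not the Aurelio SDK's implementation: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions made for this example. The SDK's BM25 model may tokenize, weight, and represent vectors differently.

```python
import math
from collections import Counter


def bm25_scores(
    query: list[str],
    docs: list[list[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> list[float]:
    """Score each tokenized document against the query terms with BM25.

    NOTE: illustrative helper only, not part of the Aurelio SDK.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []

    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter[str] = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores: list[float] = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization dampens scores for long documents.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["pdf", "text", "extraction"],
        ["video", "transcription"],
        ["pdf", "pdf", "chunking"],
    ]
    print(bm25_scores(["pdf"], corpus))
```

Scoring the whole corpus in one call reflects the batch-processing idea: corpus-level statistics such as document frequency and average length are computed once and reused for every document.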
When to Use Aurelio SDK
Aurelio SDK is particularly useful when:
- Building document processing pipelines that require extraction and structuring of content
- Implementing semantic search capabilities across large document collections
- Preparing text data for large language model applications
- Creating NLP workflows that need consistent text chunking and embedding