This page covers the fundamental concepts of the Aurelio SDK, explaining its key components and how they work together to provide powerful document processing capabilities.
Document processing in the Aurelio SDK converts unstructured documents (PDFs) into easily readable markdown that can be further processed or used in AI applications.
The processing pipeline:
Ingestion: Documents are uploaded either as local files or via URLs.
Quality Selection: Processing can be done in different quality modes:
    low: Faster but less accurate
    high: More accurate but slower
Text Extraction: The system identifies and extracts text content
Structure Recognition: Identifies document elements like headers, paragraphs, tables
Metadata Extraction: Retrieves document metadata when available
# Example of document processing
response = client.extract_file(
    file_path="document.pdf",
    quality="high",
    wait=30,
)
Chunking is the process of breaking long documents into smaller, semantically meaningful pieces that are optimized for downstream tasks like embedding and retrieval.
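To make the idea concrete, here is a minimal sketch of one simple chunking strategy: greedily packing whole paragraphs into chunks under a character budget. It is not the SDK's chunker; the function name `chunk_paragraphs` and the `max_chars` parameter are hypothetical and exist only for illustration.

```python
def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole paragraphs into chunks of up to max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs produced by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            # Start a new chunk; never cut a paragraph mid-sentence.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


document = (
    "Intro paragraph.\n\n"
    "Second paragraph with more detail.\n\n"
    "Closing paragraph."
)
chunks = chunk_paragraphs(document, max_chars=55)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which is a rough stand-in for the "semantically meaningful" property described above. A semantic chunker would typically decide boundaries from meaning rather than from a fixed size.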
Embeddings are dense vector representations of text that capture semantic meaning in a form that machines can process efficiently. They enable semantic search, similarity comparison, and other NLP applications.
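As an illustration of similarity comparison, the sketch below ranks texts by cosine similarity between their vectors. The three-dimensional vectors are made-up toy values, not output from any embedding model, and `cosine_similarity` is a helper written for this example rather than an SDK function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values for illustration only).
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "giraffe": [0.0, 0.1, 0.9],
}

query = embeddings["invoice"]
# Rank the other entries from most to least similar to the query.
ranked = sorted(
    (name for name in embeddings if name != "invoice"),
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked)
```

Real embeddings usually have hundreds or thousands of dimensions, but the comparison works the same way: vectors pointing in similar directions score close to 1.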
The async API lets multiple documents be processed concurrently instead of one at a time, which can improve throughput considerably. This makes it the preferred choice for production applications with high throughput requirements.
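The general pattern behind concurrent processing can be sketched with `asyncio.gather` from the Python standard library. The `fake_extract` coroutine here is a hypothetical stand-in that simulates a network-bound extraction call with a short sleep; it is not the SDK's async client, whose actual method names should be taken from the SDK reference.

```python
import asyncio
import time


async def fake_extract(path: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a network-bound extraction request."""
    await asyncio.sleep(delay)  # simulates waiting on the remote service
    return f"markdown for {path}"


async def process_all(paths: list[str]) -> list[str]:
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_extract(p) for p in paths))


start = time.perf_counter()
results = asyncio.run(process_all(["a.pdf", "b.pdf", "c.pdf"]))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because the simulated requests overlap while waiting, the total time is close to the duration of one request rather than the sum of all three. The same effect is what makes async processing attractive when many documents are in flight at once.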