Parameter | Type | Default | Description |
---|---|---|---|
model | "aurelio-base" \| "docling-base" \| "gemini-2-flash-lite" | "aurelio-base" | Model to use for processing. Different models have different capabilities and price points. |
chunk | bool | True | Whether to chunk the document using default chunking config. |
wait | int | 30 | Time in seconds to wait for processing completion. Set to -1 to wait indefinitely. Set to 0 to return immediately with a document ID. |
polling_interval | int | 5 | Time in seconds between status check requests. Set to 0 to disable polling. |
retries | int | 3 | Number of retry attempts in case of API errors (5xx). |
processing_options | dict | None | Additional processing options for customizing extraction and chunking behavior. |
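
To make the `retries` row concrete, here is a minimal sketch of retrying on 5xx responses. The `ServerError` class and the `call_with_retries` helper are hypothetical names introduced for illustration. Counting `retries` as attempts made after the first call is also an assumption, not something the table confirms about the SDK.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"server returned {status_code}")
        self.status_code = status_code


def call_with_retries(request: Callable[[], T], retries: int = 3) -> T:
    """Call `request`, retrying up to `retries` more times on 5xx errors."""
    attempt = 0
    while True:
        try:
            return request()
        except ServerError as err:
            is_5xx = 500 <= err.status_code < 600
            if not is_5xx or attempt >= retries:
                raise  # non-retryable error, or retries exhausted
            attempt += 1
```

Under these assumptions, `retries=3` allows at most four calls in total, and a 4xx error is raised on the first failure without retrying.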
Note: The `quality` parameter has been deprecated and replaced with the `model` parameter.

- For PDF: `quality="low"` is equivalent to `model="aurelio-base"` (fastest, cheapest, best for clean PDFs)
- For PDF: `quality="high"` is equivalent to `model="docling-base"` (code-based OCR for high precision)
- For PDF: A new option `model="gemini-2-flash-lite"` uses a Vision Language Model (VLM) for state-of-the-art text extraction. Note that VLMs can offer superior PDF-to-text performance but come with the risk of hallucinating PDF content (Y. Liu, et al.).
- For MP4: Both quality settings used `"aurelio-base"` but with different chunking methods, now specified in `processing_options`
- MP4 files can only be processed with `model="aurelio-base"`
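
When migrating older calls, the mapping in the note can be captured in a small helper. This is only a sketch: `model_for_quality` and its `file_type` argument are hypothetical names and not part of the SDK. The new `"gemini-2-flash-lite"` option has no `quality` equivalent, so the helper never returns it.

```python
def model_for_quality(quality: str, file_type: str = "pdf") -> str:
    """Translate a deprecated `quality` value into the matching `model` name."""
    if quality not in ("low", "high"):
        raise ValueError(f"unknown quality: {quality!r}")
    if file_type == "mp4":
        # Both quality settings used "aurelio-base" for MP4 files.
        return "aurelio-base"
    if file_type == "pdf":
        return {"low": "aurelio-base", "high": "docling-base"}[quality]
    raise ValueError(f"no mapping described for file type: {file_type!r}")
```

For example, `model_for_quality("high")` gives `"docling-base"`, while `model_for_quality("high", "mp4")` gives `"aurelio-base"`. For MP4 files, the chunking difference between the two old settings has to be carried over separately through `processing_options`.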

- `model="aurelio-base"` for faster processing of simple documents (equivalent to old `quality="low"`)
- `model="docling-base"` for complex documents with tables, diagrams, or mixed layouts (equivalent to old `quality="high"`)
- `model="gemini-2-flash-lite"` for state-of-the-art text extraction using a Vision Language Model
- `wait` time or using `-1`

- `model="aurelio-base"` is supported for video transcription
- `processing_options` (use `"chunker_type": "semantic"` for better chunking, equivalent to old `quality="high"`)
- `wait=-1` for videos longer than 5 minutes
- `polling_interval` (15-30 seconds) for videos to reduce API calls

- `model="docling-base"` to better preserve page structure
- `model="aurelio-base"` is supported
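
- `wait=0`: return immediately with a document ID.
- `wait=-1`: wait indefinitely for processing to complete; use `polling_interval` to control how frequently to check status.
- `wait=30`: the default; wait up to 30 seconds for processing to complete.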
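
The interaction between `wait` and `polling_interval` can be sketched as a small polling loop. This is an illustration under stated assumptions, not the SDK's implementation: `wait_for_document`, the `check_status` callback, and the status strings are hypothetical. Treating `polling_interval=0` as "check once and return" is one reading of "disable polling".

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # hypothetical status names


def wait_for_document(
    check_status: Callable[[], str],
    wait: int = 30,
    polling_interval: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `check_status` until a terminal status or until `wait` seconds pass."""
    status = check_status()
    # wait=0 returns immediately; polling_interval=0 disables polling.
    if wait == 0 or polling_interval == 0:
        return status
    # A negative wait (-1) means there is no deadline.
    deadline = None if wait < 0 else clock() + wait
    while status not in TERMINAL_STATUSES:
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {wait} seconds")
        sleep(polling_interval)
        status = check_status()
    return status
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays. A longer `polling_interval` makes fewer status checks over the same waiting period.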

The `ExtractResponse` object contains detailed information about the processed document:

The `ResponseDocument` contains:

- `APITimeoutError`: Raised when the request exceeds the wait time
- `APIError`: General API error with details in the message
- `ApiRateLimitError`: Raised when API rate limits are exceeded
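
One way to handle these errors is to catch the specific classes before the general one. The sketch below defines local stand-in classes that only share names with the errors listed above, so it runs without the SDK. The inheritance from `APIError` and the `extract_safely` helper are assumptions made for illustration.

```python
from typing import Callable, Tuple


class APIError(Exception):
    """Local stand-in: general API error with details in the message."""


class APITimeoutError(APIError):
    """Local stand-in: the request exceeded the wait time."""


class ApiRateLimitError(APIError):
    """Local stand-in: API rate limits were exceeded."""


def extract_safely(extract: Callable[[], object]) -> Tuple[str, object]:
    """Run `extract` and label the outcome instead of propagating API errors."""
    try:
        return ("ok", extract())
    except APITimeoutError as err:
        # Processing may still be running; a longer `wait` could help.
        return ("timeout", err)
    except ApiRateLimitError as err:
        return ("rate_limited", err)
    except APIError as err:
        # Checked last so the specific cases above take priority.
        return ("api_error", err)
```

In real use, `extract` would wrap the actual extraction call and the exception classes would come from the SDK rather than from these stand-ins.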