This guide helps you migrate your code from previous versions of the Aurelio SDK to v0.0.19, which introduces significant changes to the `extract` endpoint.
`quality` parameter

The `quality` parameter, previously used with the values `"low"` and `"high"` for both PDF and MP4 file extraction, has been deprecated.
`model` parameter

The new `model` parameter replaces `quality` and provides more granular control over the extraction process through a choice of models.
For MP4 files, chunking preferences previously set through the `quality` parameter are now explicitly configured in the `processing_options` parameter.
File Type | Old Approach | New Approach | Notes |
---|---|---|---|
PDF | `quality="low"` | `model="aurelio-base"` | Fastest, cheapest option for clean PDFs |
PDF | `quality="high"` | `model="docling-base"` | Code-based OCR method for high precision |
PDF | - | `model="gemini-2-flash-lite"` | New! State-of-the-art VLM-based extraction |
MP4 | `quality="low"`, `chunk=True` | `model="aurelio-base"`, `processing_options={"chunking": {"chunker_type": "regex"}}` | Basic chunking for videos |
MP4 | `quality="high"`, `chunk=True` | `model="aurelio-base"`, `processing_options={"chunking": {"chunker_type": "semantic"}}` | Semantic chunking for videos |
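The mapping in the table can be expressed as a small translation helper. The following is a minimal sketch, not part of the SDK: `migrate_extract_kwargs` and the two lookup dictionaries are hypothetical names introduced here, and only the parameter names and values (`quality`, `model`, `chunk`, `processing_options`) come from the table. It rewrites a dictionary of old-style keyword arguments into the new style without calling the API.

```python
from typing import Any, Dict

# Values copied from the migration table; everything else here is illustrative.
PDF_MODEL_FOR_QUALITY = {"low": "aurelio-base", "high": "docling-base"}
MP4_CHUNKER_FOR_QUALITY = {"low": "regex", "high": "semantic"}


def migrate_extract_kwargs(file_type: str, old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `old_kwargs` with `quality` rewritten per the table."""
    new_kwargs = dict(old_kwargs)
    quality = new_kwargs.pop("quality", None)
    if quality is None:
        return new_kwargs  # nothing to migrate: no `quality` argument present
    if quality not in ("low", "high"):
        raise ValueError(f"unknown quality value: {quality!r}")

    kind = file_type.lower()
    if kind == "pdf":
        new_kwargs["model"] = PDF_MODEL_FOR_QUALITY[quality]
    elif kind == "mp4":
        new_kwargs["model"] = "aurelio-base"
        # The table only covers chunk=True; leaving processing_options
        # untouched when chunking is off is an assumption of this sketch.
        if new_kwargs.get("chunk"):
            options = dict(new_kwargs.get("processing_options") or {})
            options["chunking"] = {"chunker_type": MP4_CHUNKER_FOR_QUALITY[quality]}
            new_kwargs["processing_options"] = options
    else:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return new_kwargs
```

The returned dictionary is intended to be unpacked into an extraction call once you have checked it against your own arguments. To opt in to `gemini-2-flash-lite` for PDFs, set `model` explicitly, since no old `quality` value maps to it.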
The changes for `extract_url` are identical to those for `extract_file`: replace the `quality` parameter with the appropriate `model` parameter and, for videos, specify chunking preferences in `processing_options`.
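As an illustration of the new-style call for a video URL, here is a hedged sketch. The wrapper `extract_video_url` is a hypothetical helper, `client` is assumed to be an SDK client object exposing `extract_url`, and the `url` keyword name is an assumption; the `model`, `chunk`, and `processing_options` values are taken from the table above.

```python
from typing import Any


def extract_video_url(client: Any, url: str, semantic: bool = True) -> Any:
    """Call `client.extract_url` with the new-style arguments for an MP4 URL.

    `client` is assumed to expose `extract_url`; the `url` keyword name is an
    assumption of this sketch and should be checked against the SDK reference.
    """
    # Old style (deprecated): quality="high" or quality="low" with chunk=True.
    chunker_type = "semantic" if semantic else "regex"
    return client.extract_url(
        url=url,
        model="aurelio-base",
        chunk=True,
        processing_options={"chunking": {"chunker_type": chunker_type}},
    )
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at the real service.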
The new `gemini-2-flash-lite` model uses a Vision Language Model to process PDF content, offering state-of-the-art accuracy. This can be especially valuable for:
Note: As mentioned in the OCR in Large Multimodal Models paper, VLMs like Gemini can occasionally hallucinate content. While hallucinations are rare, the model is designed for high recall, potentially at the cost of lower precision than code-based OCR methods.
`aurelio-base` pricing is equivalent to the old `low` quality setting.
`docling-base` and `gemini-2-flash-lite` are priced equivalently to the old `high` quality setting.

The `quality` parameter will continue to work during a transition period but will be removed in a future release. We recommend updating your code to use the new `model` parameter as soon as possible.
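To locate call sites that still pass `quality`, a codebase can be scanned with Python's standard `ast` module. This is a minimal sketch under stated assumptions: `find_quality_usages` is a hypothetical helper, and it only detects calls to functions or methods named `extract_file` or `extract_url` that pass `quality` as a literal keyword argument (not via `**kwargs`).

```python
import ast
from typing import List, Tuple

# Method names taken from this guide.
EXTRACT_METHODS = {"extract_file", "extract_url"}


def find_quality_usages(source: str) -> List[Tuple[int, str]]:
    """Return (line number, method name) for extract calls that pass `quality`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `client.extract_file(...)` and a bare `extract_file(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name in EXTRACT_METHODS and any(kw.arg == "quality" for kw in node.keywords):
            hits.append((node.lineno, name))
    return sorted(hits)
```

Running it over each source file (for example, `find_quality_usages(path.read_text())` for every `*.py` path in a project) gives a checklist of calls to update before the parameter is removed.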