Accurate PDF Extraction, Low Cost
Demonstrates a lesser-known Chinese model for extracting PDF content, covering evaluation, error analysis, setup code, and sglang optimizations. The model reaches high accuracy, and serving it with sglang yields a 20× speedup.
We needed higher accuracy and lower cost when extracting the contents of PDFs into a structured format. After evaluating many open-source and commercial solutions, we found one that achieves extremely high accuracy at low cost. However, it is not well known and has very little community around it, so getting it working took extra effort.
We'll give a brief overview of document content extraction models, explain our evaluation, and then walk through specific accuracy errors and our evaluation results. We'll close by showing the code we used first to get the model running at all, and then to run it 20× faster with sglang.
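The abstract does not say which accuracy metric the evaluation uses. Purely as an illustrative sketch, and assuming each extracted page is compared character by character against a hand-checked reference transcription, one common way to score extraction accuracy is normalized edit similarity: 1 minus the Levenshtein distance divided by the length of the longer string. The function names below are hypothetical and are not the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    # Keep the shorter string in `b` so each DP row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(predicted: str, reference: str) -> float:
    """Hypothetical per-page score: 1.0 is a perfect match, 0.0 shares nothing."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty pages count as identical
    return 1.0 - levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    reference = "Total revenue: $1,204"
    predicted = "Total revenue: $1,284"  # a single misread digit
    print(round(normalized_edit_similarity(predicted, reference), 3))
```

A character-level score like this rewards near-misses, so a misread digit in a financial table barely moves it. An error analysis that inspects specific failures, as the talk describes, catches problems that an aggregate score can hide.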