12:32
2026-05-30
maltebuettner.eu
large-language-models
DocumentAI Visual Benchmark - GPT 5.5, Gemini 3.5, Qwen...
A new benchmark evaluating DocumentAI models on bounding box accuracy shows GPT-5.5 and Gemini 3.5 leading with 67.7% and 67.5% scores respectively, while Qwen, Kimi, and Mistral trail significantly. …