{"slug": "paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend", "title": "PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend", "summary": "PaddleOCR 3.5 introduces a more flexible inference-engine interface, allowing developers to select the backend (including Transformers) via the `engine` parameter and configure backend-specific options through `engine_config`. This update enables supported PaddleOCR models to run with a Transformers backend, reducing integration friction for developers working in Hugging Face-centered environments. The release aims to simplify document ingestion for downstream applications like RAG and Document AI by providing a natural path from documents to structured data.", "body_md": "PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend\nengine=\"transformers\"\nPaddleOCR continues to provide OCR model series such as PP-OCRv5 and document parsing model series such as PaddleOCR-VL 1.5, while Transformers becomes one of the supported backends for running them.\nTry the live demo on Hugging Face Spaces: https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo\nWhat changed?\nPaddleOCR 3.5 introduces a more flexible inference-engine interface. 
Developers can select the backend through the `engine` parameter and pass backend-specific options through `engine_config`.\nIn practice, this means:\n- The pipelines behind these tasks are managed by PaddleOCR, so developers do not need to manually call each internal component.\n- Transformers becomes one of the supported inference backends for running supported PaddleOCR models.\n- Developers can configure backend-related options such as `dtype`, device placement, and attention implementation through `engine_config`.\nA simple way to understand the stack:\nThis release is mainly about the inference backend layer: PaddleOCR continues to provide OCR and document parsing capabilities, while Transformers gives supported PaddleOCR models another backend option that fits naturally into Hugging Face-centered environments. The larger Document AI workflow remains in the hands of developers and application builders.\nWhy this matters\nFor RAG, Document AI, and document agent applications, the hard part often starts before the LLM.\nDevelopers first need to turn PDFs, scanned documents, screenshots, tables, charts, formulas, and complex page layouts into reliable structured data. If this ingestion step is weak, the downstream LLM workflow may miss key information, retrieve the wrong context, or produce unreliable answers.\nPaddleOCR helps address this document ingestion challenge by providing OCR series models such as PP-OCRv5 and document parsing series models such as PaddleOCR-VL-1.5.\nWith PaddleOCR 3.5, these capabilities are now easier to connect with Transformers-centered stacks. 
Supported PaddleOCR models can run with a Transformers backend, while PaddleOCR continues to manage the OCR or document parsing pipeline behind the scenes.\nFor developers, this means less integration friction and a more natural path from documents to downstream RAG, agent, search, analytics, or automation workflows.\nQuick start\nInstall PaddleOCR 3.5, PaddleX, Transformers, and a compatible PyTorch build for your hardware.\nFor example, on a CUDA 12.6 environment:\npython -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126\npython -m pip install \"paddleocr==3.5.0\" \"paddlex==3.5.2\" \"transformers>=5.4.0\"\nFor CPU, ROCm, or other environments, install the PyTorch build that matches your target hardware.\nRun from the command line:\npaddleocr ocr \\\n  -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \\\n  --device gpu:0 \\\n  --engine transformers\nOr use the Python API:\nfrom paddleocr import PaddleOCR\npipeline = PaddleOCR(\n    device=\"gpu:0\",\n    engine=\"transformers\",\n    use_doc_orientation_classify=False,\n    use_doc_unwarping=False,\n    use_textline_orientation=False,\n    engine_config={\n        \"dtype\": \"float32\",\n    },\n)\nresults = pipeline.predict(\n    \"https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png\"\n)\nfor result in results:\n    print(result)\nThe Hugging Face Space uses `float32` for broad compatibility. 
For your own hardware, you can tune backend-specific options through `engine_config`:\nengine_config = {\n    \"dtype\": \"bfloat16\",\n    \"device_type\": \"gpu\",\n    \"device_id\": 0,\n    \"attn_implementation\": \"sdpa\",\n}\nThe best configuration depends on your model, hardware, and deployment environment.\nWhen should you use the Transformers backend?\nUse the Transformers backend when you want PaddleOCR’s OCR and document parsing capabilities to fit more naturally into a Hugging Face-centered stack.\nThis is especially useful if you are building RAG, Document AI, search, analytics, or agent applications and already rely on PyTorch / Transformers infrastructure for model loading, experimentation, deployment, or model artifact management.\nThe Transformers backend is a good fit when you want:\n- a more familiar development experience for teams already using Transformers,\n- Hub-compatible model discovery and distribution for supported PaddleOCR models,\n- easier integration with existing PyTorch / Transformers services.\nWhen maximizing OCR or document parsing throughput is the priority, PaddleOCR’s default `paddle_static` backend is usually the recommended choice.\nThis release is not about replacing one backend with another. 
It is about giving developers more flexibility: use PaddleOCR for OCR and document parsing capabilities, and choose the inference backend that best fits your stack.\nTry it now\nTry the PaddleOCR 3.5 Transformers demo on Hugging Face Spaces:\nhttps://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo\nExplore PaddleOCR models on the Hub:\nhttps://huggingface.co/PaddlePaddle/models\nPaddleOCR 3.5 brings OCR and document parsing capabilities closer to Transformers-centered workflows, while giving developers the freedom to build the larger Document AI applications around them.\nResources\n- PaddleOCR documentation: https://www.paddleocr.ai/\n- PaddleOCR on GitHub: https://github.com/PaddlePaddle/PaddleOCR\n- PaddlePaddle organization on Hugging Face: https://huggingface.co/PaddlePaddle\n- PaddleOCR 3.5 Transformers demo on Spaces: https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo\nAcknowledgements\nWe sincerely thank the Hugging Face engineers who supported the PaddleOCR 3.5 Transformers integration.\nSpecial thanks to Anton Vlasjuk for his end-to-end involvement, including reviewing and merging all related pull requests.\nWe also appreciate Raushan Turganbay and Yoni Gozlan for their valuable PR reviews and feedback.\nTheir guidance helped improve the integration quality, documentation, and developer experience for the Hugging Face community.", "url": "https://wpnews.pro/news/paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend", "canonical_source": "https://huggingface.co/blog/PaddlePaddle/paddleocr-transformers", "published_at": "2026-05-18 15:12:46+00:00", "updated_at": "2026-05-19 21:56:14.915379+00:00", "lang": "en", "topics": ["machine-learning", "open-source", "developer-tools", "products", "data"], "entities": ["PaddleOCR", "Transformers", "Hugging Face", "PP-OCRv5", "PaddleOCR-VL 1.5"], "alternates": {"html": 
"https://wpnews.pro/news/paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend", "markdown": "https://wpnews.pro/news/paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend.md", "text": "https://wpnews.pro/news/paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend.txt", "jsonld": "https://wpnews.pro/news/paddleocr-3-5-running-ocr-and-document-parsing-tasks-with-a-transformers-backend.jsonld"}}