After months of dedicated development, I’m excited to share KerasFormers: an open-source library that brings modern transformer architectures with pre-trained weights, built entirely in Keras 3. What started as a collection of vision models has grown into a unified ecosystem spanning vision, language, multimodal, and speech, all through a single API that runs seamlessly on TensorFlow, JAX, and PyTorch. One API. Any backend.
Key Features
• 100+ models across vision, language, multimodal & speech under one unified API
• Modern LLM architectures, including dense, Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA) designs: Llama 2/3/4, Qwen 2/3/3.5, DeepSeek V2/V3/V4, Gemma, Mistral, Mixtral, Cohere2, GLM-4, MiniMax, GPT-OSS
• Vision-Language Models: Qwen-VL, Qwen2.5-VL, Qwen3-VL, InternVL3, Janus-Pro, Gemma 3, GLM-4V & more
• Full computer vision suite: classification, detection, segmentation, depth estimation & self-supervised learning
• One-line loading of pre-trained weights from Hugging Face & timm: model = Model.from_weights("hf:…")
• Fast, compiled .generate() with KV caching
• Native Keras 3 with full multi-backend compatibility
KerasFormers makes state-of-the-art models accessible through a consistent, backend-agnostic interface, so you can move seamlessly across frameworks.
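For anyone wondering why KV caching matters for .generate(): during autoregressive decoding, each new token attends to every earlier token, and the earlier tokens' key/value projections never change. A cache stores them so each step only projects the newest token. The following is a minimal pure-Python sketch of that general technique for a single attention head. It is not KerasFormers' implementation, and every name in it (attend, KVCache, decode_with_cache, decode_without_cache) is hypothetical. It checks that cached and uncached decoding produce the same outputs.

```python
import math
import random


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of value vectors, one output component at a time.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def project(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


class KVCache:
    """Hypothetical cache holding the key/value projections of all tokens seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(xs, wq, wk, wv):
    """Causal decoding: each step projects only the new token and reuses cached K/V."""
    cache, outputs = KVCache(), []
    for x in xs:
        cache.append(project(x, wk), project(x, wv))  # one K and one V projection per step
        outputs.append(attend(project(x, wq), cache.keys, cache.values))
    return outputs


def decode_without_cache(xs, wq, wk, wv):
    """Same computation, but re-projects the whole prefix at every step."""
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [project(x, wk) for x in prefix]
        values = [project(x, wv) for x in prefix]
        outputs.append(attend(project(xs[t], wq), keys, values))
    return outputs


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
wq, wk, wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(5, d)  # five toy token embeddings of width d

cached = decode_with_cache(tokens, wq, wk, wv)
uncached = decode_without_cache(tokens, wq, wk, wv)
```

The uncached version does O(t) key/value projections at step t, while the cached one does a constant amount of projection work per step. That difference is the saving a KV cache buys during generation.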
pip install -U kerasformers