{"slug": "rust-port-of-transformers-1m-lines-of-code", "title": "Rust port of transformers (1M lines of code)", "summary": "TrustformeRS 0.1.1, a pure Rust port of Hugging Face Transformers with over 1.4 million lines of code, was released on April 25, 2026, delivering 49+ transformer architectures and up to 1.67x speedup over the Python original. The memory-safe implementation supports multi-backend GPU acceleration and deployment across WebAssembly, mobile, and server platforms.", "body_md": "A high-performance, memory-safe Rust implementation of Hugging Face Transformers. TrustformeRS brings the power of transformer models to the Rust ecosystem with zero-cost abstractions, fearless concurrency, and deployment flexibility from edge to cloud.\n\nProject Status: TrustformeRS 0.1.1 was released on 2026-04-25. This release delivers 49+ transformer architectures, 5,358 tests with 100% pass rate, ~1,408,134 lines of 100% Pure Rust, and full multi-platform deployment (WebAssembly, server REST/gRPC/GraphQL, mobile iOS/Android, RLHF/DPO training). 
Multi-backend GPU support: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU.\n\n- **🏎️ Performance**: Leverages Rust's zero-cost abstractions, SIMD optimizations, and efficient memory management\n- **🔒 Safety**: Memory-safe by design with Rust's ownership model - no more segfaults or memory leaks\n- **📦 Portability**: Deploy anywhere from WebAssembly to embedded devices to GPU clusters\n- **🔧 Control**: Explicit resource management following SciRS2's Core Usage Policy\n- **🤝 Compatibility**: Loads Hugging Face model formats directly\n\n| Model | Task | TrustformeRS | HF Transformers | Speedup |\n|---|---|---|---|---|\n| BERT-base | Inference (CPU) | 23ms | 31ms | 1.35x |\n| BERT-base | Batch=32 (CPU) | 412ms | 687ms | 1.67x |\n| GPT-2 | Generation (CPU) | 89ms | 142ms | 1.59x |\n| LLaMA-7B | Generation (GPU) | 12ms/token | 18ms/token | 1.50x |\n| T5-base | Translation | 156ms | 234ms | 1.50x |\n| ViT-base | Image Classification | 15ms | 22ms | 1.47x |\n\n*Benchmarks on Intel i9-12900K (CPU) and NVIDIA RTX 4090 (GPU)*\n\nTrustformeRS follows a modular workspace structure inspired by Hugging Face Transformers:\n\n```\ntrustformers/\n├── trustformers-core/      # Core traits and tensor abstractions  (204,130 SLoC, Stable)\n├── trustformers-models/    # 49+ model implementations           (196,463 SLoC, Alpha)\n├── trustformers-tokenizers/# BPE, WordPiece, SentencePiece       ( 51,211 SLoC, Stable)\n├── trustformers-optim/     # 20+ optimizers and LR schedulers    ( 71,429 SLoC, Stable)\n├── trustformers-training/  # Distributed training, RLHF/DPO      ( 89,413 SLoC, Stable)\n├── trustformers-serve/     # REST/gRPC/GraphQL serving           (361,251 SLoC, Stable)\n├── trustformers-wasm/      # WebAssembly + WebGPU deployment     ( 55,504 SLoC, Stable)\n├── trustformers-mobile/    # iOS/Android deployment              (143,001 SLoC, Alpha)\n├── trustformers-debug/     # Profilers, visualizers, TensorBoard (101,448 SLoC, Alpha)\n└── trustformers/           # High-level integration crate     
   (134,295 SLoC, Alpha)\n```\n\n**Total**: ~1.4M+ SLoC, 100% Pure Rust (COOLJAPAN Policy)\n\n- **Trait-based abstractions**: Models, layers, and tokenizers implement common traits for composability\n- **Feature-gated backends**: Choose between CPU, GPU, or WebAssembly targets\n- **Zero-copy model loading**: Memory-mapped weights with SafeTensors format\n- **Explicit parallelism**: You control thread and GPU usage, not the library\n\n``` toml\n[dependencies]\ntrustformers = \"0.1.1\"\n```\n\n``` rust\nuse trustformers::prelude::*;\nuse trustformers::{AutoModel, AutoTokenizer};\n\nfn main() -> Result<(), Box<dyn std::error::Error>> {\n    // Load model and tokenizer\n    let tokenizer = AutoTokenizer::from_pretrained(\"bert-base-uncased\")?;\n    let model = AutoModel::from_pretrained(\"bert-base-uncased\")?;\n\n    // Tokenize input\n    let inputs = tokenizer.encode(\"Hello, Rust world!\", None)?;\n\n    // Run inference\n    let outputs = model.forward(&inputs)?;\n\n    println!(\"Hidden states shape: {:?}\", outputs.last_hidden_state.shape());\n    Ok(())\n}\n```\n\n``` rust\nuse trustformers::pipeline;\n\n// All pipelines are fully implemented and ready to use!\nlet classifier = pipeline(\"sentiment-analysis\")?;\nlet result = classifier(\"I love writing Rust code!\")?;\n// Output: [{ label: \"POSITIVE\", score: 0.999 }]\n\n// Also available:\n// - text-generation\n// - token-classification (NER)\n// - question-answering\n// - fill-mask\n// - summarization\n// - translation\n```\n\n| Model | Variants | Tasks |\n|---|---|---|\n| BERT | base, large | Masked LM, Classification, Token Classification, QA |\n| RoBERTa | base, large | Same as BERT |\n| DistilBERT | base | Same as BERT (faster) |\n| ALBERT | base, large | Same as BERT (parameter sharing) |\n| ELECTRA | base, large | Discriminative pretraining |\n| DeBERTa | base, large | Disentangled attention |\n\n| Model | Variants | Tasks |\n|---|---|---|\n| GPT-2 | small, medium, large, xl | Text Generation |\n| GPT-Neo | 125M, 1.3B, 2.7B | Text 
Generation |\n| GPT-J | 6B | Text Generation |\n| GPT-NeoX | various | Text Generation |\n| LLaMA | 7B, 13B, 30B, 65B, 70B | Text Generation |\n| Mistral | 7B | Text Generation |\n| Gemma | 2B, 7B | Text Generation |\n| Qwen | 1.8B, 7B, 14B | Text Generation |\n| Phi-3 | mini, small, medium | Text Generation |\n| Falcon | 7B, 40B | Text Generation |\n| StableLM | 1.6B–12B | Text Generation |\n\n| Model | Variants | Tasks |\n|---|---|---|\n| T5 | small, base, large, 3B, 11B | Text-to-Text Generation |\n\n| Model | Variants | Tasks |\n|---|---|---|\n| ViT | tiny, small, base, large | Image Classification |\n| CLIP | base, large | Text-Image Matching |\n| BLIP-2 | various | Vision-Language |\n| LLaVA | various | Visual Instruction Tuning |\n| DALL-E | various | Text-to-Image Generation |\n| Flamingo | various | Visual Language Model |\n\n| Model | Complexity | Tasks |\n|---|---|---|\n| Mamba | O(N) | Long-context Generation |\n| RWKV | O(N) | Recurrent Language Modeling |\n| S4 | O(N log N) | Long-range Sequence Modeling |\n\nTrustformeRS includes state-of-the-art optimizations not mentioned in typical documentation:\n\n- **FlashAttention & FlashAttention-2**: O(N) memory complexity for attention\n- **PagedAttention**: Efficient KV cache management for long sequences\n- **INT8/INT4 Quantization**: GPTQ and AWQ quantization methods\n- **Mixed Precision**: FP16/BF16 training and inference\n- **ZeRO Optimization**: All 3 stages for distributed training\n- **SIMD Operations**: Leveraging SciRS2 for vectorized computations\n- **Tensor Parallelism**: Split large models across multiple GPUs\n- **Gradient Checkpointing**: Trade compute for memory efficiency\n\nTrustformeRS supports multiple deployment targets:\n\n**WebAssembly**: Browser deployment (trustformers-wasm, Stable)\n\n- WebGPU acceleration support\n- JavaScript/TypeScript bindings\n- React/Vue component-ready\n\n**Server**: Production-ready API serving (trustformers-serve, Stable)\n\n- REST, gRPC, and GraphQL endpoints\n- Dynamic batching 
with Kubernetes deployment manifests\n- Docker containers and auto-scaling support\n\n**Training**: Full training infrastructure (trustformers-training, Stable)\n\n- RLHF and DPO training support\n- Distributed training with ZeRO optimization\n- Mixed precision (FP16/BF16)\n\n**Mobile**: Native mobile deployment (trustformers-mobile, Alpha)\n\n- iOS framework with Core ML and Metal acceleration\n- Android library with NNAPI and Vulkan support\n- React Native, Flutter, and Unity integrations\n\n**Edge**: Export to optimized formats\n\n- ONNX export/import\n- GGUF format support\n- Quantized models (INT8/INT4, GPTQ, AWQ) for embedded devices\n\n``` rust\nuse trustformers_core::{Model, Layer, Config};\n\nstruct MyTransformer {\n    embeddings: Embedding,\n    encoder: TransformerEncoder,\n    pooler: Pooler,\n}\n\nimpl Model for MyTransformer {\n    type Config = MyConfig;\n    type Input = TokenizedInput;\n    type Output = ModelOutput;\n\n    fn forward(&self, input: Self::Input) -> Result<Self::Output> {\n        let hidden_states = self.embeddings.forward(input.input_ids)?;\n        let encoded = self.encoder.forward(hidden_states)?;\n        let pooled = self.pooler.forward(&encoded)?;\n\n        Ok(ModelOutput { hidden_states: encoded, pooled_output: pooled })\n    }\n}\n```\n\n``` rust\nuse trustformers::GpuContext;\n\nlet gpu = GpuContext::new(0)?; // Use GPU 0\nlet model = model.to_gpu(&gpu)?;\n\n// Inference now runs on GPU\nlet outputs = model.forward(&inputs)?;\n```\n\n``` shell\n# Build for WASM\ncargo build --target wasm32-unknown-unknown --features wasm\n```\n\n``` js\n// Use in JavaScript\nimport init, { BertModel, Tokenizer } from './trustformers_wasm.js';\n\nawait init();\nconst tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\");\nconst model = BertModel.from_pretrained(\"bert-base-uncased\");\n```\n\nTrustformeRS maintains API similarity with Hugging Face Transformers for easy migration. The Python (Transformers) version comes first, followed by the Rust (TrustformeRS) equivalent:\n\n``` python\nfrom transformers import 
(\n    AutoModel,\n    AutoTokenizer\n)\n\ntokenizer = AutoTokenizer.from_pretrained(\n    \"bert-base-uncased\"\n)\nmodel = AutoModel.from_pretrained(\n    \"bert-base-uncased\"\n)\n\ninputs = tokenizer(\n    \"Hello world!\",\n    return_tensors=\"pt\"\n)\noutputs = model(**inputs)\n```\n\n``` rust\nuse trustformers::{\n    AutoModel,\n    AutoTokenizer\n};\n\nlet tokenizer = AutoTokenizer::from_pretrained(\n    \"bert-base-uncased\"\n)?;\nlet model = AutoModel::from_pretrained(\n    \"bert-base-uncased\"\n)?;\n\nlet inputs = tokenizer.encode(\n    \"Hello world!\",\n    None\n)?;\nlet outputs = model.forward(&inputs)?;\n```\n\n- **49+ transformer architectures** (BERT, RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa, GPT-2, GPT-Neo, GPT-J, GPT-NeoX, LLaMA, Mistral, Gemma, Qwen, Phi-3, Falcon, StableLM, T5, ViT, CLIP, BLIP-2, LLaVA, DALL-E, Flamingo, Mamba, RWKV, S4, Falcon2, Gemma2, Granite, Hyena, InternLM2, Jamba, Jamba2, Linformer, LLaMA3.2, Mamba2, Nemotron, Performer, Phi4, Qwen2.5, RetNet, SD3, StarCoder2, Whisper, xLSTM, Yi)\n- **All major NLP pipelines** fully implemented (text-generation, classification, QA, NER, fill-mask, summarization, translation)\n- **Complete training infrastructure** with distributed training, ZeRO optimization, mixed precision, RLHF and DPO support\n- **Mobile deployment** with iOS (Core ML, Metal) and Android (NNAPI, Vulkan) support\n- **WebAssembly deployment** with WebGPU acceleration\n- **REST/gRPC/GraphQL APIs** with dynamic batching, Kubernetes deployment, and autoscaling\n- **Safety filtering pipeline** with configurable content moderation\n- **Advanced optimizations**: FlashAttention, PagedAttention, quantization (INT8/INT4/GPTQ/AWQ)\n- **Hardware acceleration**: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU support\n- **AutoModel/AutoTokenizer** system with Hugging Face Hub integration\n- **Comprehensive test suite**: 5,358 tests with 100% pass rate\n- **Debugging tools**: Profilers, visualizers, interactive 
debugging, TensorBoard integration\n- **100% Pure Rust** (COOLJAPAN Policy) - ~1,408,134 SLoC across 10 crates\n\n- **MPSGraph acceleration**: Awaiting scirs2-core 0.3.0 for 50-200x Metal performance improvement\n- **More quantization methods**: Enhanced GGUF format, AutoGPTQ improvements\n- **Additional vision transformer variants**: ViT-Huge, DeiT, Swin\n\n- **Custom CUDA kernels**: Further GPU optimization beyond current FlashAttention\n- **Streaming inference**: Real-time token streaming for all generation pipelines\n\n- **Comprehensive guides**: Model implementation, deployment, optimization tuning\n- **Cookbook**: Common patterns and best practices\n\nWe welcome contributions! See our [Contributing Guide](/cool-japan/trustformers/blob/master/CONTRIBUTING.md) for details.\n\n- Create a new module in `trustformers-models/src/`\n- Implement the `Config`, `Model`, and task-specific heads\n- Add tests comparing outputs with Hugging Face\n- Submit a PR with benchmarks\n\n- Profile with `cargo-flamegraph`\n- Benchmark with `criterion`\n- Consider SIMD optimizations for hot paths\n- Ensure thread-safety for concurrent use\n\nRun benchmarks with:\n\n```\ncargo bench --all-features\n```\n\nView detailed results in `target/criterion/report/index.html`.\n\n- No unsafe code in public APIs (only in carefully reviewed hot paths)\n- All models are `Send + Sync` for safe concurrent use\n- Fuzzing tests for tokenizers\n- Memory usage bounds for OOM prevention\n\n- Inspired by [Hugging Face Transformers](https://github.com/huggingface/transformers)\n- Built on [SciRS2](https://github.com/scirs) for scientific computing\n- Tokenizers from [Hugging Face Tokenizers](https://github.com/huggingface/tokenizers)\n- Community contributions and feedback\n\nTrustformeRS is developed and maintained by **COOLJAPAN OU (Team Kitasan)**.\n\nIf you find TrustformeRS useful, please consider sponsoring the project to support continued development of the Pure Rust 
ecosystem.\n\n[https://github.com/sponsors/cool-japan](https://github.com/sponsors/cool-japan)\n\nYour sponsorship helps us:\n\n- Maintain and improve the COOLJAPAN ecosystem\n- Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust\n- Provide long-term support and security updates\n\nLicensed under Apache License, Version 2.0 ([LICENSE](/cool-japan/trustformers/blob/master/LICENSE)).\n\nBuilt with 🦀 and ❤️ by COOLJAPAN OU (Team KitaSan)", "url": "https://wpnews.pro/news/rust-port-of-transformers-1m-lines-of-code", "canonical_source": "https://github.com/cool-japan/trustformers/tree/master", "published_at": "2026-06-18 01:14:24+00:00", "updated_at": "2026-06-18 01:21:56.323285+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "ai-tools", "developer-tools", "ai-infrastructure"], "entities": ["Hugging Face", "TrustformeRS", "BERT", "GPT-2", "LLaMA", "T5", "ViT", "NVIDIA"], "alternates": {"html": "https://wpnews.pro/news/rust-port-of-transformers-1m-lines-of-code", "markdown": "https://wpnews.pro/news/rust-port-of-transformers-1m-lines-of-code.md", "text": "https://wpnews.pro/news/rust-port-of-transformers-1m-lines-of-code.txt", "jsonld": "https://wpnews.pro/news/rust-port-of-transformers-1m-lines-of-code.jsonld"}}