
Rust port of transformers (1M lines of code)

TrustformeRS 0.1.1, a pure Rust port of Hugging Face Transformers with over 1.4 million lines of code, was released on April 25, 2026, delivering 49+ transformer architectures and up to 1.67x speedup over the Python original. The memory-safe implementation supports multi-backend GPU acceleration and deployment across WebAssembly, mobile, and server platforms.

Published Jun 18, 2026 (8 min read)

A high-performance, memory-safe Rust implementation of Hugging Face Transformers. TrustformeRS brings the power of transformer models to the Rust ecosystem with zero-cost abstractions, fearless concurrency, and deployment flexibility from edge to cloud.

Project Status: TrustformeRS 0.1.1 was released on 2026-04-25. This release delivers 49+ transformer architectures, 5,358 tests with 100% pass rate, ~1,408,134 lines of 100% Pure Rust, and full multi-platform deployment (WebAssembly, server REST/gRPC/GraphQL, mobile iOS/Android, RLHF/DPO training). Multi-backend GPU support: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU.

  • 🏎️ Performance: Leverages Rust's zero-cost abstractions, SIMD optimizations, and efficient memory management
  • 🔒 Safety: Memory-safe by design with Rust's ownership model - no more segfaults or memory leaks
  • 📦 Portability: Deploy anywhere from WebAssembly to embedded devices to GPU clusters
  • 🔧 Control: Explicit resource management following SciRS2's Core Usage Policy
  • 🤝 Compatibility: Loads Hugging Face model formats directly

| Model | Task | TrustformeRS | HF Transformers | Speedup |
| --- | --- | --- | --- | --- |
| BERT-base | Inference (CPU) | 23ms | 31ms | 1.35x |
| BERT-base | Batch=32 (CPU) | 412ms | 687ms | 1.67x |
| GPT-2 | Generation (CPU) | 89ms | 142ms | 1.59x |
| LLaMA-7B | Generation (GPU) | 12ms/token | 18ms/token | 1.50x |
| T5-base | Translation | 156ms | 234ms | 1.50x |
| ViT-base | Image Classification | 15ms | 22ms | 1.47x |

Benchmarks on Intel i9-12900K (CPU) and NVIDIA RTX 4090 (GPU)
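
The speedup column is the ratio of the Hugging Face latency to the TrustformeRS latency. A minimal sketch of that arithmetic follows; the `speedup` helper is a hypothetical name used only for illustration, and the printed values may differ from the table in the last digit depending on how the original figures were rounded.

```rust
/// Speedup = baseline latency / candidate latency (same units for both).
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // (benchmark, TrustformeRS latency, HF Transformers latency), copied from the table.
    let rows = [
        ("BERT-base inference (CPU)", 23.0, 31.0),
        ("BERT-base batch=32 (CPU)", 412.0, 687.0),
        ("GPT-2 generation (CPU)", 89.0, 142.0),
        ("LLaMA-7B generation (GPU, per token)", 12.0, 18.0),
        ("T5-base translation", 156.0, 234.0),
        ("ViT-base image classification", 15.0, 22.0),
    ];
    for (name, rust_ms, hf_ms) in rows {
        println!("{name}: {:.2}x", speedup(hf_ms, rust_ms));
    }
}
```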

TrustformeRS follows a modular workspace structure inspired by Hugging Face Transformers:

trustformers/
├── trustformers-core/       # Core traits and tensor abstractions  (204,130 SLoC, Stable)
├── trustformers-models/     # 49+ model implementations            (196,463 SLoC, Alpha)
├── trustformers-tokenizers/ # BPE, WordPiece, SentencePiece        ( 51,211 SLoC, Stable)
├── trustformers-optim/      # 20+ optimizers and LR schedulers     ( 71,429 SLoC, Stable)
├── trustformers-training/   # Distributed training, RLHF/DPO       ( 89,413 SLoC, Stable)
├── trustformers-serve/      # REST/gRPC/GraphQL serving            (361,251 SLoC, Stable)
├── trustformers-wasm/       # WebAssembly + WebGPU deployment      ( 55,504 SLoC, Stable)
├── trustformers-mobile/     # iOS/Android deployment               (143,001 SLoC, Alpha)
├── trustformers-debug/      # Profilers, visualizers, TensorBoard  (101,448 SLoC, Alpha)
└── trustformers/            # High-level integration crate         (134,295 SLoC, Alpha)

Total: ~1.4M+ SLoC, 100% Pure Rust (COOLJAPAN Policy)

  • Trait-based abstractions: Models, layers, and tokenizers implement common traits for composability
  • Feature-gated backends: Choose between CPU, GPU, or WebAssembly targets
  • Zero-copy model loading: Memory-mapped weights with SafeTensors format
  • Explicit parallelism: You control thread and GPU usage, not the library
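
To illustrate the first two principles, here is a minimal, self-contained sketch of trait-based composition with a compile-time backend switch. Every name in it (`Layer`, `Linear`, `Relu`, `Sequential`, `backend_name`, and the `gpu` feature) is a hypothetical stand-in chosen for this example; none of it is taken from the TrustformeRS API.

```rust
/// A common interface that every layer implements (hypothetical trait).
trait Layer {
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Element-wise scale and shift: y = weight * x + bias.
struct Linear {
    weight: f32,
    bias: f32,
}

impl Layer for Linear {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| self.weight * x + self.bias).collect()
    }
}

/// Element-wise max(x, 0).
struct Relu;

impl Layer for Relu {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x.max(0.0)).collect()
    }
}

/// Layers compose through the shared trait: a stack of layers is itself a layer.
struct Sequential {
    layers: Vec<Box<dyn Layer>>,
}

impl Layer for Sequential {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        self.layers
            .iter()
            .fold(input.to_vec(), |acc, layer| layer.forward(&acc))
    }
}

/// Backend selected at compile time by a Cargo feature (hypothetical feature name).
#[cfg(feature = "gpu")]
fn backend_name() -> &'static str {
    "gpu"
}

#[cfg(not(feature = "gpu"))]
fn backend_name() -> &'static str {
    "cpu"
}

fn main() {
    let layers: Vec<Box<dyn Layer>> = vec![
        Box::new(Linear { weight: 2.0, bias: -1.0 }),
        Box::new(Relu),
    ];
    let model = Sequential { layers };
    let out = model.forward(&[0.0, 1.0, 2.0]);
    println!("backend = {}, output = {:?}", backend_name(), out);
}
```

Because `Sequential` implements the same trait as its parts, it can be nested inside another `Sequential`, which is the kind of composability the list above describes.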

[dependencies]
trustformers = "0.1.1"

use trustformers::prelude::*;
use trustformers::{AutoModel, AutoTokenizer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model and tokenizer
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;
    
    // Tokenize input
    let inputs = tokenizer.encode("Hello, Rust world!", None)?;
    
    // Run inference
    let outputs = model.forward(&inputs)?;
    
    println!("Hidden states shape: {:?}", outputs.last_hidden_state.shape());
    Ok(())
}

use trustformers::pipeline;

// All pipelines are fully implemented and ready to use!
let classifier = pipeline("sentiment-analysis")?;
let result = classifier("I love writing Rust code!")?;
// Output: [{ label: "POSITIVE", score: 0.999 }]

// Also available:
// - text-generation
// - token-classification (NER)
// - question-answering
// - fill-mask
// - summarization
// - translation

| Model | Variants | Tasks |
| --- | --- | --- |
| BERT | base, large | Masked LM, Classification, Token Classification, QA |
| RoBERTa | base, large | Same as BERT |
| DistilBERT | base | Same as BERT (faster) |
| ALBERT | base, large | Same as BERT (parameter sharing) |
| ELECTRA | base, large | Discriminative pretraining |
| DeBERTa | base, large | Disentangled attention |

| Model | Variants | Tasks |
| --- | --- | --- |
| GPT-2 | small, medium, large, xl | Text Generation |
| GPT-Neo | 125M, 1.3B, 2.7B | Text Generation |
| GPT-J | 6B | Text Generation |
| GPT-NeoX | various | Text Generation |
| LLaMA | 7B, 13B, 30B, 65B, 70B | Text Generation |
| Mistral | 7B | Text Generation |
| Gemma | 2B, 7B | Text Generation |
| Qwen | 1.8B, 7B, 14B | Text Generation |
| Phi-3 | mini, small, medium | Text Generation |
| Falcon | 7B, 40B | Text Generation |
| StableLM | 1.6B–12B | Text Generation |

| Model | Variants | Tasks |
| --- | --- | --- |
| T5 | small, base, large, 3B, 11B | Text-to-Text Generation |

| Model | Variants | Tasks |
| --- | --- | --- |
| ViT | tiny, small, base, large | Image Classification |
| CLIP | base, large | Text-Image Matching |
| BLIP-2 | various | Vision-Language |
| LLaVA | various | Visual Instruction Tuning |
| DALL-E | various | Text-to-Image Generation |
| Flamingo | various | Visual Language Model |

| Model | Complexity | Tasks |
| --- | --- | --- |
| Mamba | O(N) | Long-context Generation |
| RWKV | O(N) | Recurrent Language Modeling |
| S4 | O(N log N) | Long-range Sequence Modeling |

TrustformeRS includes the following advanced optimizations:

  • FlashAttention & FlashAttention-2: O(N) memory complexity for attention
  • PagedAttention: Efficient KV cache management for long sequences
  • INT8/INT4 Quantization: GPTQ and AWQ quantization methods
  • Mixed Precision: FP16/BF16 training and inference
  • ZeRO Optimization: All 3 stages for distributed training
  • SIMD Operations: Leveraging SciRS2 for vectorized computations
  • Tensor Parallelism: Split large models across multiple GPUs
  • Gradient Checkpointing: Trade compute for memory efficiency
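
For readers new to quantization, the basic idea behind the INT8 entry can be sketched with plain symmetric absmax quantization. This is deliberately the simplest scheme, not GPTQ or AWQ, and it is not taken from the TrustformeRS source; `quantize_int8` and `dequantize_int8` are hypothetical names used only for illustration.

```rust
/// Quantize with a single per-tensor scale (symmetric, round-to-nearest).
/// Returns the INT8 values and the scale needed to map them back to f32.
fn quantize_int8(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero when the tensor is all zeros.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Map INT8 values back to approximate f32 values.
fn dequantize_int8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let weights = [0.5_f32, -1.27, 0.0, 1.0];
    let (q, scale) = quantize_int8(&weights);
    let restored = dequantize_int8(&q, scale);
    println!("scale = {scale}, quantized = {q:?}, restored = {restored:?}");
}
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per element. Methods such as GPTQ and AWQ refine how scales and rounding are chosen to reduce the accuracy loss.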

TrustformeRS supports multiple deployment targets:

WebAssembly: Browser deployment (trustformers-wasm, Stable)

  • WebGPU acceleration support
  • JavaScript/TypeScript bindings
  • React/Vue component-ready

Server: Production-ready API serving (trustformers-serve, Stable)

  • REST, gRPC, and GraphQL endpoints
  • Dynamic batching with Kubernetes deployment manifests
  • Docker containers and auto-scaling support

Training: Full training infrastructure (trustformers-training, Stable)

  • RLHF and DPO training support
  • Distributed training with ZeRO optimization
  • Mixed precision (FP16/BF16)

Mobile: Native mobile deployment (trustformers-mobile, Alpha)

  • iOS framework with Core ML and Metal acceleration
  • Android library with NNAPI and Vulkan support
  • React Native, Flutter, and Unity integrations

Edge: Export to optimized formats

  • ONNX export/import
  • GGUF format support
  • Quantized models (INT8/INT4, GPTQ, AWQ) for embedded devices

use trustformers_core::{Model, Layer, Config};

struct MyTransformer {
    embeddings: Embedding,
    encoder: TransformerEncoder,
    pooler: Pooler,
}

impl Model for MyTransformer {
    type Config = MyConfig;
    type Input = TokenizedInput;
    type Output = ModelOutput;
    
    fn forward(&self, input: Self::Input) -> Result<Self::Output> {
        let hidden_states = self.embeddings.forward(input.input_ids)?;
        let encoded = self.encoder.forward(hidden_states)?;
        let pooled = self.pooler.forward(&encoded)?;
        
        Ok(ModelOutput { hidden_states: encoded, pooled_output: pooled })
    }
}

use trustformers::GpuContext;

let gpu = GpuContext::new(0)?; // Use GPU 0
let model = model.to_gpu(&gpu)?;

// Inference now runs on GPU
let outputs = model.forward(&inputs)?;

cargo build --target wasm32-unknown-unknown --features wasm

import init, { BertModel, Tokenizer } from './trustformers_wasm.js';

await init();
const tokenizer = Tokenizer.from_pretrained("bert-base-uncased");
const model = BertModel.from_pretrained("bert-base-uncased");

TrustformeRS maintains API similarity with Hugging Face Transformers for easy migration:

Python (Transformers):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)

Rust (TrustformeRS):

use trustformers::{AutoModel, AutoTokenizer};

let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
let model = AutoModel::from_pretrained("bert-base-uncased")?;

let inputs = tokenizer.encode("Hello world!", None)?;
let outputs = model.forward(&inputs)?;

  • 49+ transformer architectures (BERT, RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa, GPT-2, GPT-Neo, GPT-J, GPT-NeoX, LLaMA, Mistral, Gemma, Qwen, Phi-3, Falcon, StableLM, T5, ViT, CLIP, BLIP-2, LLaVA, DALL-E, Flamingo, Mamba, RWKV, S4, Falcon2, Gemma2, Granite, Hyena, InternLM2, Jamba, Jamba2, Linformer, LLaMA3.2, Mamba2, Nemotron, Performer, Phi4, Qwen2.5, RetNet, SD3, StarCoder2, Whisper, xLSTM, Yi)
  • All major NLP pipelines fully implemented (text-generation, classification, QA, NER, fill-mask, summarization, translation)
  • Complete training infrastructure with distributed training, ZeRO optimization, mixed precision, RLHF and DPO support
  • Mobile deployment with iOS (Core ML, Metal) and Android (NNAPI, Vulkan) support
  • WebAssembly deployment with WebGPU acceleration
  • REST/gRPC/GraphQL APIs with dynamic batching, Kubernetes deployment, and autoscaling
  • Safety filtering pipeline with configurable content moderation
  • Advanced optimizations: FlashAttention, PagedAttention, quantization (INT8/INT4/GPTQ/AWQ)
  • Hardware acceleration: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU support
  • AutoModel/AutoTokenizer system with HuggingFace Hub integration
  • Comprehensive test suite: 5,358 tests with 100% pass rate
  • Debugging tools: Profilers, visualizers, interactive debugging, TensorBoard integration
  • 100% Pure Rust (COOLJAPAN Policy)
  • ~1,408,134 SLoC across 10 crates

  • MPSGraph acceleration: Awaiting scirs2-core 0.3.0 for 50-200x Metal performance improvement
  • More quantization methods: Enhanced GGUF format, AutoGPTQ improvements
  • Additional vision transformer variants: ViT-Huge, DeiT, Swin

  • Custom CUDA kernels: Further GPU optimization beyond current FlashAttention
  • Streaming inference: Real-time token streaming for all generation pipelines

  • Comprehensive guides: Model implementation, deployment, optimization tuning
  • Cookbook: Common patterns and best practices

We welcome contributions! See our Contributing Guide for details.

To add a new model:

  • Create a new module in trustformers-models/src/
  • Implement the Config, Model, and task-specific heads
  • Add tests comparing outputs with Hugging Face
  • Submit a PR with benchmarks

For performance work:

  • Profile with cargo-flamegraph
  • Benchmark with criterion
  • Consider SIMD optimizations for hot paths
  • Ensure thread-safety for concurrent use

Run benchmarks with:

cargo bench --all-features

View detailed results in target/criterion/report/index.html

  • No unsafe code in public APIs (only in carefully reviewed hot paths)
  • All models are Send + Sync for safe concurrent use
  • Fuzzing tests for tokenizers
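
What the `Send + Sync` guarantee means in practice can be sketched with the standard library alone. `TinyModel`, `assert_send_sync`, and `run_concurrently` below are hypothetical names invented for this example and are not TrustformeRS types; the sketch only assumes that inference takes `&self`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a loaded model; not a TrustformeRS type.
struct TinyModel {
    weights: Vec<f32>,
}

impl TinyModel {
    /// Read-only inference: dot product of the weights with the input.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Run one inference per input, each on its own thread, sharing one model.
fn run_concurrently(model: Arc<TinyModel>, inputs: Vec<Vec<f32>>) -> Vec<f32> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.forward(&input))
        })
        .collect();
    // Joining in spawn order keeps outputs aligned with inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    // Compile-time check: this line fails to build if TinyModel is not Send + Sync.
    assert_send_sync::<TinyModel>();

    let model = Arc::new(TinyModel { weights: vec![1.0, 2.0, 3.0] });
    let outputs = run_concurrently(model, vec![vec![1.0, 0.0, 0.0], vec![0.0, 1.0, 1.0]]);
    println!("{outputs:?}");
}
```

The weights are shared through `Arc` without being copied, and no lock is needed because every thread only reads the model.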

TrustformeRS is developed and maintained by COOLJAPAN OU (Team Kitasan).

If you find TrustformeRS useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.

https://github.com/sponsors/cool-japan

Your sponsorship helps us:

  • Maintain and improve the COOLJAPAN ecosystem
  • Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
  • Provide long-term support and security updates

Licensed under Apache License, Version 2.0 (LICENSE).

Built with 🦀 and ❤️ by COOLJAPAN OU (Team KitaSan)
