Rust port of transformers (1M lines of code)

A high-performance, memory-safe Rust implementation of Hugging Face Transformers. TrustformeRS brings the power of transformer models to the Rust ecosystem with zero-cost abstractions, fearless concurrency, and deployment flexibility from edge to cloud.

Project Status: TrustformeRS 0.1.1 was released on 2026-04-25. This release delivers 49+ transformer architectures, 5,358 tests with 100% pass rate, ~1,408,134 lines of 100% Pure Rust, and full multi-platform deployment (WebAssembly, server REST/gRPC/GraphQL, mobile iOS/Android, RLHF/DPO training). Multi-backend GPU support: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU.

- 🏎️ Performance: Leverages Rust's zero-cost abstractions, SIMD optimizations, and efficient memory management
- 🔒 Safety: Memory-safe by design with Rust's ownership model - no more segfaults or memory leaks
- 📦 Portability: Deploy anywhere from WebAssembly to embedded devices to GPU clusters
- 🔧 Control: Explicit resource management following SciRS2's Core Usage Policy
- 🤝 Compatibility: Loads Hugging Face model formats directly

Model	Task	TrustformeRS	HF Transformers	Speedup
BERT-base	Inference (CPU)	23ms	31ms	1.35x
BERT-base	Batch=32 (CPU)	412ms	687ms	1.67x
GPT-2	Generation (CPU)	89ms	142ms	1.59x
LLaMA-7B	Generation (GPU)	12ms/token	18ms/token	1.50x
T5-base	Translation	156ms	234ms	1.50x
ViT-base	Image Classification	15ms	22ms	1.47x

Benchmarks on Intel i9-12900K (CPU) and NVIDIA RTX 4090 (GPU)
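
The Speedup column appears to be the Hugging Face latency divided by the TrustformeRS latency. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not part of the TrustformeRS API.

```rust
/// Speedup of a candidate over a baseline, given latencies in the same unit.
/// A value above 1.0 means the candidate is faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Items processed per second implied by the latency of one batch.
fn throughput_per_sec(batch_size: u32, batch_latency_ms: f64) -> f64 {
    f64::from(batch_size) * 1000.0 / batch_latency_ms
}
```

For the BERT-base single-inference row, `speedup(31.0, 23.0)` is roughly 1.35, which matches the table.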

TrustformeRS follows a modular workspace structure inspired by Hugging Face Transformers:

trustformers/
├── trustformers-core/      # Core traits and tensor abstractions  (204,130 SLoC, Stable)
├── trustformers-models/    # 49+ model implementations           (196,463 SLoC, Alpha)
├── trustformers-tokenizers/# BPE, WordPiece, SentencePiece       ( 51,211 SLoC, Stable)
├── trustformers-optim/     # 20+ optimizers and LR schedulers    ( 71,429 SLoC, Stable)
├── trustformers-training/  # Distributed training, RLHF/DPO      ( 89,413 SLoC, Stable)
├── trustformers-serve/     # REST/gRPC/GraphQL serving           (361,251 SLoC, Stable)
├── trustformers-wasm/      # WebAssembly + WebGPU deployment     ( 55,504 SLoC, Stable)
├── trustformers-mobile/    # iOS/Android deployment              (143,001 SLoC, Alpha)
├── trustformers-debug/     # Profilers, visualizers, TensorBoard (101,448 SLoC, Alpha)
└── trustformers/           # High-level integration crate        (134,295 SLoC, Alpha)

Total: ~1.4M SLoC, 100% Pure Rust (COOLJAPAN Policy)

- Trait-based abstractions: Models, layers, and tokenizers implement common traits for composability
- Feature-gated backends: Choose between CPU, GPU, or WebAssembly targets
- Zero-copy model loading: Memory-mapped weights with SafeTensors format
- Explicit parallelism: You control thread and GPU usage, not the library
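
To illustrate the explicit-parallelism principle, here is a minimal sketch using only the standard library. The function `relu_parallel` is hypothetical and is not a TrustformeRS API; it only shows the pattern of the caller, rather than the library, choosing the thread count.

```rust
use std::thread;

/// Applies ReLU element-wise using at most `num_threads` worker threads.
/// The caller picks the thread count; nothing is spawned behind its back.
fn relu_parallel(input: &[f32], num_threads: usize) -> Vec<f32> {
    let num_threads = num_threads.max(1);
    let mut output = vec![0.0f32; input.len()];
    if input.is_empty() {
        return output;
    }
    // Ceiling division so every element lands in exactly one chunk.
    let chunk = (input.len() + num_threads - 1) / num_threads;
    // Scoped threads may borrow `input` and `output` without `Arc`.
    thread::scope(|s| {
        for (inp, out) in input.chunks(chunk).zip(output.chunks_mut(chunk)) {
            s.spawn(move || {
                for (o, x) in out.iter_mut().zip(inp) {
                    *o = x.max(0.0);
                }
            });
        }
    });
    output
}
```

Passing `1` as the thread count gives the same result as any larger count, which makes the function easy to test and to embed in an application that already manages its own thread pool.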

[dependencies]
trustformers = "0.1.1"

use trustformers::prelude::*;
use trustformers::{AutoModel, AutoTokenizer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model and tokenizer
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;
    
    // Tokenize input
    let inputs = tokenizer.encode("Hello, Rust world!", None)?;
    
    // Run inference
    let outputs = model.forward(&inputs)?;
    
    println!("Hidden states shape: {:?}", outputs.last_hidden_state.shape());
    Ok(())
}

use trustformers::pipeline;

// All pipelines are fully implemented and ready to use!
let classifier = pipeline("sentiment-analysis")?;
let result = classifier("I love writing Rust code!")?;
// Output: [{ label: "POSITIVE", score: 0.999 }]

// Also available:
// - text-generation
// - token-classification (NER)
// - question-answering
// - fill-mask
// - summarization
// - translation

Model	Variants	Tasks
BERT	base, large	Masked LM, Classification, Token Classification, QA
RoBERTa	base, large	Same as BERT
DistilBERT	base	Same as BERT (faster)
ALBERT	base, large	Same as BERT (parameter sharing)
ELECTRA	base, large	Discriminative pretraining
DeBERTa	base, large	Disentangled attention

Model	Variants	Tasks
GPT-2	small, medium, large, xl	Text Generation
GPT-Neo	125M, 1.3B, 2.7B	Text Generation
GPT-J	6B	Text Generation
GPT-NeoX	various	Text Generation
LLaMA	7B, 13B, 30B, 65B, 70B	Text Generation
Mistral	7B	Text Generation
Gemma	2B, 7B	Text Generation
Qwen	1.8B, 7B, 14B	Text Generation
Phi-3	mini, small, medium	Text Generation
Falcon	7B, 40B	Text Generation
StableLM	1.6B–12B	Text Generation

Model	Variants	Tasks
T5	small, base, large, 3B, 11B	Text-to-Text Generation

Model	Variants	Tasks
ViT	tiny, small, base, large	Image Classification
CLIP	base, large	Text-Image Matching
BLIP-2	various	Vision-Language
LLaVA	various	Visual Instruction Tuning
DALL-E	various	Text-to-Image Generation
Flamingo	various	Visual Language Model

Model	Complexity	Tasks
Mamba	O(N)	Long-context Generation
RWKV	O(N)	Recurrent Language Modeling
S4	O(N log N)	Long-range Sequence Modeling
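
The O(N) entries in the table above come from processing a sequence as a recurrence with a fixed-size state. The sketch below shows a scalar, time-invariant linear state-space recurrence as an illustration only. It is not the Mamba, RWKV, or S4 implementation: Mamba makes the parameters input-dependent, and S4 is commonly evaluated as a convolution via FFT, which is where O(N log N) comes from. All names are hypothetical.

```rust
/// Runs a scalar linear state-space recurrence over `xs`:
///   h[t] = a * h[t-1] + b * x[t],   y[t] = c * h[t],   with h[-1] = 0.
/// One pass with a constant-size state: O(N) time, O(1) state memory.
fn linear_ssm_scan(xs: &[f32], a: f32, b: f32, c: f32) -> Vec<f32> {
    let mut h = 0.0f32;
    xs.iter()
        .map(|&x| {
            h = a * h + b * x; // update the hidden state
            c * h              // project the state to an output
        })
        .collect()
}
```

Unlike attention, the cost per token does not grow with the number of earlier tokens, because everything the model remembers is folded into `h`.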

TrustformeRS includes state-of-the-art optimizations not mentioned in typical documentation:

- FlashAttention & FlashAttention-2: O(N) memory complexity for attention
- PagedAttention: Efficient KV cache management for long sequences
- INT8/INT4 Quantization: GPTQ and AWQ quantization methods
- Mixed Precision: FP16/BF16 training and inference
- ZeRO Optimization: All 3 stages for distributed training
- SIMD Operations: Leveraging SciRS2 for vectorized computations
- Tensor Parallelism: Split large models across multiple GPUs
- Gradient Checkpointing: Trade compute for memory efficiency
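
FlashAttention reaches O(N) memory by never materializing the full score matrix; the core trick is a streaming ("online") softmax. The sketch below shows that trick for a single query with scalar values. It is an illustration under simplifying assumptions (finite scores, non-empty input, no tiling, no GPU), with hypothetical function names, and it is not the TrustformeRS kernel.

```rust
/// Softmax-weighted average of `values` under `scores`, in one streaming pass.
/// Only a running max, a running normalizer, and a running accumulator are kept.
fn online_softmax_average(scores: &[f32], values: &[f32]) -> f32 {
    assert_eq!(scores.len(), values.len());
    assert!(!scores.is_empty(), "need at least one score");
    let mut running_max = f32::NEG_INFINITY;
    let mut normalizer = 0.0f32; // sum of exp(score - running_max) so far
    let mut acc = 0.0f32;        // sum of exp(score - running_max) * value so far
    for (&s, &v) in scores.iter().zip(values) {
        let new_max = running_max.max(s);
        // Rescale the partial sums to the new reference maximum.
        let scale = (running_max - new_max).exp();
        let w = (s - new_max).exp();
        normalizer = normalizer * scale + w;
        acc = acc * scale + w * v;
        running_max = new_max;
    }
    acc / normalizer
}

/// Reference version that stores every weight before normalizing.
fn two_pass_softmax_average(scores: &[f32], values: &[f32]) -> f32 {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = weights.iter().sum();
    weights.iter().zip(values).map(|(w, v)| w * v).sum::<f32>() / total
}
```

Both functions compute the same quantity up to rounding; the streaming one never needs the whole weight vector at once, which is what lets a tiled kernel process keys and values block by block.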

TrustformeRS supports multiple deployment targets:

- WebAssembly: Browser deployment (trustformers-wasm, Stable)
  - WebGPU acceleration support
  - JavaScript/TypeScript bindings
  - React/Vue component-ready

- Server: Production-ready API serving (trustformers-serve, Stable)
  - REST, gRPC, and GraphQL endpoints
  - Dynamic batching with Kubernetes deployment manifests
  - Docker containers and auto-scaling support

- Training: Full training infrastructure (trustformers-training, Stable)
  - RLHF and DPO training support
  - Distributed training with ZeRO optimization
  - Mixed precision (FP16/BF16)

- Mobile: Native mobile deployment (trustformers-mobile, Alpha)
  - iOS framework with Core ML and Metal acceleration
  - Android library with NNAPI and Vulkan support
  - React Native, Flutter, and Unity integrations

- Edge: Export to optimized formats
  - ONNX export/import
  - GGUF format support
  - Quantized models (INT8/INT4, GPTQ, AWQ) for embedded devices
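
To give a sense of what INT8 quantization does to weights before they are shipped to an embedded device, here is a minimal sketch of symmetric per-tensor quantization. This is the simplest scheme, named plainly as such; it is not GPTQ or AWQ, and the function names are hypothetical rather than TrustformeRS APIs.

```rust
/// Symmetric per-tensor INT8 quantization.
/// Returns the quantized values and the scale, where scale = max|w| / 127.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q: Vec<i8> = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Maps quantized values back to approximate floats.
fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| f32::from(v) * scale).collect()
}
```

Each weight is stored in one byte instead of four, and the round-trip error per element is at most about half of `scale`.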

use trustformers_core::{Model, Layer, Config};

struct MyTransformer {
    embeddings: Embedding,
    encoder: TransformerEncoder,
    pooler: Pooler,
}

impl Model for MyTransformer {
    type Config = MyConfig;
    type Input = TokenizedInput;
    type Output = ModelOutput;
    
    fn forward(&self, input: Self::Input) -> Result<Self::Output> {
        let hidden_states = self.embeddings.forward(input.input_ids)?;
        let encoded = self.encoder.forward(hidden_states)?;
        let pooled = self.pooler.forward(&encoded)?;
        
        Ok(ModelOutput { hidden_states: encoded, pooled_output: pooled })
    }
}

use trustformers::GpuContext;

let gpu = GpuContext::new(0)?; // Use GPU 0
let model = model.to_gpu(&gpu)?;

// Inference now runs on GPU
let outputs = model.forward(&inputs)?;

cargo build --target wasm32-unknown-unknown --features wasm

import init, { BertModel, Tokenizer } from './trustformers_wasm.js';

await init();
const tokenizer = Tokenizer.from_pretrained("bert-base-uncased");
const model = BertModel.from_pretrained("bert-base-uncased");

TrustformeRS maintains API similarity with Hugging Face Transformers for easy migration:

Python (Transformers):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)

Rust (TrustformeRS):

use trustformers::{AutoModel, AutoTokenizer};

let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
let model = AutoModel::from_pretrained("bert-base-uncased")?;

let inputs = tokenizer.encode("Hello world!", None)?;
let outputs = model.forward(&inputs)?;

- 49+ transformer architectures (BERT, RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa, GPT-2, GPT-Neo, GPT-J, GPT-NeoX, LLaMA, Mistral, Gemma, Qwen, Phi-3, Falcon, StableLM, T5, ViT, CLIP, BLIP-2, LLaVA, DALL-E, Flamingo, Mamba, RWKV, S4, Falcon2, Gemma2, Granite, Hyena, InternLM2, Jamba, Jamba2, Linformer, LLaMA3.2, Mamba2, Nemotron, Performer, Phi4, Qwen2.5, RetNet, SD3, StarCoder2, Whisper, xLSTM, Yi)
- All major NLP pipelines fully implemented (text-generation, classification, QA, NER, fill-mask, summarization, translation)
- Complete training infrastructure with distributed training, ZeRO optimization, mixed precision, RLHF and DPO support
- Mobile deployment with iOS (Core ML, Metal) and Android (NNAPI, Vulkan) support
- WebAssembly deployment with WebGPU acceleration
- REST/gRPC/GraphQL APIs with dynamic batching, Kubernetes deployment, and autoscaling
- Safety filtering pipeline with configurable content moderation
- Advanced optimizations: FlashAttention, PagedAttention, quantization (INT8/INT4/GPTQ/AWQ)
- Hardware acceleration: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU support
- AutoModel/AutoTokenizer system with Hugging Face Hub integration
- Comprehensive test suite: 5,358 tests with 100% pass rate
- Debugging tools: Profilers, visualizers, interactive debugging, TensorBoard integration
- 100% Pure Rust (COOLJAPAN Policy)
- ~1,408,134 SLoC across 10 crates

- MPSGraph acceleration: Awaiting scirs2-core 0.3.0 for 50-200x Metal performance improvement
- More quantization methods: Enhanced GGUF format, AutoGPTQ improvements
- Additional vision transformer variants: ViT-Huge, DeiT, Swin

- Custom CUDA kernels: Further GPU optimization beyond current FlashAttention
- Streaming inference: Real-time token streaming for all generation pipelines

- Comprehensive guides: Model implementation, deployment, optimization tuning
- Cookbook: Common patterns and best practices

We welcome contributions! See our Contributing Guide for details.

- Create a new module in trustformers-models/src/
- Implement the Config, Model, and task-specific heads
- Add tests comparing outputs with Hugging Face
- Submit a PR with benchmarks

- Profile with cargo-flamegraph
- Benchmark with criterion
- Consider SIMD optimizations for hot paths
- Ensure thread-safety for concurrent use
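
For the step about comparing outputs with Hugging Face, one common approach is to export reference values from a Python run and check the Rust outputs against them with a tolerance. The helpers below are a hypothetical sketch, not part of the TrustformeRS test suite; the tolerance rule has the same form as `numpy.allclose`.

```rust
/// Largest element-wise absolute difference between two equally sized outputs.
fn max_abs_diff(actual: &[f32], expected: &[f32]) -> f32 {
    assert_eq!(actual.len(), expected.len(), "output shapes differ");
    actual
        .iter()
        .zip(expected)
        .map(|(a, e)| (a - e).abs())
        .fold(0.0, f32::max)
}

/// True when every element satisfies |actual - expected| <= atol + rtol * |expected|.
/// Any NaN makes the comparison fail, as does a length mismatch.
fn all_close(actual: &[f32], expected: &[f32], rtol: f32, atol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}
```

Reporting `max_abs_diff` alongside a failed `all_close` check makes it easier to tell a small numerical drift from a real implementation bug.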

Run benchmarks with:

cargo bench --all-features

View detailed results in target/criterion/report/index.html

- No unsafe code in public APIs (only in carefully reviewed hot paths)
- All models are Send + Sync for safe concurrent use
- Fuzzing tests for tokenizers
- Memory usage bounds for OOM prevention
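
The Send + Sync guarantee is what allows one loaded model to serve several threads at once. The sketch below shows the pattern with a hypothetical `ToyModel` standing in for a real model type; it assumes, as the list above suggests, that inference only needs shared (`&self`) access.

```rust
use std::sync::Arc;
use std::thread;

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical stand-in for a loaded model with read-only weights.
struct ToyModel {
    weights: Vec<f32>,
}

impl ToyModel {
    /// Dot product of the weights with `input`; takes `&self`, so no lock is needed.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Runs one forward pass per input on its own thread, sharing a single model.
fn run_concurrently(model: Arc<ToyModel>, inputs: Vec<Vec<f32>>) -> Vec<f32> {
    // Compile-time check: fails to build if ToyModel is not Send + Sync.
    assert_send_sync::<ToyModel>();
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.forward(&input))
        })
        .collect();
    // Joining in order keeps results aligned with the inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Because the weights are shared through `Arc` rather than copied, memory use stays at one model regardless of how many threads run inference.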

TrustformeRS is inspired by Hugging Face Transformers and built on SciRS2 for scientific computing, with tokenizers from Hugging Face Tokenizers and contributions and feedback from the community.

TrustformeRS is developed and maintained by COOLJAPAN OU (Team Kitasan).

If you find TrustformeRS useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.

https://github.com/sponsors/cool-japan

Your sponsorship helps us:

Maintain and improve the COOLJAPAN ecosystem
Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
Provide long-term support and security updates

Licensed under Apache License, Version 2.0 (LICENSE).

Built with 🦀 and ❤️ by COOLJAPAN OU (Team KitaSan)
