A high-performance, memory-safe Rust implementation of Hugging Face Transformers. TrustformeRS brings the power of transformer models to the Rust ecosystem with zero-cost abstractions, fearless concurrency, and deployment flexibility from edge to cloud.
Project Status: TrustformeRS 0.1.1 was released on 2026-04-25. This release delivers 49+ transformer architectures, 5,358 tests with 100% pass rate, ~1,408,134 lines of 100% Pure Rust, and full multi-platform deployment (WebAssembly, server REST/gRPC/GraphQL, mobile iOS/Android, RLHF/DPO training). Multi-backend GPU support: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU.
ποΈ Performance: Leverages Rust's zero-cost abstractions, SIMD optimizations, and efficient memory managementπ Safety: Memory-safe by design with Rust's ownership model - no more segfaults or memory leaksπ¦ Portability: Deploy anywhere from WebAssembly to embedded devices to GPU clustersπ§ Control: Explicit resource management following SciRS2's Core Usage Policyπ€ Compatibility: Loads Hugging Face model formats directly
| Model | Task | TrustformeRS | HF Transformers | Speedup |
|---|---|---|---|---|
| BERT-base | Inference (CPU) | 23ms | 31ms | 1.35x |
| BERT-base | Batch=32 (CPU) | 412ms | 687ms | 1.67x |
| GPT-2 | Generation (CPU) | 89ms | 142ms | 1.59x |
| LLaMA-7B | Generation (GPU) | 12ms/token | 18ms/token | 1.50x |
| T5-base | Translation | 156ms | 234ms | 1.50x |
| ViT-base | Image Classification | 15ms | 22ms | 1.47x |
Benchmarks on Intel i9-12900K (CPU) and NVIDIA RTX 4090 (GPU)
TrustformeRS follows a modular workspace structure inspired by Hugging Face Transformers:
trustformers/
βββ trustformers-core/ # Core traits and tensor abstractions (204,130 SLoC, Stable)
βββ trustformers-models/ # 49+ model implementations (196,463 SLoC, Alpha)
βββ trustformers-tokenizers/# BPE, WordPiece, SentencePiece ( 51,211 SLoC, Stable)
βββ trustformers-optim/ # 20+ optimizers and LR schedulers ( 71,429 SLoC, Stable)
βββ trustformers-training/ # Distributed training, RLHF/DPO ( 89,413 SLoC, Stable)
βββ trustformers-serve/ # REST/gRPC/GraphQL serving (361,251 SLoC, Stable)
βββ trustformers-wasm/ # WebAssembly + WebGPU deployment ( 55,504 SLoC, Stable)
βββ trustformers-mobile/ # iOS/Android deployment (143,001 SLoC, Alpha)
βββ trustformers-debug/ # Profilers, visualizers, TensorBoard (101,448 SLoC, Alpha)
βββ trustformers/ # High-level integration crate (134,295 SLoC, Alpha)
Total: ~1.4M+ SLoC, 100% Pure Rust (COOLJAPAN Policy)
Trait-based abstractions: Models, layers, and tokenizers implement common traits for composability** Feature-gated backends**: Choose between CPU, GPU, or WebAssembly targets** Zero-copy model : Memory-mapped weights with SafeTensors format Explicit parallelism**: You control thread and GPU usage, not the library
[dependencies]
trustformers = "0.1.1"
use trustformers::prelude::*;
use trustformers::{AutoModel, AutoTokenizer};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Load model and tokenizer
let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
let model = AutoModel::from_pretrained("bert-base-uncased")?;
// Tokenize input
let inputs = tokenizer.encode("Hello, Rust world!", None)?;
// Run inference
let outputs = model.forward(&inputs)?;
println!("Hidden states shape: {:?}", outputs.last_hidden_state.shape());
Ok(())
}
use trustformers::pipeline;
// All pipelines are fully implemented and ready to use!
let classifier = pipeline("sentiment-analysis")?;
let result = classifier("I love writing Rust code!")?;
// Output: [{ label: "POSITIVE", score: 0.999 }]
// Also available:
// - text-generation
// - token-classification (NER)
// - question-answering
// - fill-mask
// - summarization
// - translation
| Model | Variants | Tasks |
|---|---|---|
| BERT | base, large | Masked LM, Classification, Token Classification, QA |
| RoBERTa | base, large | Same as BERT |
| DistilBERT | base | Same as BERT (faster) |
| ALBERT | base, large | Same as BERT (parameter sharing) |
| ELECTRA | base, large | Discriminative pretraining |
| DeBERTa | base, large | Disentangled attention |
| Model | Variants | Tasks |
|---|---|---|
| GPT-2 | small, medium, large, xl | Text Generation |
| GPT-Neo | 125M, 1.3B, 2.7B | Text Generation |
| GPT-J | 6B | Text Generation |
| GPT-NeoX | various | Text Generation |
| LLaMA | 7B, 13B, 30B, 65B, 70B | Text Generation |
| Mistral | 7B | Text Generation |
| Gemma | 2B, 7B | Text Generation |
| Qwen | 1.8B, 7B, 14B | Text Generation |
| Phi-3 | mini, small, medium | Text Generation |
| Falcon | 7B, 40B | Text Generation |
| StableLM | 1.6Bβ12B | Text Generation |
| Model | Variants | Tasks |
|---|---|---|
| T5 | small, base, large, 3B, 11B | Text-to-Text Generation |
| Model | Variants | Tasks |
|---|---|---|
| ViT | tiny, small, base, large | Image Classification |
| CLIP | base, large | Text-Image Matching |
| BLIP-2 | various | Vision-Language |
| LLaVA | various | Visual Instruction Tuning |
| DALL-E | various | Text-to-Image Generation |
| Flamingo | various | Visual Language Model |
| Model | Complexity | Tasks |
|---|---|---|
| Mamba | O(N) | Long-context Generation |
| RWKV | O(N) | Recurrent Language Modeling |
| S4 | O(N log N) | Long-range Sequence Modeling |
TrustformeRS includes state-of-the-art optimizations not mentioned in typical documentation:
FlashAttention & FlashAttention-2: O(N) memory complexity for attention** PagedAttention**: Efficient KV cache management for long sequences** INT8/INT4 Quantization**: GPTQ and AWQ quantization methods** Mixed Precision**: FP16/BF16 training and inference** ZeRO Optimization**: All 3 stages for distributed training** SIMD Operations**: Leveraging SciRS2 for vectorized computations** Tensor Parallelism**: Split large models across multiple GPUs** Gradient Checkpointing**: Trade compute for memory efficiency
TrustformeRS supports multiple deployment targets:
WebAssembly: Browser deployment (trustformers-wasm, Stable)- WebGPU acceleration support
-
JavaScript/TypeScript bindings
-
React/Vue component-ready
Server: Production-ready API serving (trustformers-serve, Stable)- REST, gRPC, and GraphQL endpoints
-
Dynamic batching with Kubernetes deployment manifests
-
Docker containers and auto-scaling support
Training: Full training infrastructure (trustformers-training, Stable)- RLHF and DPO training support
-
Distributed training with ZeRO optimization
-
Mixed precision (FP16/BF16)
Mobile: Native mobile deployment (trustformers-mobile, Alpha)- iOS framework with Core ML and Metal acceleration
-
Android library with NNAPI and Vulkan support
-
React Native, Flutter, and Unity integrations
Edge: Export to optimized formats- ONNX export/import
- GGUF format support
- Quantized models (INT8/INT4, GPTQ, AWQ) for embedded devices
use trustformers_core::{Model, Layer, Config};
struct MyTransformer {
embeddings: Embedding,
encoder: TransformerEncoder,
pooler: Pooler,
}
impl Model for MyTransformer {
type Config = MyConfig;
type Input = TokenizedInput;
type Output = ModelOutput;
fn forward(&self, input: Self::Input) -> Result<Self::Output> {
let hidden_states = self.embeddings.forward(input.input_ids)?;
let encoded = self.encoder.forward(hidden_states)?;
let pooled = self.pooler.forward(&encoded)?;
Ok(ModelOutput { hidden_states: encoded, pooled_output: pooled })
}
}
js
use trustformers::GpuContext;
let gpu = GpuContext::new(0)?; // Use GPU 0
let model = model.to_gpu(&gpu)?;
// Inference now runs on GPU
let outputs = model.forward(&inputs)?;
cargo build --target wasm32-unknown-unknown --features wasm
import init, { BertModel, Tokenizer } from './trustformers_wasm.js';
await init();
const tokenizer = Tokenizer.from_pretrained("bert-base-uncased");
const model = BertModel.from_pretrained("bert-base-uncased");
TrustformeRS maintains API similarity with Hugging Face Transformers for easy migration:
| Python (Transformers) | Rust (TrustformeRS) | |
from transformers import (
AutoModel,
AutoTokenizer
)
tokenizer = AutoTokenizer.from_pretrained(
"bert-base-uncased"
)
model = AutoModel.from_pretrained(
"bert-base-uncased"
)
inputs = tokenizer(
"Hello world!",
return_tensors="pt"
)
outputs = model(**inputs)
|
use trustformers::{
AutoModel,
AutoTokenizer
};
let tokenizer = AutoTokenizer::from_pretrained(
"bert-base-uncased"
)?;
let model = AutoModel::from_pretrained(
"bert-base-uncased"
)?;
let inputs = tokenizer.encode(
"Hello world!",
None
)?;
let outputs = model.forward(&inputs)?;
|
49+ transformer architectures(BERT, RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa, GPT-2, GPT-Neo, GPT-J, GPT-NeoX, LLaMA, Mistral, Gemma, Qwen, Phi-3, Falcon, StableLM, T5, ViT, CLIP, BLIP-2, LLaVA, DALL-E, Flamingo, Mamba, RWKV, S4, Falcon2, Gemma2, Granite, Hyena, InternLM2, Jamba, Jamba2, Linformer, LLaMA3.2, Mamba2, Nemotron, Performer, Phi4, Qwen2.5, RetNet, SD3, StarCoder2, Whisper, xLSTM, Yi) - All major NLP pipelines fully implemented (text-generation, classification, QA, NER, fill-mask, summarization, translation) - Complete training infrastructure with distributed training, ZeRO optimization, mixed precision, RLHF and DPO support - Mobile deployment with iOS (Core ML, Metal) and Android (NNAPI, Vulkan) support - WebAssembly deployment with WebGPU acceleration - REST/gRPC/GraphQL APIs with dynamic batching, Kubernetes deployment, and autoscaling - Safety filtering pipeline with configurable content moderation - Advanced optimizations: FlashAttention, PagedAttention, quantization (INT8/INT4/GPTQ/AWQ) - Hardware acceleration: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU support - AutoModel/AutoTokenizer system with HuggingFace Hub integration - Comprehensive test suite: 5,358 tests with 100% pass rate - Debugging tools: Profilers, visualizers, interactive debugging, TensorBoard integration - 100% Pure Rust(COOLJAPAN Policy) - ~1,408,134 SLoC across 10 crates
MPSGraph acceleration: Awaiting scirs2-core 0.3.0 for 50-200x Metal performance improvement - More quantization methods: Enhanced GGUF format, AutoGPTQ improvements - Additional vision transformer variants: ViT-Huge, DeiT, Swin
Custom CUDA kernels: Further GPU optimization beyond current FlashAttention - Streaming inference: Real-time token streaming for all generation pipelines
Comprehensive guides: Model implementation, deployment, optimization tuning - Cookbook: Common patterns and best practices
We welcome contributions! See our Contributing Guide for details.
-
Create a new module in
trustformers-models/src/ -
Implement the
Config
,Model
, and task-specific heads - Add tests comparing outputs with Hugging Face
-
Submit a PR with benchmarks
-
Profile with
cargo-flamegraph -
Benchmark with
criterion -
Consider SIMD optimizations for hot paths
-
Ensure thread-safety for concurrent use
Run benchmarks with:
cargo bench --all-features
View detailed results in target/criterion/report/index.html
- No unsafe code in public APIs (only in carefully reviewed hot paths)
- All models are
Send + Sync
for safe concurrent use - Fuzzing tests for tokenizers
-
Memory usage bounds for OOM prevention
-
Inspired by Hugging Face Transformers - Built on SciRS2for scientific computing - Tokenizers from Hugging Face Tokenizers - Community contributions and feedback
TrustFormers is developed and maintained by COOLJAPAN OU (Team Kitasan).
If you find TrustFormers useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.
https://github.com/sponsors/cool-japan
Your sponsorship helps us:
- Maintain and improve the COOLJAPAN ecosystem
- Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
- Provide long-term support and security updates
Licensed under Apache License, Version 2.0 (LICENSE).
Built with π¦ and β€οΈ by COOLJAPAN OU (Team KitaSan)