# Rust port of transformers (1M lines of code)

> Source: <https://github.com/cool-japan/trustformers/tree/master>
> Published: 2026-06-18 01:14:24+00:00

A high-performance, memory-safe Rust implementation of Hugging Face Transformers. TrustformeRS brings the power of transformer models to the Rust ecosystem with zero-cost abstractions, fearless concurrency, and deployment flexibility from edge to cloud.

Project Status: TrustformeRS 0.1.1 was released on 2026-04-25. This release delivers 49+ transformer architectures, 5,358 tests with 100% pass rate, ~1,408,134 lines of 100% Pure Rust, and full multi-platform deployment (WebAssembly, server REST/gRPC/GraphQL, mobile iOS/Android, RLHF/DPO training). Multi-backend GPU support: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU.

- **🏎️ Performance**: Leverages Rust's zero-cost abstractions, SIMD optimizations, and efficient memory management
- **🔒 Safety**: Memory-safe by design with Rust's ownership model - no more segfaults or memory leaks
- **📦 Portability**: Deploy anywhere from WebAssembly to embedded devices to GPU clusters
- **🔧 Control**: Explicit resource management following SciRS2's Core Usage Policy
- **🤝 Compatibility**: Loads Hugging Face model formats directly

| Model | Task | TrustformeRS | HF Transformers | Speedup |
|---|---|---|---|---|
| BERT-base | Inference (CPU) | 23ms | 31ms | 1.35x |
| BERT-base | Batch=32 (CPU) | 412ms | 687ms | 1.67x |
| GPT-2 | Generation (CPU) | 89ms | 142ms | 1.59x |
| LLaMA-7B | Generation (GPU) | 12ms/token | 18ms/token | 1.50x |
| T5-base | Translation | 156ms | 234ms | 1.50x |
| ViT-base | Image Classification | 15ms | 22ms | 1.47x |

*Benchmarks on Intel i9-12900K (CPU) and NVIDIA RTX 4090 (GPU)*
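
The Speedup column appears to be the Hugging Face latency divided by the TrustformeRS latency. A minimal sketch of that arithmetic follows; the helper names `speedup` and `format_speedup` are illustrative and are not part of TrustformeRS.

```rust
/// Speedup of a candidate over a baseline: baseline latency / candidate latency.
/// Illustrative helper, not part of the TrustformeRS API.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Formats a speedup with two decimals, e.g. "1.35x".
fn format_speedup(baseline_ms: f64, candidate_ms: f64) -> String {
    format!("{:.2}x", speedup(baseline_ms, candidate_ms))
}
```

For the first row, `format_speedup(31.0, 23.0)` gives `"1.35x"`. Because the published latencies are rounded, recomputing a ratio from them can differ from the table in the last digit.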

TrustformeRS follows a modular workspace structure inspired by Hugging Face Transformers:

```
trustformers/
├── trustformers-core/      # Core traits and tensor abstractions  (204,130 SLoC, Stable)
├── trustformers-models/    # 49+ model implementations           (196,463 SLoC, Alpha)
├── trustformers-tokenizers/# BPE, WordPiece, SentencePiece       ( 51,211 SLoC, Stable)
├── trustformers-optim/     # 20+ optimizers and LR schedulers    ( 71,429 SLoC, Stable)
├── trustformers-training/  # Distributed training, RLHF/DPO      ( 89,413 SLoC, Stable)
├── trustformers-serve/     # REST/gRPC/GraphQL serving           (361,251 SLoC, Stable)
├── trustformers-wasm/      # WebAssembly + WebGPU deployment     ( 55,504 SLoC, Stable)
├── trustformers-mobile/    # iOS/Android deployment              (143,001 SLoC, Alpha)
├── trustformers-debug/     # Profilers, visualizers, TensorBoard (101,448 SLoC, Alpha)
└── trustformers/           # High-level integration crate        (134,295 SLoC, Alpha)
```

**Total**: ~1.4M SLoC, 100% Pure Rust (COOLJAPAN Policy)

- **Trait-based abstractions**: Models, layers, and tokenizers implement common traits for composability
- **Feature-gated backends**: Choose between CPU, GPU, or WebAssembly targets
- **Zero-copy model loading**: Memory-mapped weights with SafeTensors format
- **Explicit parallelism**: You control thread and GPU usage, not the library
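
To make the first two principles concrete, here is a minimal sketch of trait-based composition and compile-time backend selection. The `Layer` trait, the `Scale`, `Bias`, and `Sequential` types, and the `gpu` feature name are assumptions made for illustration; they are not the actual `trustformers-core` definitions.

```rust
/// Hypothetical layer trait: maps an input vector to an output vector.
trait Layer {
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Multiplies every element by a constant.
struct Scale(f32);

/// Adds a constant to every element.
struct Bias(f32);

impl Layer for Scale {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * self.0).collect()
    }
}

impl Layer for Bias {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x + self.0).collect()
    }
}

/// A stack of layers is itself a layer, so larger models compose from parts.
struct Sequential(Vec<Box<dyn Layer>>);

impl Layer for Sequential {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        self.0
            .iter()
            .fold(input.to_vec(), |acc, layer| layer.forward(&acc))
    }
}

/// Backend chosen at compile time through a Cargo feature.
/// The feature name "gpu" is an assumption for this sketch.
fn backend_name() -> &'static str {
    if cfg!(feature = "gpu") {
        "gpu"
    } else {
        "cpu"
    }
}
```

Because `Sequential` implements the same trait as its parts, callers only ever depend on `Layer`, and the backend decision is resolved at build time rather than at run time.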

```toml
[dependencies]
trustformers = "0.1.1"
```

```rust
use trustformers::prelude::*;
use trustformers::{AutoModel, AutoTokenizer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model and tokenizer
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;

    // Tokenize input
    let inputs = tokenizer.encode("Hello, Rust world!", None)?;

    // Run inference
    let outputs = model.forward(&inputs)?;

    println!("Hidden states shape: {:?}", outputs.last_hidden_state.shape());
    Ok(())
}
```

```rust
use trustformers::pipeline;

// All pipelines are fully implemented and ready to use!
let classifier = pipeline("sentiment-analysis")?;
let result = classifier("I love writing Rust code!")?;
// Output: [{ label: "POSITIVE", score: 0.999 }]

// Also available:
// - text-generation
// - token-classification (NER)
// - question-answering
// - fill-mask
// - summarization
// - translation
```

| Model | Variants | Tasks |
|---|---|---|
| BERT | base, large | Masked LM, Classification, Token Classification, QA |
| RoBERTa | base, large | Same as BERT |
| DistilBERT | base | Same as BERT (faster) |
| ALBERT | base, large | Same as BERT (parameter sharing) |
| ELECTRA | base, large | Discriminative pretraining |
| DeBERTa | base, large | Disentangled attention |

| Model | Variants | Tasks |
|---|---|---|
| GPT-2 | small, medium, large, xl | Text Generation |
| GPT-Neo | 125M, 1.3B, 2.7B | Text Generation |
| GPT-J | 6B | Text Generation |
| GPT-NeoX | various | Text Generation |
| LLaMA | 7B, 13B, 30B, 65B, 70B | Text Generation |
| Mistral | 7B | Text Generation |
| Gemma | 2B, 7B | Text Generation |
| Qwen | 1.8B, 7B, 14B | Text Generation |
| Phi-3 | mini, small, medium | Text Generation |
| Falcon | 7B, 40B | Text Generation |
| StableLM | 1.6B–12B | Text Generation |

| Model | Variants | Tasks |
|---|---|---|
| T5 | small, base, large, 3B, 11B | Text-to-Text Generation |

| Model | Variants | Tasks |
|---|---|---|
| ViT | tiny, small, base, large | Image Classification |
| CLIP | base, large | Text-Image Matching |
| BLIP-2 | various | Vision-Language |
| LLaVA | various | Visual Instruction Tuning |
| DALL-E | various | Text-to-Image Generation |
| Flamingo | various | Visual Language Model |

| Model | Complexity | Tasks |
|---|---|---|
| Mamba | O(N) | Long-context Generation |
| RWKV | O(N) | Recurrent Language Modeling |
| S4 | O(N log N) | Long-range Sequence Modeling |
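
The linear complexity of the recurrent models in this table comes from carrying a fixed-size state along the sequence instead of comparing every pair of tokens. The toy scalar recurrence below illustrates that idea only; it is not the update rule used by Mamba, RWKV, or S4, and the parameter names `a` and `b` are assumptions.

```rust
/// Toy scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, with h_0 = 0.
/// One pass over the input with a constant-size state: O(N) time, O(1) state.
/// Simplified illustration, not the actual Mamba, RWKV, or S4 update.
fn linear_scan(a: f32, b: f32, inputs: &[f32]) -> Vec<f32> {
    let mut state = 0.0_f32;
    inputs
        .iter()
        .map(|&x| {
            // The new state depends only on the previous state and the current input.
            state = a * state + b * x;
            state
        })
        .collect()
}
```

With `a = 0.5`, `b = 1.0`, and three inputs of `1.0`, the states are `1.0`, `1.5`, and `1.75`.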

TrustformeRS includes the following advanced optimizations:

- **FlashAttention & FlashAttention-2**: O(N) memory complexity for attention
- **PagedAttention**: Efficient KV cache management for long sequences
- **INT8/INT4 Quantization**: GPTQ and AWQ quantization methods
- **Mixed Precision**: FP16/BF16 training and inference
- **ZeRO Optimization**: All 3 stages for distributed training
- **SIMD Operations**: Leveraging SciRS2 for vectorized computations
- **Tensor Parallelism**: Split large models across multiple GPUs
- **Gradient Checkpointing**: Trade compute for memory efficiency
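
As a rough illustration of what INT8 quantization does to a weight tensor, the sketch below uses plain symmetric round-to-nearest scaling. This is a deliberately simpler scheme than GPTQ or AWQ, and the function names are hypothetical rather than TrustformeRS APIs.

```rust
/// Symmetric per-tensor INT8 quantization (absmax scaling).
/// Plain round-to-nearest; GPTQ and AWQ are more elaborate methods.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Maps INT8 values back to approximate f32 weights.
fn dequantize_int8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| f32::from(q) * scale).collect()
}
```

Each weight is stored in one byte instead of four, and the reconstruction error per element is bounded by about half of `scale`.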

TrustformeRS supports multiple deployment targets:

- **WebAssembly**: Browser deployment (trustformers-wasm, Stable)
  - WebGPU acceleration support
  - JavaScript/TypeScript bindings
  - React/Vue component-ready

- **Server**: Production-ready API serving (trustformers-serve, Stable)
  - REST, gRPC, and GraphQL endpoints
  - Dynamic batching with Kubernetes deployment manifests
  - Docker containers and auto-scaling support

- **Training**: Full training infrastructure (trustformers-training, Stable)
  - RLHF and DPO training support
  - Distributed training with ZeRO optimization
  - Mixed precision (FP16/BF16)

- **Mobile**: Native mobile deployment (trustformers-mobile, Alpha)
  - iOS framework with Core ML and Metal acceleration
  - Android library with NNAPI and Vulkan support
  - React Native, Flutter, and Unity integrations

- **Edge**: Export to optimized formats
  - ONNX export/import
  - GGUF format support
  - Quantized models (INT8/INT4, GPTQ, AWQ) for embedded devices
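
Dynamic batching, listed under the server target, generally means grouping requests that arrive close together so the model runs once per group. The sketch below shows one common way to do that with a standard library channel. It is not the `trustformers-serve` implementation, the name `next_batch` is hypothetical, and it assumes `max_batch` is at least 1.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects up to `max_batch` requests, waiting at most `max_wait` after the
/// first request arrives. Returns `None` once the channel is closed and empty.
/// Simplified sketch of dynamic batching, not the trustformers-serve code.
fn next_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Option<Vec<T>> {
    // Block until the first request arrives (or every sender is gone).
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    let deadline = Instant::now() + max_wait;

    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(request) => batch.push(request),
            // Deadline reached or channel closed: send what has been collected.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

A serving loop would call `next_batch` repeatedly and run one forward pass per returned batch. A larger `max_wait` trades latency for throughput.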

```rust
use trustformers_core::{Model, Layer, Config};

struct MyTransformer {
    embeddings: Embedding,
    encoder: TransformerEncoder,
    pooler: Pooler,
}

impl Model for MyTransformer {
    type Config = MyConfig;
    type Input = TokenizedInput;
    type Output = ModelOutput;

    fn forward(&self, input: Self::Input) -> Result<Self::Output> {
        let hidden_states = self.embeddings.forward(input.input_ids)?;
        let encoded = self.encoder.forward(hidden_states)?;
        let pooled = self.pooler.forward(&encoded)?;

        Ok(ModelOutput { hidden_states: encoded, pooled_output: pooled })
    }
}
```

```rust
use trustformers::GpuContext;

let gpu = GpuContext::new(0)?; // Use GPU 0
let model = model.to_gpu(&gpu)?;

// Inference now runs on GPU
let outputs = model.forward(&inputs)?;
```

```bash
# Build for WASM
cargo build --target wasm32-unknown-unknown --features wasm
```

```javascript
// Use in JavaScript
import init, { BertModel, Tokenizer } from './trustformers_wasm.js';

await init();
const tokenizer = Tokenizer.from_pretrained("bert-base-uncased");
const model = BertModel.from_pretrained("bert-base-uncased");
```

TrustformeRS maintains API similarity with Hugging Face Transformers for easy migration:

**Python (Transformers)**

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
```

**Rust (TrustformeRS)**

```rust
use trustformers::{AutoModel, AutoTokenizer};

let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
let model = AutoModel::from_pretrained("bert-base-uncased")?;

let inputs = tokenizer.encode("Hello world!", None)?;
let outputs = model.forward(&inputs)?;
```

- **49+ transformer architectures** (BERT, RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa, GPT-2, GPT-Neo, GPT-J, GPT-NeoX, LLaMA, Mistral, Gemma, Qwen, Phi-3, Falcon, StableLM, T5, ViT, CLIP, BLIP-2, LLaVA, DALL-E, Flamingo, Mamba, RWKV, S4, Falcon2, Gemma2, Granite, Hyena, InternLM2, Jamba, Jamba2, Linformer, LLaMA3.2, Mamba2, Nemotron, Performer, Phi4, Qwen2.5, RetNet, SD3, StarCoder2, Whisper, xLSTM, Yi)
- **All major NLP pipelines** fully implemented (text-generation, classification, QA, NER, fill-mask, summarization, translation)
- **Complete training infrastructure** with distributed training, ZeRO optimization, mixed precision, RLHF and DPO support
- **Mobile deployment** with iOS (Core ML, Metal) and Android (NNAPI, Vulkan) support
- **WebAssembly deployment** with WebGPU acceleration
- **REST/gRPC/GraphQL APIs** with dynamic batching, Kubernetes deployment, and autoscaling
- **Safety filtering pipeline** with configurable content moderation
- **Advanced optimizations**: FlashAttention, PagedAttention, quantization (INT8/INT4/GPTQ/AWQ)
- **Hardware acceleration**: CUDA, Metal, ROCm, WebGPU, Vulkan, OpenCL, TPU support
- **AutoModel/AutoTokenizer** system with Hugging Face Hub integration
- **Comprehensive test suite**: 5,358 tests with 100% pass rate
- **Debugging tools**: Profilers, visualizers, interactive debugging, TensorBoard integration
- **100% Pure Rust** (COOLJAPAN Policy): ~1,408,134 SLoC across 10 crates

- **MPSGraph acceleration**: Awaiting scirs2-core 0.3.0 for 50-200x Metal performance improvement
- **More quantization methods**: Enhanced GGUF format, AutoGPTQ improvements
- **Additional vision transformer variants**: ViT-Huge, DeiT, Swin

- **Custom CUDA kernels**: Further GPU optimization beyond current FlashAttention
- **Streaming inference**: Real-time token streaming for all generation pipelines

- **Comprehensive guides**: Model implementation, deployment, optimization tuning
- **Cookbook**: Common patterns and best practices

We welcome contributions! See our [Contributing Guide](https://github.com/cool-japan/trustformers/blob/master/CONTRIBUTING.md) for details.

- Create a new module in `trustformers-models/src/`
- Implement the `Config`, `Model`, and task-specific heads
- Add tests comparing outputs with Hugging Face
- Submit a PR with benchmarks
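
For the step of comparing outputs with Hugging Face, a tolerance-based check is the usual approach, since floating-point results rarely match bit for bit across implementations. The sketch below assumes the reference values were exported from the Python model ahead of time; `outputs_match` is a hypothetical helper and the tolerance is only an example.

```rust
/// True when both outputs have the same length and every element is within
/// `atol` of the reference. A NaN on either side fails the comparison.
/// Hypothetical test helper, not part of TrustformeRS.
fn outputs_match(actual: &[f32], expected: &[f32], atol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol)
}
```

A model test would then assert something like `outputs_match(&rust_logits, &reference_logits, 1e-4)`, with the tolerance chosen per model and precision.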

- Profile with `cargo-flamegraph`
- Benchmark with `criterion`
- Consider SIMD optimizations for hot paths
- Ensure thread-safety for concurrent use

Run benchmarks with:

```
cargo bench --all-features
```

View detailed results in `target/criterion/report/index.html`

- No unsafe code in public APIs (only in carefully reviewed hot paths)
- All models are `Send + Sync` for safe concurrent use
- Fuzzing tests for tokenizers
- Memory usage bounds for OOM prevention
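
The `Send + Sync` guarantee is what allows one loaded model to serve several threads without copying its weights. The sketch below shows the general pattern with a stand-in type; `ToyModel` and its `forward` method are hypothetical and only illustrate the bound, not a real TrustformeRS model.

```rust
use std::sync::Arc;
use std::thread;

/// Compiles only if `T` is `Send + Sync`; a compile-time check with no runtime cost.
fn assert_send_sync<T: Send + Sync>() {}

/// Stand-in for a loaded model; the type and its field are hypothetical.
struct ToyModel {
    weights: Vec<f32>,
}

impl ToyModel {
    /// Read-only inference: dot product of the weights with the input.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Shares one model across `workers` threads without copying the weights.
fn run_concurrently(model: Arc<ToyModel>, workers: usize) -> Vec<f32> {
    assert_send_sync::<ToyModel>();
    // Spawn all workers first, then join them in order.
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.forward(&[i as f32, 1.0]))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Because `forward` takes `&self`, no lock is needed; a type that was not `Send + Sync` would make `assert_send_sync` fail to compile.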

- Inspired by [Hugging Face Transformers](https://github.com/huggingface/transformers)
- Built on [SciRS2](https://github.com/scirs) for scientific computing
- Tokenizers from [Hugging Face Tokenizers](https://github.com/huggingface/tokenizers)
- Community contributions and feedback

TrustformeRS is developed and maintained by **COOLJAPAN OU (Team Kitasan)**.

If you find TrustformeRS useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.

[https://github.com/sponsors/cool-japan](https://github.com/sponsors/cool-japan)

Your sponsorship helps us:

- Maintain and improve the COOLJAPAN ecosystem
- Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
- Provide long-term support and security updates

Licensed under Apache License, Version 2.0 ([LICENSE](https://github.com/cool-japan/trustformers/blob/master/LICENSE)).

Built with 🦀 and ❤️ by COOLJAPAN OU (Team Kitasan)