Release 4.0.0 · HuggingFace/Transformers.js

Hugging Face released Transformers.js v4, a major update featuring a new WebGPU runtime rewritten in C++ for faster AI model inference in browsers, Node, Bun, and Deno. The release adds support for large models such as GPT-OSS 20B and for new architectural patterns including Mamba and Mixture of Experts, and reports a ~4x speedup for BERT-based embedding models.

4.0.0

🚀 Transformers.js v4

We're excited to announce that Transformers.js v4 is now available on NPM! After a year of development (we started in March 2025 🤯), we're finally ready for you to use it: `npm i @huggingface/transformers`

Links:
- YouTube Video: https://www.youtube.com/watch?v=KnhppkY4gHs
- Blog Post: https://huggingface.co/blog/transformersjs-v4
- Demo Collection: https://huggingface.co/collections/webml-community/transformersjs-v4-demos

New WebGPU backend

The biggest change is undoubtedly the adoption of a new WebGPU Runtime, completely rewritten in C++. We've worked closely with the ONNX Runtime team to thoroughly test this runtime across our ~200 supported model architectures, as well as many new v4-exclusive architectures.

In addition to better operator support (for performance, accuracy, and coverage), this new WebGPU runtime allows the same Transformers.js code to be used across a wide variety of JavaScript environments, including browsers, server-side runtimes, and desktop applications. That's right, you can now run WebGPU-accelerated models directly in Node, Bun, and Deno!

We've proven that it's possible to run state-of-the-art AI models 100% locally in the browser, and now we're focused on performance: making these models run as fast as possible, even in resource-constrained environments. This required completely rethinking our export strategy, especially for large language models.
We achieve this by re-implementing new models operation by operation, leveraging specialized ONNX Runtime Contrib Operators (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md) like `com.microsoft.GroupQueryAttention` (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention), `com.microsoft.MatMulNBits` (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits), or `com.microsoft.QMoE` (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QMoE) to maximize performance. For example, by adopting the `com.microsoft.MultiHeadAttention` (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MultiHeadAttention) operator, we were able to achieve a ~4x speedup for BERT-based embedding models.

- ONNX Runtime improvements by @xenova in #1306 (https://github.com/huggingface/transformers.js/pull/1306)
- Transformers.js V4: Native WebGPU EP, repo restructuring, and more by @xenova in #1382 (https://github.com/huggingface/transformers.js/pull/1382)

New models

Thanks to our new export strategy and ONNX Runtime's expanding support for custom operators, we've been able to add many new models and architectures to Transformers.js v4. These include popular models like GPT-OSS, Chatterbox, GraniteMoeHybrid, LFM2-MoE, HunYuanDenseV1, Apertus, Olmo3, FalconH1, and Youtu-LLM. Many of these required us to implement support for advanced architectural patterns, including Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE). Perhaps most importantly, these models are all compatible with WebGPU, allowing users to run them directly in the browser or in server-side JavaScript environments with hardware acceleration.
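In Transformers.js v4, MoE layers run inside ONNX Runtime (for example through `com.microsoft.QMoE`). As a rough, non-authoritative illustration of what such a layer computes, here is a plain-JavaScript sketch of top-k expert routing for a single token. It assumes softmax routing, k experts per token, and renormalized weights, which is a common design but varies between models; `softmax`, `routeTopK`, and `moeForward` are hypothetical helpers, not library APIs.

```javascript
// Numerically stable softmax over an array of router logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the k experts with the highest router probability and
// renormalize their weights so they sum to 1 (assumed scheme).
function routeTopK(routerLogits, k) {
  const probs = softmax(routerLogits);
  const ranked = probs
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const norm = ranked.reduce((s, r) => s + r.p, 0);
  return ranked.map(({ expert, p }) => ({ expert, weight: p / norm }));
}

// Weighted sum of the selected experts' outputs. Each expert is a
// function mapping an input vector to an output vector of equal length.
// Only k experts are evaluated, which is what makes MoE cheap per token.
function moeForward(x, routerLogits, experts, k) {
  const out = new Array(x.length).fill(0);
  for (const { expert, weight } of routeTopK(routerLogits, k)) {
    const y = experts[expert](x);
    for (let i = 0; i < out.length; i++) out[i] += weight * y[i];
  }
  return out;
}

// Toy usage: four "experts" that scale their input by 1, 2, 3 and 4.
const experts = [1, 2, 3, 4].map((s) => (v) => v.map((a) => a * s));
console.log(moeForward([1, 1], [0, 0, 10, 10], experts, 2)); // [ 3.5, 3.5 ]
```

Here the router strongly prefers experts 2 and 3 with equal scores, so each contributes half of the output and experts 0 and 1 are never evaluated.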
We've released several Transformers.js v4 demos (https://huggingface.co/collections/webml-community/transformersjs-v4-demos) so far... and we'll continue to release more!

Additionally, we've added support for larger models exceeding 8B parameters. In our tests, we've been able to run GPT-OSS 20B (q4f16) at ~60 tokens per second on an M4 Pro Max.

- Add support for Apertus by @nico-martin in #1465 (https://github.com/huggingface/transformers.js/pull/1465)
- Add support for FalconH1 by @xenova in #1502 (https://github.com/huggingface/transformers.js/pull/1502)
- Add support for Cohere's Tiny Aya models by @xenova in #1529 (https://github.com/huggingface/transformers.js/pull/1529)
- Add support for AFMoE by @xenova in #1542 (https://github.com/huggingface/transformers.js/pull/1542)
- Add support for new Qwen VL models (Qwen2.5-VL, Qwen3-VL, Qwen3.5, and Qwen3.5 MoE) by @xenova in #1551 (https://github.com/huggingface/transformers.js/pull/1551)
- Add support for Qwen2 MoE, Qwen3 MoE, Qwen3 Next, Qwen3-VL MoE, and Olmo Hybrid by @xenova in #1562 (https://github.com/huggingface/transformers.js/pull/1562)
- Add support for EuroBERT by @xenova in #1583 (https://github.com/huggingface/transformers.js/pull/1583)
- Add support for LightOnOCR and GLM-OCR by @xenova in #1582 (https://github.com/huggingface/transformers.js/pull/1582)
- Add support for Nemotron-H by @xenova in #1585 (https://github.com/huggingface/transformers.js/pull/1585)
- Add support for DeepSeek-v3 by @xenova in #1586 (https://github.com/huggingface/transformers.js/pull/1586)
- Add support for mistral4 by @xenova in #1587 (https://github.com/huggingface/transformers.js/pull/1587)
- Add support for GLM-MoE-DSA by @xenova in #1588 (https://github.com/huggingface/transformers.js/pull/1588)
- Add support for Chatterbox by @xenova in #1592 (https://github.com/huggingface/transformers.js/pull/1592)
- Add support for Cohere ASR by @xenova in #1610 (https://github.com/huggingface/transformers.js/pull/1610)
- Add support for SolarOpen and CHMv2 models by @xenova in #1593 (https://github.com/huggingface/transformers.js/pull/1593)
- Voxtral Realtime, LFM2-VL, Granite Speech, and modeling type refactoring by @xenova in #1569 (https://github.com/huggingface/transformers.js/pull/1569)
- Add support for Gemma3 VLM architecture by @xenova in #1601 (https://github.com/huggingface/transformers.js/pull/1601)

New features

ModelRegistry

The new ModelRegistry API is designed for production workflows. It provides explicit visibility into pipeline assets before loading anything: list required files with `get_pipeline_files`, inspect per-file metadata with `get_file_metadata` (useful for calculating the total download size), check cache status with `is_pipeline_cached`, and clear cached artifacts with `clear_pipeline_cache`. You can also query the available precision types (dtypes) for a model with `get_available_dtypes`. Building on this new API, `progress_callback` now includes a `progress_total` event, making it easy to render end-to-end loading progress without manually aggregating per-file updates.
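To make the value of `progress_total` concrete, this is a minimal sketch of the per-file bookkeeping an application previously had to write itself. It assumes hypothetical per-file events shaped like `{ file, loaded, total }`; those field names are chosen for illustration, so check the library's `ProgressInfo` type before relying on them. `createProgressAggregator` is not a library function.

```javascript
// Hypothetical helper: combines per-file progress events into a single
// overall percentage. Event shape { file, loaded, total } is an assumption.
function createProgressAggregator() {
  const files = new Map(); // file name -> latest { loaded, total }
  return function update({ file, loaded, total }) {
    files.set(file, { loaded, total });
    let sumLoaded = 0;
    let sumTotal = 0;
    for (const f of files.values()) {
      sumLoaded += f.loaded;
      sumTotal += f.total;
    }
    // Percentage over the files seen so far; 0 before any sizes are known.
    return sumTotal === 0 ? 0 : (100 * sumLoaded) / sumTotal;
  };
}

const update = createProgressAggregator();
update({ file: "config.json", loaded: 1000, total: 1000 }); // 100 so far
console.log(update({ file: "onnx/model.onnx", loaded: 0, total: 9000 })); // 10
```

The usage shows the weakness of this approach: files that have not reported yet are missing from the denominator, so the bar can jump from 100% back to 10%. Knowing every file and its size up front, as ModelRegistry now allows, avoids that.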
ModelRegistry examples:

```js
import { ModelRegistry, pipeline } from "@huggingface/transformers";

const modelId = "onnx-community/all-MiniLM-L6-v2-ONNX";
const modelOptions = { dtype: "fp32" };

// List the files the pipeline will load
const files = await ModelRegistry.get_pipeline_files("feature-extraction", modelId, modelOptions);
// ['config.json', 'onnx/model.onnx', ..., 'tokenizer_config.json']

// Sum per-file sizes to get the total download size
const metadata = await Promise.all(files.map((file) => ModelRegistry.get_file_metadata(modelId, file)));
const downloadSize = metadata.reduce((total, item) => total + item.size, 0);

const cached = await ModelRegistry.is_pipeline_cached("feature-extraction", modelId, modelOptions);

const dtypes = await ModelRegistry.get_available_dtypes(modelId);
// ['fp32', 'fp16', 'q4', 'q4f16']

if (cached) {
  await ModelRegistry.clear_pipeline_cache("feature-extraction", modelId, modelOptions);
}

// Report end-to-end loading progress via the new progress_total event
const pipe = await pipeline("feature-extraction", modelId, {
  progress_callback: (e) => {
    if (e.status === "progress_total") {
      console.log(`${Math.round(e.progress)}%`);
    }
  },
});
```

New Environment Settings

We also added new environment controls for model loading. `env.useWasmCache` enables caching of WASM runtime files (when cache storage is available), allowing applications to work fully offline after the initial load. `env.fetch` lets you provide a custom fetch implementation for use cases such as authenticated model access, custom headers, and abortable requests.

env examples:

```js
import { env } from "@huggingface/transformers";

// Cache WASM runtime files for offline use
env.useWasmCache = true;

// Custom fetch that adds an Authorization header to every request
// (MY_TOKEN is a placeholder for your own access token)
env.fetch = (url, options) =>
  fetch(url, {
    ...options,
    headers: {
      ...options?.headers,
      Authorization: `Bearer ${MY_TOKEN}`,
    },
  });
```

Improved Logging Controls

Finally, logging is easier to manage in real-world deployments. ONNX Runtime WebGPU warnings are now hidden by default, and you can set explicit verbosity levels for both Transformers.js and ONNX Runtime. This update, also driven by community feedback, keeps console output focused on actionable signals rather than low-value noise.
logLevel example:

```js
import { env, LogLevel } from "@huggingface/transformers";

// Available levels:
// LogLevel.DEBUG
// LogLevel.INFO
// LogLevel.WARNING
// LogLevel.ERROR
// LogLevel.NONE
env.logLevel = LogLevel.WARNING;
```

- [v4] added wasm cache by @nico-martin in #1471 (https://github.com/huggingface/transformers.js/pull/1471)
- [V4] cache wasm file blob fix by @nico-martin in #1489 (https://github.com/huggingface/transformers.js/pull/1489)
- [v4] suppress console.error while creating InferenceSession by @nico-martin in #1468 (https://github.com/huggingface/transformers.js/pull/1468)
- feat: add configurable log levels via env.logLevel by @taronsung in #1507 (https://github.com/huggingface/transformers.js/pull/1507)
- [v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) by @nico-martin in #1511 (https://github.com/huggingface/transformers.js/pull/1511)
- Add support for seedable random number generation by @xenova in #1537 (https://github.com/huggingface/transformers.js/pull/1537)
- Add customFetch to env for user override by @sroussey in #1538 (https://github.com/huggingface/transformers.js/pull/1538) and @xenova in #1543 (https://github.com/huggingface/transformers.js/pull/1543)
- [deno] Support both wgpu and dawn webgpu backends by @xenova in #1546 (https://github.com/huggingface/transformers.js/pull/1546)
- feat: experimental Cross-Origin Storage cache backend by @nico-martin in #1549 (https://github.com/huggingface/transformers.js/pull/1549)
- Support PKV cached generation for Qwen-VL models by @xenova in #1557 (https://github.com/huggingface/transformers.js/pull/1557)
- Use ModelRegistry for pipeline file loading by @xenova in #1555 (https://github.com/huggingface/transformers.js/pull/1555)
- feat: add fast boolean `is_cached` / `is_pipeline_cached`, closes #1554 (https://github.com/huggingface/transformers.js/issues/1554) by @nico-martin in #1559 (https://github.com/huggingface/transformers.js/pull/1559)
- feat: add `ModelRegistry.get_available_dtypes` to detect quantization by @sroussey in #1580 (https://github.com/huggingface/transformers.js/pull/1580)
- Add support for `qwen3_5_text` (<- `Qwen3_5ForCausalLM`) by @xenova in #1602 (https://github.com/huggingface/transformers.js/pull/1602)
- Fix ModelRegistry calls when loading multimodal models with text-generation pipeline by @xenova in #1614 (https://github.com/huggingface/transformers.js/pull/1614)
- Emit `progress_total` events from `PreTrainedModel.from_pretrained` by @xenova in #1615 (https://github.com/huggingface/transformers.js/pull/1615)

Repository Restructuring

Developing a new major version gave us the opportunity to invest in the codebase and tackle long-overdue refactoring efforts.

PNPM Workspaces

Until now, the GitHub repository served as our npm package. This worked well as long as the repository only exposed a single library. However, looking to the future, we saw the need for various sub-packages that depend heavily on the Transformers.js core while addressing different use cases, like library-specific implementations, or smaller utilities that most users don't need but are essential for some. That's why we converted the repository to a monorepo using pnpm workspaces. This allows us to ship smaller packages that depend on @huggingface/transformers without the overhead of maintaining separate repositories.

Modular Class Structure

Another major refactoring effort targeted the ever-growing `models.js` file.
In v3, all available models were defined in a single file spanning over 8,000 lines, which became increasingly difficult to maintain. For v4, we split this into smaller, focused modules with a clear distinction between utility functions, core logic, and model-specific implementations. This new structure improves readability and makes it much easier to add new models. Developers can now focus on model-specific logic without navigating through thousands of lines of unrelated code.

Examples Repository

In v3, many Transformers.js example projects lived directly in the main repository. For v4, we've moved them to a dedicated repository (https://github.com/huggingface/transformers.js-examples), allowing us to maintain a cleaner codebase focused on the core library. This also makes it easier for users to find and contribute to examples without sifting through the main repository.

Prettier

We updated the Prettier configuration and reformatted all files in the repository. This ensures consistent formatting throughout the codebase, with all future PRs automatically following the same style. No more debates about formatting... Prettier handles it all, keeping the code clean and readable for everyone.

Standalone Tokenizers.js Library

A frequent request from users was to extract the tokenization logic into a separate library, and with v4, that's exactly what we've done. @huggingface/tokenizers (https://www.npmjs.com/package/@huggingface/tokenizers) is a complete refactor of the tokenization logic, designed to work seamlessly across browsers and server-side runtimes. At just 8.8kB gzipped with zero dependencies, it's incredibly lightweight while remaining fully type-safe.
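Byte-level BPE tokenizers (the GPT-2 family and many newer LLMs) do not keep raw spaces or control bytes in their token strings; they first remap every byte to a printable character, so a leading space shows up as `Ġ` (U+0120). The following is a sketch of that GPT-2-style byte-to-unicode table, reimplemented from the publicly known algorithm rather than taken from `@huggingface/tokenizers`; `bytesToUnicode` and `toByteLevel` are hypothetical names, and a real tokenizer applies its BPE merges after this step.

```javascript
// GPT-2-style byte-to-unicode table: "printable" bytes map to themselves,
// every other byte (control characters, space, etc.) is shifted to a code
// point starting at 256, in increasing byte order.
function bytesToUnicode() {
  const printable = new Set();
  for (let b = 0x21; b <= 0x7e; b++) printable.add(b); // '!' .. '~'
  for (let b = 0xa1; b <= 0xac; b++) printable.add(b); // '¡' .. '¬'
  for (let b = 0xae; b <= 0xff; b++) printable.add(b); // '®' .. 'ÿ'
  const table = new Map();
  let next = 256; // next code point handed to a non-printable byte
  for (let b = 0; b < 256; b++) {
    table.set(b, String.fromCharCode(printable.has(b) ? b : next++));
  }
  return table;
}

// Encode text as UTF-8 and replace each byte with its printable stand-in.
function toByteLevel(text) {
  const table = bytesToUnicode();
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => table.get(b)).join("");
}

console.log(toByteLevel(" World")); // ĠWorld
console.log(toByteLevel("\n")); // Ċ
```

Because the table is a bijection over all 256 bytes, any UTF-8 string, including emoji and non-Latin scripts, can be represented without an unknown-token fallback.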
Example code:

```js
import { Tokenizer } from "@huggingface/tokenizers";

// Load from Hugging Face Hub
const modelId = "HuggingFaceTB/SmolLM3-3B";
const tokenizerJson = await fetch(`https://huggingface.co/${modelId}/resolve/main/tokenizer.json`).then((res) => res.json());
const tokenizerConfig = await fetch(`https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`).then((res) => res.json());

// Create tokenizer
const tokenizer = new Tokenizer(tokenizerJson, tokenizerConfig);

// Tokenize text
const tokens = tokenizer.tokenize("Hello World");
// ['Hello', 'ĠWorld']

const encoded = tokenizer.encode("Hello World");
// { ids: [9906, 4435], tokens: ['Hello', 'ĠWorld'], ... }
```

This separation keeps the core of Transformers.js focused and lean while offering a versatile, standalone tool that any WebML project can use independently.

- Move examples by @xenova in #1362 (https://github.com/huggingface/transformers.js/pull/1362)
- [v4] Refactor models.js into separate per-model modeling files by @nico-martin in #1498 (https://github.com/huggingface/transformers.js/pull/1498)
- [V4] move to pnpm workspaces by @nico-martin in #1504 (https://github.com/huggingface/transformers.js/pull/1504)
- [v4] Tokenizers.js migration by @xenova in #1501 (https://github.com/huggingface/transformers.js/pull/1501)

New build system

We've migrated our build system from Webpack to esbuild, and the results have been incredible. Build times dropped from 2 seconds to just 200 milliseconds, a 10x improvement that makes development iteration significantly faster. Speed isn't the only benefit, though: bundle sizes also decreased by an average of 10% across all builds. The most notable improvement is in `transformers.web.js`, our default export, which is now 53% smaller, meaning faster downloads and quicker startup times for users.
- [v4] Switch build system to use esbuild by @nico-martin in #1466 (https://github.com/huggingface/transformers.js/pull/1466)

Improved types

We've made several quality-of-life improvements across the library. The type system has been enhanced with dynamic pipeline types that adapt based on inputs, providing a better developer experience and stronger type safety.

- [v4] Finalize pipeline types by @xenova in #1505 (https://github.com/huggingface/transformers.js/pull/1505)
- Add type definitions for various pipeline outputs in transformers.js by @sroussey in #1535 (https://github.com/huggingface/transformers.js/pull/1535)
- fix: improves typedef of Message by @philnash in #1525 (https://github.com/huggingface/transformers.js/pull/1525)
- added ProgressCallback and ProgressInfo as type exports by @nico-martin in #1528 (https://github.com/huggingface/transformers.js/pull/1528)

Bug fixes

- fixed backend loading path by @nico-martin in #1464 (https://github.com/huggingface/transformers.js/pull/1464)
- fixed v4 ts errors by @nico-martin in #1477 (https://github.com/huggingface/transformers.js/pull/1477)
- Hot-fix Float16Array usage in node by @xenova in #1482 (https://github.com/huggingface/transformers.js/pull/1482)
- [v4@next] Fix WebGPU validation failed by @xenova in #1519 (https://github.com/huggingface/transformers.js/pull/1519)
- Fix stopping criteria missing from generation pipelines by @xenova in #1523 (https://github.com/huggingface/transformers.js/pull/1523)
- Overdue Whisper fixes by @xenova in #1594 (https://github.com/huggingface/transformers.js/pull/1594)
- Fix custom cache in Node.js when cache returns non-paths (e.g., Response) by @xenova in #1617 (https://github.com/huggingface/transformers.js/pull/1617)
- Include content-length header in COS Response by @tomayac in #1572 (https://github.com/huggingface/transformers.js/pull/1572)
- Fix: file size missing for cached files by @nico-martin in #1575 (https://github.com/huggingface/transformers.js/pull/1575)
- [deno] (via CDN) Fix simultaneous multi-session loading (e.g., VLMs) and d support usage for image loading by @xenova in #1556 (https://github.com/huggingface/transformers.js/pull/1556)
- fix: prevent partial file reads during concurrent downloads by @nico-martin in #1548 (https://github.com/huggingface/transformers.js/pull/1548)
- Fix WASM factory blob URL loading (#1527 https://github.com/huggingface/transformers.js/issues/1527, #1532 https://github.com/huggingface/transformers.js/issues/1532) by @nico-martin in #1558 (https://github.com/huggingface/transformers.js/pull/1558)

Documentation improvements

- Fix broken source code links in the Tutorials sections by @SeanGallen in #1526 (https://github.com/huggingface/transformers.js/pull/1526)
- V4 docs: add Vercel AI SDK integration with browser-ai/transformers.js package by @xenova in #1566 (https://github.com/huggingface/transformers.js/pull/1566)
- docs: add step-by-step guide for contributing a new model by @nico-martin in #1560 (https://github.com/huggingface/transformers.js/pull/1560)
- Add background-removal pipeline to JSDoc by @xenova in #1595 (https://github.com/huggingface/transformers.js/pull/1595)
- fix(docs): Fix typo in WebGPU documentation by @mattsoulanille in #1607 (https://github.com/huggingface/transformers.js/pull/1607)

Miscellaneous improvements

- Improve text streamer when `skip_special_tokens: false` by @xenova in #1520 (https://github.com/huggingface/transformers.js/pull/1520)
- Upgrade GitHub Actions for Node 24 compatibility (follow-up) by @xenova in #1531 (https://github.com/huggingface/transformers.js/pull/1531)
- Align audio spectrogram transformer name with python library by @xenova in #1591 (https://github.com/huggingface/transformers.js/pull/1591)
- Memoize network calls in model registry by @nico-martin in #1552 (https://github.com/huggingface/transformers.js/pull/1552)
- added no-store to range request (#1561 https://github.com/huggingface/transformers.js/issues/1561) by @nico-martin in #1564 (https://github.com/huggingface/transformers.js/pull/1564)

New Contributors

- @taronsung made their first contribution in #1507 (https://github.com/huggingface/transformers.js/pull/1507)
- @philnash made their first contribution in #1525 (https://github.com/huggingface/transformers.js/pull/1525)
- @tomayac made their first contribution in #1572 (https://github.com/huggingface/transformers.js/pull/1572)
- @mattsoulanille made their first contribution in #1607 (https://github.com/huggingface/transformers.js/pull/1607)

Full Changelog: 3.8.1...4.0.0