# Release 4.0.0 · HuggingFace/Transformers.js

> Source: <https://github.com/huggingface/transformers.js/releases/tag/4.0.0>
> Published: 2026-06-20 05:52:29+00:00

# 🚀 Transformers.js v4

We're excited to announce that Transformers.js v4 is now available on NPM! After a year of development (we started in March 2025 🤯), we're finally ready for you to use it.

```
npm i @huggingface/transformers
```

Links: [YouTube Video](https://www.youtube.com/watch?v=KnhppkY4gHs), [Blog Post](https://huggingface.co/blog/transformersjs-v4), [Demo Collection](https://huggingface.co/collections/webml-community/transformersjs-v4-demos)

## New WebGPU backend

The biggest change is undoubtedly the adoption of a new WebGPU Runtime, completely rewritten in C++. We've worked closely with the ONNX Runtime team to thoroughly test this runtime across our ~200 supported model architectures, as well as many new v4-exclusive architectures.

In addition to better operator support (for performance, accuracy, and coverage), this new WebGPU runtime allows the same transformers.js code to be used across a wide variety of JavaScript environments, including browsers, server-side runtimes, and desktop applications. That's right, you can now run WebGPU-accelerated models directly in Node, Bun, and Deno!
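As a rough illustration of how one code path can cover several environments, the sketch below feature-detects WebGPU and falls back to WASM. It assumes a browser-style `navigator.gpu` object and that `pipeline` still accepts the `device` option with `"webgpu"` and `"wasm"` values, as in v3. `pickDevice` is a hypothetical helper, not part of the library, and server-side runtimes may need a different check.

```js
// Hypothetical helper: returns "webgpu" when a WebGPU adapter can be obtained,
// otherwise "wasm". The navigator-like object is injectable so the logic can
// be exercised outside a browser.
async function pickDevice(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}

// Possible usage (assumes the v3-style `device` option is unchanged in v4):
//   import { pipeline } from "@huggingface/transformers";
//   const extractor = await pipeline("feature-extraction", modelId, {
//     device: await pickDevice(),
//   });
```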

We've proven that it's possible to run state-of-the-art AI models 100% locally in the browser, and now we're focused on performance: making these models run as fast as possible, even in resource-constrained environments. This required completely rethinking our export strategy, especially for large language models. We achieve this by re-implementing new models operation by operation, leveraging specialized ONNX Runtime [Contrib Operators](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md) like [com.microsoft.GroupQueryAttention](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention), [com.microsoft.MatMulNBits](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits), or [com.microsoft.QMoE](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QMoE) to maximize performance.

For example, adopting the [com.microsoft.MultiHeadAttention](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MultiHeadAttention) operator, we were able to achieve a ~4x speedup for BERT-based embedding models.

- ONNX Runtime improvements by [@xenova](https://github.com/xenova) in [#1306](https://github.com/huggingface/transformers.js/pull/1306)
- Transformers.js V4: Native WebGPU EP, repo restructuring, and more! by [@xenova](https://github.com/xenova) in [#1382](https://github.com/huggingface/transformers.js/pull/1382)

## New models

Thanks to our new export strategy and ONNX Runtime's expanding support for custom operators, we've been able to add many new models and architectures to Transformers.js v4. These include popular models like GPT-OSS, Chatterbox, GraniteMoeHybrid, LFM2-MoE, HunYuanDenseV1, Apertus, Olmo3, FalconH1, and Youtu-LLM. Many of these required us to implement support for advanced architectural patterns, including Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE). Perhaps most importantly, these models are all compatible with WebGPU, allowing users to run them directly in the browser or server-side JavaScript environments with hardware acceleration. We've released several [Transformers.js v4 demos](https://huggingface.co/collections/webml-community/transformersjs-v4-demos) so far... and we'll continue to release more!

Additionally, we've added support for larger models exceeding 8B parameters. In our tests, we've been able to run GPT-OSS 20B (q4f16) at ~60 tokens per second on an M4 Pro Max.
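To give a sense of why 4-bit weights matter at this scale, here is a back-of-envelope estimate of raw weight storage. It is only a lower bound under simplifying assumptions: it treats "20B" as exactly 20 billion parameters, and it ignores quantization scales, tensors kept at higher precision, and runtime buffers such as the KV cache. `estimateWeightBytes` is a hypothetical helper for illustration.

```js
// Raw weight storage in bytes: one value of `bitsPerWeight` bits per parameter.
function estimateWeightBytes(paramCount, bitsPerWeight) {
  return (paramCount * bitsPerWeight) / 8;
}

const GB = 1e9; // decimal gigabytes

// Nominal 20B-parameter model (assumed round number).
const fourBit = estimateWeightBytes(20e9, 4) / GB;
const sixteenBit = estimateWeightBytes(20e9, 16) / GB;

console.log(`4-bit weights:  ~${fourBit} GB`); // ~10 GB
console.log(`16-bit weights: ~${sixteenBit} GB`); // ~40 GB
```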

- Add support for Apertus by [@nico-martin](https://github.com/nico-martin) in [#1465](https://github.com/huggingface/transformers.js/pull/1465)
- Add support for FalconH1 by [@xenova](https://github.com/xenova) in [#1502](https://github.com/huggingface/transformers.js/pull/1502)
- Add support for Cohere's Tiny Aya models by [@xenova](https://github.com/xenova) in [#1529](https://github.com/huggingface/transformers.js/pull/1529)
- Add support for AFMoE by [@xenova](https://github.com/xenova) in [#1542](https://github.com/huggingface/transformers.js/pull/1542)
- Add support for new Qwen VL models (Qwen2.5-VL, Qwen3-VL, Qwen3.5, and Qwen3.5 MoE) by [@xenova](https://github.com/xenova) in [#1551](https://github.com/huggingface/transformers.js/pull/1551)
- Add support for Qwen2 MoE, Qwen3 MoE, Qwen3 Next, Qwen3-VL MoE, and Olmo Hybrid by [@xenova](https://github.com/xenova) in [#1562](https://github.com/huggingface/transformers.js/pull/1562)
- Add support for EuroBERT by [@xenova](https://github.com/xenova) in [#1583](https://github.com/huggingface/transformers.js/pull/1583)
- Add support for LightOnOCR and GLM-OCR by [@xenova](https://github.com/xenova) in [#1582](https://github.com/huggingface/transformers.js/pull/1582)
- Add support for Nemotron-H by [@xenova](https://github.com/xenova) in [#1585](https://github.com/huggingface/transformers.js/pull/1585)
- Add support for DeepSeek-v3 by [@xenova](https://github.com/xenova) in [#1586](https://github.com/huggingface/transformers.js/pull/1586)
- Add support for mistral4 by [@xenova](https://github.com/xenova) in [#1587](https://github.com/huggingface/transformers.js/pull/1587)
- Add support for GLM-MoE-DSA by [@xenova](https://github.com/xenova) in [#1588](https://github.com/huggingface/transformers.js/pull/1588)
- Add support for Chatterbox by [@xenova](https://github.com/xenova) in [#1592](https://github.com/huggingface/transformers.js/pull/1592)
- Add support for Cohere ASR by [@xenova](https://github.com/xenova) in [#1610](https://github.com/huggingface/transformers.js/pull/1610)
- Add support for SolarOpen and CHMv2 models by [@xenova](https://github.com/xenova) in [#1593](https://github.com/huggingface/transformers.js/pull/1593)
- Voxtral Realtime, LFM2-VL, Granite Speech, and modeling type refactoring by [@xenova](https://github.com/xenova) in [#1569](https://github.com/huggingface/transformers.js/pull/1569)
- Add support for Gemma3 VLM architecture by [@xenova](https://github.com/xenova) in [#1601](https://github.com/huggingface/transformers.js/pull/1601)

## New features

### ModelRegistry

The new `ModelRegistry` API is designed for production workflows. It provides explicit visibility into pipeline assets before loading anything: list required files with `get_pipeline_files`, inspect per-file metadata with `get_file_metadata` (quite useful to calculate total download size), check cache status with `is_pipeline_cached`, and clear cached artifacts with `clear_pipeline_cache`. You can also query available precision types for a model with `get_available_dtypes`. Based on this new API, `progress_callback` now includes a `progress_total` event, making it easy to render end-to-end loading progress without manually aggregating per-file updates.

#### See `ModelRegistry` examples

``` js
import { ModelRegistry, pipeline } from "@huggingface/transformers";

const modelId = "onnx-community/all-MiniLM-L6-v2-ONNX";
const modelOptions = { dtype: "fp32" };

const files = await ModelRegistry.get_pipeline_files(
  "feature-extraction",
  modelId,
  modelOptions
);
// ['config.json', 'onnx/model.onnx', ..., 'tokenizer_config.json']

const metadata = await Promise.all(
  files.map(file => ModelRegistry.get_file_metadata(modelId, file))
);

const downloadSize = metadata.reduce((total, item) => total + item.size, 0);

const cached = await ModelRegistry.is_pipeline_cached(
  "feature-extraction",
  modelId,
  modelOptions
);

const dtypes = await ModelRegistry.get_available_dtypes(modelId);
// ['fp32', 'fp16', 'q4', 'q4f16']

if (cached) {
  await ModelRegistry.clear_pipeline_cache(
    "feature-extraction",
    modelId,
    modelOptions
  );
}

const pipe = await pipeline(
  "feature-extraction",
  modelId,
  {
    progress_callback: e => {
      if (e.status === "progress_total") {
        console.log(`${Math.round(e.progress)}%`);
      }
    },
  }
);
```
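If the `progress_total` events feed a UI element, it can help to wrap the callback so the renderer only runs when the displayed value changes. The sketch below assumes, as the example above suggests, that the event carries `status: "progress_total"` and a `progress` value expressed as a percentage from 0 to 100. `createTotalProgressHandler` and `render` are hypothetical names, not library APIs.

```js
// Hypothetical helper: builds a progress_callback that forwards only
// "progress_total" events, rounded and clamped, and skips repeated values.
function createTotalProgressHandler(render) {
  let lastPercent = -1;
  return (event) => {
    if (event.status !== "progress_total") return; // ignore per-file events
    // Assumption: event.progress is a percentage in the range 0..100.
    const percent = Math.min(100, Math.max(0, Math.round(event.progress)));
    if (percent === lastPercent) return; // avoid redundant re-renders
    lastPercent = percent;
    render(percent);
  };
}

// Possible usage:
//   const pipe = await pipeline("feature-extraction", modelId, {
//     progress_callback: createTotalProgressHandler((p) => {
//       progressBar.value = p; // progressBar is a placeholder element
//     }),
//   });
```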

### New Environment Settings

We also added new environment controls for model loading. `env.useWasmCache` enables caching of WASM runtime files (when cache storage is available), allowing applications to work fully offline after the initial load.

`env.fetch` lets you provide a custom fetch implementation for use cases such as authenticated model access, custom headers, and abortable requests.

#### See env examples

``` js
import { env } from "@huggingface/transformers";

env.useWasmCache = true;

env.fetch = (url, options) =>
  fetch(url, {
    ...options,
    headers: {
      ...options?.headers,
      Authorization: `Bearer ${MY_TOKEN}`,
    },
  });
```
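For the abortable-request use case, one possible approach is to wrap a base fetch so each request is cancelled if no response arrives within a deadline. This is a minimal sketch that assumes `env.fetch` is called with the same `(url, options)` arguments as the standard `fetch`, as in the example above. `createTimeoutFetch` is a hypothetical helper, and the timeout only covers the wait for the response, not the streaming of its body.

```js
// Hypothetical helper: wraps a fetch-like function with a per-request timeout.
function createTimeoutFetch(baseFetch, timeoutMs) {
  return async (url, options = {}) => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      // Note: this replaces any caller-provided `signal` with the timeout signal.
      return await baseFetch(url, { ...options, signal: controller.signal });
    } finally {
      clearTimeout(timer); // stop the timer once the response (or error) arrives
    }
  };
}

// Possible usage:
//   import { env } from "@huggingface/transformers";
//   env.fetch = createTimeoutFetch(fetch, 30_000);
```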

### Improved Logging Controls

Finally, logging is easier to manage in real-world deployments. ONNX Runtime WebGPU warnings are now hidden by default, and you can set explicit verbosity levels for both Transformers.js and ONNX Runtime. This update, also driven by community feedback, keeps console output focused on actionable signals rather than low-value noise.

#### See `logLevel` example

``` js
import { env, LogLevel } from "@huggingface/transformers";

// LogLevel.DEBUG
// LogLevel.INFO
// LogLevel.WARNING
// LogLevel.ERROR
// LogLevel.NONE

env.logLevel = LogLevel.WARNING;
```
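In deployments it can be convenient to choose the verbosity from configuration rather than hard-coding it. The sketch below maps a level name to a value on a `LogLevel`-like object and falls back to a default when the name is unknown. It assumes `LogLevel` exposes the member names listed above as plain properties. `resolveLogLevel` and the `TRANSFORMERS_LOG_LEVEL` variable are hypothetical, and the numeric values in the stand-in object are placeholders.

```js
// Hypothetical helper: looks up a level by name (case-insensitive) and
// returns `fallback` when the name is missing or not a known member.
function resolveLogLevel(name, levels, fallback) {
  const key = String(name ?? "").trim().toUpperCase();
  return Object.prototype.hasOwnProperty.call(levels, key) ? levels[key] : fallback;
}

// Stand-in for the library's LogLevel; the real values may differ.
const FakeLogLevel = { DEBUG: 0, INFO: 1, WARNING: 2, ERROR: 3, NONE: 4 };

console.log(resolveLogLevel("error", FakeLogLevel, FakeLogLevel.WARNING)); // 3
console.log(resolveLogLevel(undefined, FakeLogLevel, FakeLogLevel.WARNING)); // 2

// Possible usage:
//   import { env, LogLevel } from "@huggingface/transformers";
//   env.logLevel = resolveLogLevel(
//     process.env.TRANSFORMERS_LOG_LEVEL, // hypothetical variable name
//     LogLevel,
//     LogLevel.WARNING
//   );
```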

- [v4] added wasm cache by [@nico-martin](https://github.com/nico-martin) in [#1471](https://github.com/huggingface/transformers.js/pull/1471)
- V4 cache wasm file blob fix by [@nico-martin](https://github.com/nico-martin) in [#1489](https://github.com/huggingface/transformers.js/pull/1489)
- [v4] suppress console.error while creating InferenceSession by [@nico-martin](https://github.com/nico-martin) in [#1468](https://github.com/huggingface/transformers.js/pull/1468)
- feat: add configurable log levels via env.logLevel by [@taronsung](https://github.com/taronsung) in [#1507](https://github.com/huggingface/transformers.js/pull/1507)
- [v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) by [@nico-martin](https://github.com/nico-martin) in [#1511](https://github.com/huggingface/transformers.js/pull/1511)
- Add support for seedable random number generation by [@xenova](https://github.com/xenova) in [#1537](https://github.com/huggingface/transformers.js/pull/1537)
- Add customFetch to env for user override by [@sroussey](https://github.com/sroussey) in [#1538](https://github.com/huggingface/transformers.js/pull/1538) and [@xenova](https://github.com/xenova) in [#1543](https://github.com/huggingface/transformers.js/pull/1543)
- [deno] Support both wgpu and dawn webgpu backends by [@xenova](https://github.com/xenova) in [#1546](https://github.com/huggingface/transformers.js/pull/1546)
- feat: experimental Cross-Origin Storage cache backend by [@nico-martin](https://github.com/nico-martin) in [#1549](https://github.com/huggingface/transformers.js/pull/1549)
- Support PKV cached generation for Qwen-VL models by [@xenova](https://github.com/xenova) in [#1557](https://github.com/huggingface/transformers.js/pull/1557)
- Use ModelRegistry for pipeline file loading by [@xenova](https://github.com/xenova) in [#1555](https://github.com/huggingface/transformers.js/pull/1555)
- feat: add fast boolean `is_cached` / `is_pipeline_cached`, closes [#1554](https://github.com/huggingface/transformers.js/issues/1554) by [@nico-martin](https://github.com/nico-martin) in [#1559](https://github.com/huggingface/transformers.js/pull/1559)
- feat: add ModelRegistry.get_available_dtypes() to detect quantization by [@sroussey](https://github.com/sroussey) in [#1580](https://github.com/huggingface/transformers.js/pull/1580)
- Add support for qwen3_5_text <-> Qwen3_5ForCausalLM by [@xenova](https://github.com/xenova) in [#1602](https://github.com/huggingface/transformers.js/pull/1602)
- Fix ModelRegistry calls when loading multimodal models with text-generation pipeline by [@xenova](https://github.com/xenova) in [#1614](https://github.com/huggingface/transformers.js/pull/1614)
- Emit `progress_total` events from `PreTrainedModel.from_pretrained()` by [@xenova](https://github.com/xenova) in [#1615](https://github.com/huggingface/transformers.js/pull/1615)

## Repository Restructuring

Developing a new major version gave us the opportunity to invest in the codebase and tackle long-overdue refactoring efforts.

### PNPM Workspaces

Until now, the GitHub repository served as our npm package. This worked well as long as the repository only exposed a single library. However, looking to the future, we saw the need for various sub-packages that depend heavily on the Transformers.js core while addressing different use cases, like library-specific implementations, or smaller utilities that most users don't need but are essential for some.

That's why we converted the repository to a monorepo using pnpm workspaces. This allows us to ship smaller packages that depend on `@huggingface/transformers` without the overhead of maintaining separate repositories.

### Modular Class Structure

Another major refactoring effort targeted the ever-growing models.js file. In v3, all available models were defined in a single file spanning over 8,000 lines, becoming increasingly difficult to maintain. For v4, we split this into smaller, focused modules with a clear distinction between utility functions, core logic, and model-specific implementations. This new structure improves readability and makes it much easier to add new models. Developers can now focus on model-specific logic without navigating through thousands of lines of unrelated code.

### Examples Repository

In v3, many Transformers.js example projects lived directly in the main repository. For v4, we've moved them to a [dedicated repository](https://github.com/huggingface/transformers.js-examples), allowing us to maintain a cleaner codebase focused on the core library. This also makes it easier for users to find and contribute to examples without sifting through the main repository.

### Prettier

We updated the Prettier configuration and reformatted all files in the repository. This ensures consistent formatting throughout the codebase, with all future PRs automatically following the same style. No more debates about formatting... Prettier handles it all, keeping the code clean and readable for everyone.

### Standalone Tokenizers.js Library

A frequent request from users was to extract the tokenization logic into a separate library, and with v4, that's exactly what we've done. [@huggingface/tokenizers](https://www.npmjs.com/package/@huggingface/tokenizers) is a complete refactor of the tokenization logic, designed to work seamlessly across browsers and server-side runtimes. At just 8.8kB (gzipped) with zero dependencies, it's incredibly lightweight while remaining fully type-safe.

#### See example code

``` js
import { Tokenizer } from "@huggingface/tokenizers";

// Load from Hugging Face Hub
const modelId = "HuggingFaceTB/SmolLM3-3B";
const tokenizerJson = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer.json`
).then(res => res.json());

const tokenizerConfig = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`
).then(res => res.json());

// Create tokenizer
const tokenizer = new Tokenizer(tokenizerJson, tokenizerConfig);

// Tokenize text
const tokens = tokenizer.tokenize("Hello World");
// ['Hello', 'ĠWorld']

const encoded = tokenizer.encode("Hello World");
// { ids: [9906, 4435], tokens: ['Hello', 'ĠWorld'], ... }
```

This separation keeps the core of Transformers.js focused and lean while offering a versatile, standalone tool that any WebML project can use independently.
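One place a standalone tokenizer is handy is checking whether a prompt fits a token budget before sending it to a model. The sketch below relies only on `encode(text)` returning an object with an `ids` array, as shown in the example above. `checkTokenBudget` is a hypothetical helper, and the whitespace-splitting stub stands in for a real `Tokenizer` instance.

```js
// Hypothetical helper: counts tokens via tokenizer.encode() and reports
// whether the text fits within `maxTokens`.
function checkTokenBudget(tokenizer, text, maxTokens) {
  const { ids } = tokenizer.encode(text);
  return { count: ids.length, fits: ids.length <= maxTokens };
}

// Stub with the same encode() shape, for illustration only.
// A real Tokenizer will produce different token boundaries and ids.
const stubTokenizer = {
  encode(text) {
    const tokens = text.split(/\s+/).filter(Boolean);
    return { ids: tokens.map((_, i) => i), tokens };
  },
};

console.log(checkTokenBudget(stubTokenizer, "Hello World", 8));
// { count: 2, fits: true }
```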

- Move examples by [@xenova](https://github.com/xenova) in [#1362](https://github.com/huggingface/transformers.js/pull/1362)
- [v4] Refactor models.js into separate per-model modeling files by [@nico-martin](https://github.com/nico-martin) in [#1498](https://github.com/huggingface/transformers.js/pull/1498)
- V4 move to pnpm workspaces by [@nico-martin](https://github.com/nico-martin) in [#1504](https://github.com/huggingface/transformers.js/pull/1504)
- [v4] Tokenizers.js migration by [@xenova](https://github.com/xenova) in [#1501](https://github.com/huggingface/transformers.js/pull/1501)

## New build system

We've migrated our build system from Webpack to esbuild, and the results have been incredible. Build times dropped from 2 seconds to just 200 milliseconds, a 10x improvement that makes development iteration significantly faster. Speed isn't the only benefit, though: bundle sizes also decreased by an average of 10% across all builds. The most notable improvement is in transformers.web.js, our default export, which is now 53% smaller, meaning faster downloads and quicker startup times for users.

- [v4] Switch build system to use esbuild by [@nico-martin](https://github.com/nico-martin) in [#1466](https://github.com/huggingface/transformers.js/pull/1466)

## Improved types

We've made several quality-of-life improvements across the library. The type system has been enhanced with dynamic pipeline types that adapt based on inputs, providing better developer experience and type safety.

- [v4] Finalize pipeline types by [@xenova](https://github.com/xenova) in [#1505](https://github.com/huggingface/transformers.js/pull/1505)
- Add type definitions for various pipeline outputs in transformers.js by [@sroussey](https://github.com/sroussey) in [#1535](https://github.com/huggingface/transformers.js/pull/1535)
- fix: improves typedef of Message by [@philnash](https://github.com/philnash) in [#1525](https://github.com/huggingface/transformers.js/pull/1525)
- added ProgressCallback and ProgressInfo as type exports by [@nico-martin](https://github.com/nico-martin) in [#1528](https://github.com/huggingface/transformers.js/pull/1528)

## Bug fixes

- fixed backend loading path by [@nico-martin](https://github.com/nico-martin) in [#1464](https://github.com/huggingface/transformers.js/pull/1464)
- fixed v4 ts errors by [@nico-martin](https://github.com/nico-martin) in [#1477](https://github.com/huggingface/transformers.js/pull/1477)
- Hot-fix Float16Array usage in node by [@xenova](https://github.com/xenova) in [#1482](https://github.com/huggingface/transformers.js/pull/1482)
- [v4@next] Fix WebGPU validation failed by [@xenova](https://github.com/xenova) in [#1519](https://github.com/huggingface/transformers.js/pull/1519)
- Fix `stopping_criteria` missing from generation pipelines by [@xenova](https://github.com/xenova) in [#1523](https://github.com/huggingface/transformers.js/pull/1523)
- Overdue Whisper fixes by [@xenova](https://github.com/xenova) in [#1594](https://github.com/huggingface/transformers.js/pull/1594)
- Fix custom cache in Node.js when cache returns non-paths (e.g., Response) by [@xenova](https://github.com/xenova) in [#1617](https://github.com/huggingface/transformers.js/pull/1617)
- Include `content-length` header in COS `Response` by [@tomayac](https://github.com/tomayac) in [#1572](https://github.com/huggingface/transformers.js/pull/1572)
- Fix: file size missing for cached files by [@nico-martin](https://github.com/nico-martin) in [#1575](https://github.com/huggingface/transformers.js/pull/1575)
- [deno via CDN] Fix simultaneous multi-session loading (e.g., VLMs) and d support usage for image loading by [@xenova](https://github.com/xenova) in [#1556](https://github.com/huggingface/transformers.js/pull/1556)
- fix: prevent partial file reads during concurrent downloads by [@nico-martin](https://github.com/nico-martin) in [#1548](https://github.com/huggingface/transformers.js/pull/1548)
- Fix WASM factory blob URL loading ([#1527](https://github.com/huggingface/transformers.js/issues/1527), [#1532](https://github.com/huggingface/transformers.js/issues/1532)) by [@nico-martin](https://github.com/nico-martin) in [#1558](https://github.com/huggingface/transformers.js/pull/1558)

## Documentation improvements

- Fix broken source code links in the Tutorials sections. by [@SeanGallen](https://github.com/SeanGallen) in [#1526](https://github.com/huggingface/transformers.js/pull/1526)
- V4 (docs): add Vercel AI SDK integration with browser-ai/transformers.js package by [@xenova](https://github.com/xenova) in [#1566](https://github.com/huggingface/transformers.js/pull/1566)
- docs: add step-by-step guide for contributing a new model by [@nico-martin](https://github.com/nico-martin) in [#1560](https://github.com/huggingface/transformers.js/pull/1560)
- Add background-removal pipeline to JSDoc by [@xenova](https://github.com/xenova) in [#1595](https://github.com/huggingface/transformers.js/pull/1595)
- fix(docs): Fix typo in WebGPU documentation by [@mattsoulanille](https://github.com/mattsoulanille) in [#1607](https://github.com/huggingface/transformers.js/pull/1607)

## Miscellaneous improvements

- Improve text streamer when `skip_special_tokens: false` by [@xenova](https://github.com/xenova) in [#1520](https://github.com/huggingface/transformers.js/pull/1520)
- Upgrade GitHub Actions for Node 24 compatibility (follow-up) by [@xenova](https://github.com/xenova) in [#1531](https://github.com/huggingface/transformers.js/pull/1531)
- Align audio_spectrogram_transformer name with python library by [@xenova](https://github.com/xenova) in [#1591](https://github.com/huggingface/transformers.js/pull/1591)
- Memoize network calls in model registry by [@nico-martin](https://github.com/nico-martin) in [#1552](https://github.com/huggingface/transformers.js/pull/1552)
- added no-store to range request [#1561](https://github.com/huggingface/transformers.js/issues/1561) by [@nico-martin](https://github.com/nico-martin) in [#1564](https://github.com/huggingface/transformers.js/pull/1564)

## New Contributors

- [@taronsung](https://github.com/taronsung) made their first contribution in [#1507](https://github.com/huggingface/transformers.js/pull/1507)
- [@philnash](https://github.com/philnash) made their first contribution in [#1525](https://github.com/huggingface/transformers.js/pull/1525)
- [@tomayac](https://github.com/tomayac) made their first contribution in [#1572](https://github.com/huggingface/transformers.js/pull/1572)
- [@mattsoulanille](https://github.com/mattsoulanille) made their first contribution in [#1607](https://github.com/huggingface/transformers.js/pull/1607)

**Full Changelog**: `3.8.1...4.0.0`