Practical NLP in the Browser with Transformers.js

Hugging Face released Transformers.js, a JavaScript library that runs state-of-the-art NLP models directly in the browser, on the user's device, with no server required. The library, which is designed to be functionally equivalent to Hugging Face's Python transformers library, uses ONNX Runtime to execute models converted from PyTorch, TensorFlow, or JAX, and caches model weights locally after the first download. This lets developers perform text classification, zero-shot labelling, and question answering on-device, and offline once the model is cached, through a browser-based pipeline API.

Introduction

For a long time, running transformer models meant maintaining a Python server, paying for GPU time, and routing every inference request through an API. The user typed something, it left their machine, touched your infrastructure, and came back as a prediction. That architecture made sense when the models were too large to run anywhere else. It is no longer the only option.

Transformers.js (https://huggingface.co/docs/transformers.js/en/index) changes the equation. It runs state-of-the-art NLP models directly in the browser, on the user's device, with no server involved. The models download once, cache locally, and run offline from that point forward.

The Python-to-JavaScript translation is almost one-to-one:

```js
// JavaScript -- nearly identical to the Python version
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love transformers!');
```

This tutorial covers three NLP tasks: text classification, zero-shot labelling, and question answering using Transformers.js's pipeline API.
For each task, you will see how to initialize the pipeline, what the output structure looks like and how to interpret it, and a working HTML example you can open directly in a browser. The tutorial closes with a complete support ticket routing application that combines all three pipelines into one practical tool.

Every code example in this article uses the CDN import path, so there is no build step required. Open a text editor, paste the code, and run it.

What Transformers.js Actually Is

The library is designed to be functionally equivalent to Hugging Face's Python transformers library (https://huggingface.co/docs/transformers.js/en/index), meaning the same pretrained models, the same task names, and the same pipeline API, just in JavaScript.

Under the hood, the bridge that makes this possible is ONNX Runtime (https://onnxruntime.ai/). Models trained in PyTorch, TensorFlow, or JAX are converted to ONNX format (https://onnx.ai/) using Hugging Face Optimum (https://github.com/huggingface/optimum). ONNX Runtime then executes these models in the browser. By default, it runs on the CPU via WebAssembly (WASM), which works in every modern browser. If you want GPU acceleration, setting `device: 'webgpu'` routes computation through the browser's WebGPU API, which is meaningfully faster where available, though still experimental in some environments.

Model caching. The first time a pipeline runs, the model weights download from the Hugging Face Hub (https://huggingface.co/models?library=transformers.js) and are cached locally (in the browser's Cache API storage in a browser context, on the filesystem in Node.js). Developer testing (https://www.raymondcamden.com/2024/12/03/using-transformersjs-for-ai-in-the-browser) shows the sentiment analysis pipeline downloads around 111 MB on first load. Subsequent runs skip the download entirely and load from cache. This means the first user session has a bandwidth cost; every session after that is fast and offline-capable.

Quantization. The `dtype` option controls model precision.
`q8` (8-bit quantization) is the WASM default; it gives you a good balance of size and accuracy. `q4` cuts the file roughly in half with a 1–3% accuracy loss on most tasks, which is the right trade-off for mobile or slow connections. For Node.js server-side use, `fp32` gives full precision with no size constraint.

```js
import { pipeline } from '@huggingface/transformers';

// Default WASM execution -- works everywhere
const wasmPipe = await pipeline('sentiment-analysis');

// WebGPU for faster inference on compatible hardware
const gpuPipe = await pipeline('sentiment-analysis', null, { device: 'webgpu' });

// 4-bit quantization for smaller model downloads
const q4Pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', { dtype: 'q4' });
```

The pipeline API

The `pipeline` function is the entire public interface for most use cases. It bundles three things (a pretrained model, a tokenizer, and postprocessing logic) into a single callable object. You do not touch the tokenizer or model weights directly. You call the pipeline with text and get structured output back.

The signature has three parts:

```js
const pipe = await pipeline(task, model?, options?);
const result = await pipe(input, inferenceOptions?);
```

`task` is a string identifier that tells the library which kind of model to load and how to handle input and output. `model` is optional; if you omit it, the library loads the default model for that task. If you specify a model ID like `'Xenova/distilbert-base-uncased-finetuned-sst-2-english'`, that model loads from the Hub. `options` is where you set `device`, `dtype`, and `progress_callback`.

Both steps are async. `pipeline` downloads and loads the model into memory; this is the slow part on the first run. The `pipe` call itself is usually fast once the model is loaded. Both return Promises, which means your UI needs to handle the loading state.
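
One way to handle that loading state is to wrap the slow `pipeline` call in a small helper that reports state changes and reuses a single load. The sketch below is only an illustration and is not part of Transformers.js: `createLazyLoader`, `loadFn`, and `onState` are hypothetical names, and `loadFn` stands in for something like `() => pipeline('sentiment-analysis')`.

```javascript
// Hypothetical helper (not part of Transformers.js).
// loadFn:  async function that resolves to a pipeline, e.g.
//          () => pipeline('sentiment-analysis')
// onState: called with 'loading', 'ready', or 'error' so the UI can update.
function createLazyLoader(loadFn, onState = () => {}) {
  let pending = null; // shared Promise, so concurrent callers trigger one load

  return function load() {
    if (pending === null) {
      onState('loading');
      pending = Promise.resolve()
        .then(loadFn)
        .then(
          (pipe) => {
            onState('ready');
            return pipe;
          },
          (err) => {
            pending = null; // clear the cached Promise so a later call can retry
            onState('error');
            throw err;
          }
        );
    }
    return pending;
  };
}
```

In a page, `onState` might disable a submit button while the state is `'loading'` and re-enable it on `'ready'`; each event handler would then `await load()` before calling the pipeline, and only the first call pays the download cost.
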
A progress callback lets you track the download and show progress to the user:

```js
// progress_callback fires during model download with status updates
// This is important UX -- users need to know something is happening
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  dtype: 'q8',
  progress_callback: (progress) => {
    // progress.status can be: 'initiate', 'download', 'progress', 'done', 'ready'
    if (progress.status === 'progress') {
      const pct = Math.round(progress.progress);
      document.getElementById('progress').textContent = `Loading model: ${pct}%`;
    }
    if (progress.status === 'ready') {
      document.getElementById('progress').textContent = 'Model ready';
    }
  }
});
```

One important note from the official documentation (https://huggingface.co/docs/transformers.js/en/index): Transformers.js is an inference-only library. You cannot fine-tune or train models with it. If your task needs a custom model, training happens elsewhere (Python, cloud), and the resulting ONNX export runs in the browser.

Task 1: Text Classification

Text classification assigns a label and a confidence score to input text. The most common form is sentiment analysis (positive vs. negative), but the same pipeline architecture handles any fixed set of categories the model was trained on.

What the output looks like:

```js
const result = await classifier('This product completely exceeded my expectations.');
// [{ label: 'POSITIVE', score: 0.9997 }]
```

Output is an array of objects. Each object has `label` (the predicted class as a string) and `score` (a float between 0 and 1 representing the model's confidence). A score of 0.9997 means the model is highly confident. A score of 0.52 means it is barely above the decision threshold; treat that as uncertain and handle it accordingly in your application logic.
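
As a rough illustration of handling low scores in application logic, the sketch below flags predictions that fall under a cutoff. The helper name `interpretPrediction` and the 0.75 cutoff are assumptions made for this example, not values recommended by Transformers.js; the input objects are stand-ins shaped like the pipeline output described above.

```javascript
// Illustrative cutoff (an assumption): tune it for your own application.
const LOW_CONFIDENCE_CUTOFF = 0.75;

// Hypothetical helper: takes one { label, score } object from the
// classifier output and marks it as uncertain when the score is low.
function interpretPrediction(prediction, cutoff = LOW_CONFIDENCE_CUTOFF) {
  const { label, score } = prediction;
  return { label, score, uncertain: score < cutoff };
}

// Stand-in results; in a page these would come from:
//   const [prediction] = await classifier(text);
const confident = interpretPrediction({ label: 'POSITIVE', score: 0.9997 });
const borderline = interpretPrediction({ label: 'POSITIVE', score: 0.52 });

console.log(confident.uncertain);  // false
console.log(borderline.uncertain); // true
```

An application could show the label directly when `uncertain` is false and fall back to a neutral message, or a request for more input, when it is true.
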
The output is always an array, even for a single input, because the same pipeline call handles batches:

```js
const results = await classifier(['This is great!', 'Completely broken, waste of money.']);
// [
//   { label: 'POSITIVE', score: 0.9998 },
//   { label: 'NEGATIVE', score: 0.9991 }
// ]
```

Full Working Example

The example below is a complete, self-contained HTML file. Open it in any modern browser. The model downloads on the first run and is cached, so subsequent loads skip the download.

<!DOCTYPE html