Practical NLP in the Browser with Transformers.js

Hugging Face released Transformers.js, a JavaScript library that runs state-of-the-art NLP models directly in the browser on the user's device with no server required. The library, which is functionally equivalent to Hugging Face's Python transformers library, uses ONNX Runtime to execute models converted from PyTorch, TensorFlow, or JAX, and caches model weights locally after the first download. This enables developers to perform text classification, zero-shot labeling, and question answering entirely offline through a browser-based pipeline API.

Introduction

For a long time, running transformer models meant maintaining a Python server, paying for GPU time, and routing every inference request through an API. The user typed something, it left their machine, touched your infrastructure, and came back as a prediction. That architecture made sense when the models were too large to run anywhere else. It is no longer the only option.

Transformers.js (https://huggingface.co/docs/transformers.js/en/index) changes the equation. It runs state-of-the-art NLP models directly in the browser, on the user's device, with no server involved. The models download once, cache locally, and run offline from that point forward. The Python-to-JavaScript translation is almost one-to-one:

```js
// JavaScript -- nearly identical to the Python pipeline call
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love transformers!');
```

This tutorial covers three NLP tasks: text classification, zero-shot labelling, and question answering using Transformers.js's pipeline API. For each task, you will see how to initialize the pipeline, what the output structure looks like and how to interpret it, and a working HTML example you can open directly in a browser. The tutorial closes with a complete support ticket routing application that combines all three pipelines into one practical tool.

Every full working example in this article uses the CDN import path, so there is no build step required. Open a text editor, paste the code, and run it.

What Transformers.js Actually Is

The library is designed to be functionally equivalent to Hugging Face's Python transformers library (https://huggingface.co/docs/transformers.js/en/index), meaning the same pretrained models, the same task names, and the same pipeline API, just in JavaScript.

Under the hood, the bridge that makes this possible is ONNX Runtime (https://onnxruntime.ai/). Models trained in PyTorch, TensorFlow, or JAX are converted to ONNX format (https://onnx.ai/) using Hugging Face Optimum (https://github.com/huggingface/optimum). ONNX Runtime then executes these models in the browser. By default, it runs on CPU via WebAssembly (WASM), which works in every modern browser. If you want GPU acceleration, setting device: 'webgpu' routes computation through the browser's WebGPU API, which is meaningfully faster where available, though still experimental in some environments.

Model caching. The first time a pipeline runs, the model weights download from the Hugging Face Hub (https://huggingface.co/models?library=transformers.js) and are cached locally (through the browser's Cache API in a browser context, on the filesystem in Node.js). Developer testing (https://www.raymondcamden.com/2024/12/03/using-transformersjs-for-ai-in-the-browser) shows the sentiment analysis pipeline downloads around 111 MB on first load. Subsequent runs skip the download entirely and load from cache. This means the first user session has a bandwidth cost; every session after is fast and offline-capable.

Quantization. The dtype option controls model precision. q8 (8-bit quantization) is the WASM default; it gives you a good balance of size and accuracy. q4 cuts the file roughly in half with a 1–3% accuracy loss on most tasks, which is the right trade-off for mobile or slow connections. For Node.js server-side use, fp32 gives full precision with no size constraint.

```js
// Default WASM execution -- works everywhere
const pipe = await pipeline('sentiment-analysis');

// WebGPU for faster inference on compatible hardware
const gpuPipe = await pipeline('sentiment-analysis', null, { device: 'webgpu' });

// 4-bit quantization for smaller model downloads
const smallPipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  { dtype: 'q4' }
);
```

The pipeline API

The pipeline function is the entire public interface for most use cases. It bundles three things (a pretrained model, a tokenizer, and postprocessing logic) into a single callable object. You do not touch the tokenizer or model weights directly. You call the pipeline with text and get structured output back.

The signature has three parts:

```js
// Signature sketch: a trailing ? marks an optional argument (not literal JavaScript)
const pipe = await pipeline(task, model?, options?);
const result = await pipe(input, inferenceOptions?);
```

task is a string identifier that tells the library which kind of model to load and how to handle input and output. model is optional; if you omit it, the library loads the default model for that task. If you specify a model ID like 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', that model loads from the Hub. options is where you set device, dtype, and progress_callback.

Both steps are async. pipeline() downloads and loads the model into memory. This is the slow part on the first run. The pipe() call itself is usually fast once the model is loaded. Both return Promises, which means your UI needs to handle the loading state.

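Because loading is the slow step, it often helps to start it once and hand the same Promise to every caller. The sketch below shows one way to do that. `createLazyLoader` and `fakeLoad` are hypothetical names used only for illustration; in a real page the loader argument would be something like `() => pipeline('sentiment-analysis')`.

```javascript
// Hypothetical helper (not part of Transformers.js): wraps any async loader
// so the expensive load runs at most once, even when called concurrently.
function createLazyLoader(load) {
  let promise = null;
  return function get() {
    if (promise === null) {
      // Clear the cached promise on failure so a later call can retry.
      promise = load().catch((err) => {
        promise = null;
        throw err;
      });
    }
    return promise;
  };
}

// Stand-in for `() => pipeline('sentiment-analysis')` so the sketch runs anywhere.
let loadCount = 0;
const fakeLoad = async () => {
  loadCount += 1;
  return (text) => [{ label: 'POSITIVE', score: 0.99 }];
};

const getClassifier = createLazyLoader(fakeLoad);
```

Every click handler can then `await getClassifier()` without worrying about whether the model has already been requested.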
A progress callback lets you track the download and show progress to the user:

```js
// progress_callback fires during model download with status updates.
// This is important UX -- users need to know something is happening.
const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  {
    dtype: 'q8',
    progress_callback: (progress) => {
      // progress.status can be: 'initiate', 'download', 'progress', 'done', 'ready'
      if (progress.status === 'progress') {
        const pct = Math.round(progress.progress);
        document.getElementById('progress').textContent = `Loading model: ${pct}%`;
      }
      if (progress.status === 'ready') {
        document.getElementById('progress').textContent = 'Model ready';
      }
    }
  }
);
```

One important note from the official documentation (https://huggingface.co/docs/transformers.js/en/index): Transformers.js is an inference-only library. You cannot fine-tune or train models with it. If your task needs a custom model, training happens elsewhere (Python, cloud), and the resulting ONNX export runs in the browser.

Task 1: Text Classification

Text classification assigns a label and a confidence score to input text. The most common form is sentiment analysis (positive vs. negative), but the same pipeline architecture handles any fixed set of categories the model was trained on.

What the output looks like:

```js
const result = await classifier('This product completely exceeded my expectations.');
// [{ label: 'POSITIVE', score: 0.9997 }]
```

Output is an array of objects. Each object has label (the predicted class as a string) and score (a float between 0 and 1 representing the model's confidence). A score of 0.9997 means the model is highly confident. A score of 0.52 means it is barely above the decision threshold; treat that as uncertain and handle it accordingly in your application logic.

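One way to handle an uncertain score in application logic is to map low-confidence results to a separate outcome. This is a minimal sketch: `interpretSentiment` is a hypothetical helper, and the 0.75 cutoff is an assumed value you would tune on your own data.

```javascript
// Hypothetical helper: maps a text-classification result to an app-level decision.
// `minScore` is an assumed cutoff, not a value recommended by the library.
function interpretSentiment(results, minScore = 0.75) {
  const { label, score } = results[0]; // the pipeline returns an array
  if (score < minScore) {
    return { decision: 'UNCERTAIN', label, score };
  }
  return { decision: label, label, score };
}

// Inputs shaped like the pipeline output described above.
const confident = interpretSentiment([{ label: 'POSITIVE', score: 0.9997 }]);
const borderline = interpretSentiment([{ label: 'POSITIVE', score: 0.52 }]);
console.log(confident.decision);  // POSITIVE
console.log(borderline.decision); // UNCERTAIN
```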
The output is always an array, even for a single input, because the same pipeline call handles batches:

```js
const results = await classifier(['This is great!', 'Completely broken, waste of money.']);
// [
//   { label: 'POSITIVE', score: 0.9998 },
//   { label: 'NEGATIVE', score: 0.9991 }
// ]
```

Full Working Example

The example below is a complete, self-contained HTML file. Open it in any modern browser. The model downloads on the first run and is cached, so subsequent loads skip the download.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Text Classification with Transformers.js</title>
  <style>
    body { font-family: system-ui, sans-serif; max-width: 680px; margin: 2rem auto; padding: 0 1rem; }
    textarea { width: 100%; height: 100px; padding: 0.5rem; font-size: 1rem; margin-bottom: 0.5rem; }
    button { padding: 0.5rem 1.5rem; font-size: 1rem; cursor: pointer; }
    button:disabled { opacity: 0.5; cursor: not-allowed; }
    #status { color: #666; font-size: 0.9rem; margin: 0.5rem 0; }
    #result { margin-top: 1rem; font-size: 1.1rem; font-weight: bold; }
    .positive { color: #16a34a; }
    .negative { color: #dc2626; }
  </style>
</head>
<body>
  <h1>Sentiment Classifier</h1>
  <p>Runs entirely in your browser -- no server, no API calls.</p>
  <textarea id="input" placeholder="Enter text to classify...">I really enjoyed using this product. The setup was easy and everything works perfectly.</textarea>
  <button id="classify-btn" disabled>Loading model...</button>
  <div id="status">Downloading model on first run (this may take a moment)...</div>
  <div id="result"></div>

  <script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';

    const statusEl = document.getElementById('status');
    const resultEl = document.getElementById('result');
    const btn = document.getElementById('classify-btn');
    const inputEl = document.getElementById('input');
    let classifier;

    async function loadModel() {
      classifier = await pipeline(
        'text-classification',
        'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
        {
          dtype: 'q8',
          progress_callback: (p) => {
            if (p.status === 'progress') {
              const pct = Math.round(p.progress ?? 0);
              statusEl.textContent = `Downloading model: ${pct}%`;
            }
          }
        }
      );
      btn.textContent = 'Classify';
      btn.disabled = false;
      statusEl.textContent = 'Model loaded and cached. Subsequent loads skip the download.';
    }

    async function classify() {
      const text = inputEl.value.trim();
      if (!text) return;
      btn.disabled = true;
      btn.textContent = 'Classifying...';
      resultEl.textContent = '';
      const results = await classifier(text);
      const { label, score } = results[0]; // first (and only) item of the result array
      const pct = (score * 100).toFixed(1);
      const cssClass = label === 'POSITIVE' ? 'positive' : 'negative';
      resultEl.innerHTML = `<span class="${cssClass}">${label}</span> -- ${pct}% confidence`;
      btn.disabled = false;
      btn.textContent = 'Classify';
    }

    btn.addEventListener('click', classify);
    loadModel().catch((err) => {
      statusEl.textContent = `Error loading model: ${err.message}`;
    });
  </script>
</body>
</html>
```

The loadModel function calls pipeline with the task name, model ID, and options. The progress_callback fires repeatedly during the download and updates the status text so the user is not staring at a frozen screen. Once the model loads, the button is enabled. When the user clicks Classify, classifier(text) runs inference on the already-loaded model, typically under 200ms on a modern laptop.
The result destructures label and score from the first array element, formats the confidence as a percentage, and applies a CSS class for color coding.

Task 2: Zero-Shot Classification

Zero-shot classification does something regular text classification cannot: it classifies text into categories you define at runtime, with no training data required. You pass the text and a list of labels in plain English. The model decides which label fits best based on its understanding of language semantics. This is useful any time you cannot or do not want to train a model on labelled examples, which is most of the time in real projects.

How It Works Under the Hood

The model reformulates each candidate label as a natural language inference (NLI) hypothesis. For the label "billing issue", it generates a hypothesis such as "This text is about a billing issue" (the exact wording comes from a configurable hypothesis template) and computes the probability that the hypothesis is entailed by the input text. The label with the highest entailment score wins. This NLI-based approach (https://huggingface.co/tasks/zero-shot-classification) is why you can use any descriptive English phrase as a label and get a meaningful result. The model understands the meaning of your labels, not just their surface form.

What the output looks like:

```js
const classifier = await pipeline('zero-shot-classification', 'Xenova/bart-large-mnli');
const result = await classifier(
  'My invoice is wrong and I was charged twice.',
  ['billing', 'technical support', 'shipping', 'returns', 'account access']
);
// {
//   sequence: 'My invoice is wrong and I was charged twice.',
//   labels: ['billing', 'returns', 'account access', 'technical support', 'shipping'],
//   scores: [0.871, 0.063, 0.031, 0.022, 0.013]
// }
```

The output is an object with three fields. sequence is the original input text. labels is an array of your candidate labels, sorted from highest to lowest score. scores is an array of confidence scores in the same order. The first element of both arrays is always the winning prediction.

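To make the "highest entailment score wins" step concrete, here is a minimal sketch of how one entailment logit per label could be turned into ranked scores. The logits are made-up numbers, `rankLabels` is a hypothetical helper, and the softmax-across-labels step is an assumption about how single-label scoring is commonly done, not code taken from the library.

```javascript
// Softmax: turns arbitrary logits into positive scores that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical helper: ranks candidate labels by their entailment logit.
function rankLabels(labels, entailmentLogits) {
  const scores = softmax(entailmentLogits);
  const order = labels.map((_, i) => i).sort((a, b) => scores[b] - scores[a]);
  return {
    labels: order.map((i) => labels[i]),
    scores: order.map((i) => scores[i]),
  };
}

// Made-up entailment logits, one per candidate label.
const ranked = rankLabels(['shipping', 'billing', 'returns'], [0.2, 3.1, 1.0]);
console.log(ranked.labels[0]); // billing
```

Because the softmax is taken across labels, the scores compete with each other, which is why adding or removing a candidate label changes every other label's score.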
Scores across all labels sum to approximately 1 when multi_label is false (the default). Setting multi_label: true changes the behavior: each label scores independently rather than competing, so multiple labels can all have high scores simultaneously. Use this when text plausibly belongs to several categories at once.

Full Working Example

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Zero-Shot Classifier -- Support Ticket Router</title>
  <style>
    body { font-family: system-ui, sans-serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; }
    textarea { width: 100%; height: 120px; padding: 0.5rem; font-size: 1rem; }
    button { margin-top: 0.5rem; padding: 0.5rem 1.5rem; font-size: 1rem; cursor: pointer; }
    button:disabled { opacity: 0.5; cursor: not-allowed; }
    #status { color: #666; font-size: 0.9rem; margin: 0.5rem 0; }
    .result-row { display: flex; justify-content: space-between; padding: 0.4rem 0; border-bottom: 1px solid #eee; }
    .bar-container { width: 60%; background: #f0f0f0; border-radius: 4px; height: 18px; }
    .bar { background: #2563eb; height: 100%; border-radius: 4px; transition: width 0.3s; }
    .label-name { min-width: 160px; font-weight: 500; }
    .score-text { min-width: 50px; text-align: right; color: #555; }
  </style>
</head>
<body>
  <h1>Support Ticket Router</h1>
  <p>Paste a support ticket. The model routes it to the right department with no training data needed.</p>
  <textarea id="ticket">I placed an order three days ago but it still hasn't shipped. I have an event this weekend and really need this to arrive on time. My order number is 48821.</textarea>
  <button id="route-btn" disabled>Loading model...</button>
  <div id="status">Downloading model on first run...</div>
  <div id="results"></div>

  <script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';

    const statusEl = document.getElementById('status');
    const resultsEl = document.getElementById('results');
    const btn = document.getElementById('route-btn');
    const ticketEl = document.getElementById('ticket');

    // The candidate labels are the only routing configuration.
    const DEPARTMENTS = [
      'shipping and delivery',
      'billing and payment',
      'technical support',
      'returns and refunds',
      'account and login'
    ];
    let classifier;

    async function loadModel() {
      classifier = await pipeline('zero-shot-classification', 'Xenova/bart-large-mnli', {
        dtype: 'q8',
        progress_callback: (p) => {
          if (p.status === 'progress') {
            statusEl.textContent = `Downloading model: ${Math.round(p.progress ?? 0)}%`;
          }
        }
      });
      btn.disabled = false;
      btn.textContent = 'Route Ticket';
      statusEl.textContent = 'Model ready.';
    }

    async function routeTicket() {
      const text = ticketEl.value.trim();
      if (!text) return;
      btn.disabled = true;
      btn.textContent = 'Routing...';
      resultsEl.innerHTML = '';

      const output = await classifier(text, DEPARTMENTS, { multi_label: false });
      const winner = output.labels[0]; // labels are sorted, best first
      const confidence = (output.scores[0] * 100).toFixed(1);

      let html = `
        <h3>Route to: <strong>${winner}</strong> (${confidence}% confidence)</h3>
        <p style="color:#666; font-size:0.9rem">Full department score breakdown:</p>`;
      output.labels.forEach((label, i) => {
        const pct = (output.scores[i] * 100).toFixed(1);
        const barWidth = (output.scores[i] * 100).toFixed(0);
        html += `
          <div class="result-row">
            <span class="label-name">${label}</span>
            <div class="bar-container"><div class="bar" style="width: ${barWidth}%"></div></div>
            <span class="score-text">${pct}%</span>
          </div>`;
      });
      resultsEl.innerHTML = html;
      btn.disabled = false;
      btn.textContent = 'Route Ticket';
    }

    btn.addEventListener('click', routeTicket);
    loadModel().catch((err) => {
      statusEl.textContent = `Error: ${err.message}`;
    });
  </script>
</body>
</html>
```

The DEPARTMENTS array is all the routing configuration this system needs. No training data, no labeled examples. When a ticket arrives, classifier(text, DEPARTMENTS, { multi_label: false }) runs all five entailment checks internally and returns them ranked. The results loop builds a horizontal bar chart showing each department's score, a sorted visualization that makes it immediately obvious where the ticket should go and how confident the model was. Try changing the DEPARTMENTS array to completely different labels; the routing adapts without any code change beyond that array.

Task 3: Question Answering

Question answering in Transformers.js is extractive: you provide a passage of text as context and ask a question in plain English. The model locates the span within the passage that best answers the question and returns it. It does not generate text or reason beyond what is literally in the context. The answer is always taken from the input you provided. This makes it well-suited for document interrogation. The user provides the document; the model navigates it.

What the output looks like:

```js
const qa = await pipeline('question-answering', 'Xenova/distilbert-base-uncased-distilled-squad');
const context =
  'Our return policy allows customers to return most items within 30 days of purchase. ' +
  'Electronics must be returned within 15 days and must be in original packaging. ' +
  'Software and digital downloads are non-refundable.';
// The JavaScript pipeline takes the question and the context as two positional arguments.
const result = await qa('What is the return window for electronics?', context);
// {
//   answer: '15 days',
//   score: 0.9823,
//   start: 120, // character index of answer start in context (optional field)
//   end: 127    // character index of answer end in context (optional field)
// }
```

The output has up to four fields. answer is the extracted substring. score is the model's confidence that this span answers the question.
start and end are character indices into the original context; you can use these to highlight the answer in the source text, which is valuable UX for longer documents. Both are optional and may be missing from the JavaScript result, so check for them before relying on them.

When the question has no clear answer in the context, score will be low and answer may be a short, seemingly random span. Treating low-confidence answers (below 0.3 or 0.4) as "not found" is a common practice; the example below uses a more permissive cutoff of 0.1.

Full Working Example

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Document Q&amp;A with Transformers.js</title>
  <style>
    body { font-family: system-ui, sans-serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; }
    label { font-weight: 600; display: block; margin-top: 1rem; }
    textarea { width: 100%; padding: 0.5rem; font-size: 0.95rem; }
    input[type="text"] { width: 100%; padding: 0.5rem; font-size: 0.95rem; box-sizing: border-box; }
    button { margin-top: 0.75rem; padding: 0.5rem 1.5rem; font-size: 1rem; cursor: pointer; }
    button:disabled { opacity: 0.5; cursor: not-allowed; }
    #status { color: #666; font-size: 0.9rem; margin: 0.5rem 0; }
    #answer-box { margin-top: 1rem; padding: 1rem; background: #f8fafc; border-left: 3px solid #2563eb; }
    .highlight { background: #fef08a; border-radius: 2px; }
    .confidence { color: #666; font-size: 0.85rem; margin-top: 0.5rem; }
  </style>
</head>
<body>
  <h1>Document Question Answering</h1>
  <p>Paste any document, then ask questions about it. Answers are extracted directly from the text.</p>
  <label for="context">Document / Context</label>
  <textarea id="context" rows="8">Acme Corp Return Policy (Updated March 2025)

Customers may return most standard items within 30 days of the original purchase date for a full refund. Electronics and peripherals have a shorter return window of 15 days and must be returned in original, unopened packaging to qualify.

Refunds are processed within 3-5 business days after we receive the returned item. Original shipping charges are non-refundable. For items valued over $200, customers must contact support at returns@acmecorp.com before initiating a return.

Software licenses and digital downloads are non-refundable under any circumstances. Gift cards cannot be returned or exchanged for cash.</textarea>
  <label for="question">Your Question</label>
  <input type="text" id="question" value="How long does it take to process a refund?" />
  <button id="ask-btn" disabled>Loading model...</button>
  <div id="status">Downloading model on first run...</div>
  <div id="answer-box" style="display:none"></div>

  <script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';

    const contextEl = document.getElementById('context');
    const questionEl = document.getElementById('question');
    const statusEl = document.getElementById('status');
    const answerBox = document.getElementById('answer-box');
    const btn = document.getElementById('ask-btn');
    const CONFIDENCE_THRESHOLD = 0.1;
    let qaModel;

    async function loadModel() {
      qaModel = await pipeline('question-answering', 'Xenova/distilbert-base-uncased-distilled-squad', {
        dtype: 'q8',
        progress_callback: (p) => {
          if (p.status === 'progress') {
            statusEl.textContent = `Downloading model: ${Math.round(p.progress ?? 0)}%`;
          }
        }
      });
      btn.disabled = false;
      btn.textContent = 'Ask';
      statusEl.textContent = 'Model ready.';
    }

    async function askQuestion() {
      const context = contextEl.value.trim();
      const question = questionEl.value.trim();
      if (!context || !question) return;
      btn.disabled = true;
      btn.textContent = 'Thinking...';
      answerBox.style.display = 'none';

      // Question and context are passed as two positional arguments.
      const result = await qaModel(question, context);
      answerBox.style.display = 'block';

      if (result.score < CONFIDENCE_THRESHOLD) {
        answerBox.innerHTML = `
          <strong>Answer not found</strong>
          <p class="confidence">The model could not find a clear answer to this question in the provided text.</p>`;
      } else {
        // Use start/end when present; otherwise fall back to a case-insensitive text search.
        const start = result.start ?? context.toLowerCase().indexOf(result.answer.toLowerCase());
        const end = result.end ?? start + result.answer.length;
        const highlight = start >= 0
          ? `${context.slice(0, start)}<mark class="highlight">${context.slice(start, end)}</mark>${context.slice(end)}`
          : context; // answer text not located: show the document without a highlight
        const confidence = (result.score * 100).toFixed(1);
        answerBox.innerHTML = `
          <strong>Answer:</strong> ${result.answer}
          <p class="confidence">Confidence: ${confidence}%</p>
          <details style="margin-top:1rem">
            <summary style="cursor:pointer; color:#2563eb">Show answer highlighted in document</summary>
            <pre style="white-space:pre-wrap; font-size:0.85rem; margin-top:0.5rem">${highlight}</pre>
          </details>`;
      }
      btn.disabled = false;
      btn.textContent = 'Ask';
    }

    btn.addEventListener('click', askQuestion);
    questionEl.addEventListener('keydown', (e) => {
      if (e.key === 'Enter' && !btn.disabled) askQuestion();
    });
    loadModel().catch((err) => {
      statusEl.textContent = `Error: ${err.message}`;
    });
  </script>
</body>
</html>
```

The QA pipeline receives the question and the context as two separate arguments rather than a single string. This is the format the task requires. The code uses the start and end character indices when the result provides them, and otherwise searches the context for the answer text, then injects a <mark> tag around the span the model identified. The <details> element wraps the highlighted context in a collapsible section so the UI stays clean.
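Highlighting is worth isolating into a small function, both because the start and end indices may not be present in every result and because document text placed into innerHTML should be escaped first. The sketch below is one possible approach; `escapeHtml` and `highlightAnswer` are hypothetical helpers, and the case-insensitive search is an assumption that suits uncased models whose answers may come back lowercased.

```javascript
// Escapes text before it is placed into innerHTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps the first case-insensitive match of `answer`
// in <mark>. Returns null when the answer text cannot be located.
function highlightAnswer(context, answer) {
  const start = context.toLowerCase().indexOf(answer.toLowerCase());
  if (answer.length === 0 || start === -1) return null;
  const end = start + answer.length;
  return (
    escapeHtml(context.slice(0, start)) +
    '<mark class="highlight">' + escapeHtml(context.slice(start, end)) + '</mark>' +
    escapeHtml(context.slice(end))
  );
}

const html = highlightAnswer('Refunds take 3-5 Business Days <approx>.', '3-5 business days');
console.log(html);
// Refunds take <mark class="highlight">3-5 Business Days</mark> &lt;approx&gt;.
```

When the helper returns null, the page can simply show the answer text without the highlighted view.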
The confidence threshold prevents low-quality extractions from appearing as confident answers; any result below 0.1 gets replaced with a "not found" message.

Real-World Application: Support Ticket Router

The three pipelines cover the full analytical surface of a support ticket. Sentiment tells you how the customer feels. Zero-shot classification routes the ticket to the right team. Question answering extracts the structured data you need (order number, product name, and the core issue) without parsing rules or regex.

This is a complete support ticket analysis tool that combines all three. It is a single, fully self-contained HTML file.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Support Ticket Analyzer</title>
  <style>
    * { box-sizing: border-box; }
    body { font-family: system-ui, sans-serif; max-width: 800px; margin: 2rem auto; padding: 0 1rem; background: #f9fafb; }
    h1 { margin-bottom: 0.25rem; }
    .subtitle { color: #666; margin-bottom: 1.5rem; }
    textarea { width: 100%; height: 130px; padding: 0.75rem; font-size: 0.95rem; border: 1px solid #d1d5db; border-radius: 6px; resize: vertical; }
    button { padding: 0.6rem 1.8rem; font-size: 1rem; background: #2563eb; color: white; border: none; border-radius: 6px; cursor: pointer; margin-top: 0.5rem; }
    button:disabled { background: #93c5fd; cursor: not-allowed; }
    #status { font-size: 0.85rem; color: #666; margin: 0.5rem 0; }
    .cards { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1rem; margin-top: 1.5rem; }
    .card { background: white; border-radius: 8px; padding: 1rem; border: 1px solid #e5e7eb; }
    .card h3 { margin: 0 0 0.75rem; font-size: 0.9rem; text-transform: uppercase; letter-spacing: 0.05em; color: #6b7280; }
    .card .value { font-size: 1.15rem; font-weight: 600; }
    .card .sub { font-size: 0.85rem; color: #666; margin-top: 0.25rem; }
    .positive { color: #16a34a; }
    .negative { color: #dc2626; }
    .neutral { color: #d97706; }
    .dept-bar { display: flex; align-items: center; gap: 0.5rem; margin-top: 0.4rem; font-size: 0.85rem; }
    .bar-bg { flex: 1; background: #f0f0f0; border-radius: 3px; height: 8px; }
    .bar-fill { background: #2563eb; height: 100%; border-radius: 3px; transition: width 0.4s; }
    .qa-item { margin-top: 0.6rem; font-size: 0.9rem; }
    .qa-label { font-weight: 600; color: #374151; }
    .qa-ans { color: #111; }
    .qa-low { color: #9ca3af; font-style: italic; }
    @media (max-width: 600px) { .cards { grid-template-columns: 1fr; } }
  </style>
</head>
<body>
  <h1>Support Ticket Analyzer</h1>
  <p class="subtitle">Powered by Transformers.js -- runs entirely in your browser</p>
  <textarea id="ticket">Hi, I ordered a laptop stand last Tuesday (order 73021) but it arrived completely broken -- one of the arms snapped off right out of the box. I've been a customer for three years and this is honestly really disappointing. I need a replacement sent out as soon as possible or I'd like a full refund. Please advise.</textarea>
  <button id="analyze-btn" disabled>Loading models...</button>
  <div id="status">Initializing -- downloading models on first run...</div>
  <div class="cards" id="cards" style="display:none">
    <div class="card" id="card-sentiment">
      <h3>Sentiment</h3>
      <div class="value" id="sent-label">--</div>
      <div class="sub" id="sent-score">--</div>
    </div>
    <div class="card" id="card-route">
      <h3>Department</h3>
      <div id="dept-results"></div>
    </div>
    <div class="card" id="card-qa">
      <h3>Key Info</h3>
      <div id="qa-results"></div>
    </div>
  </div>

  <script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';

    const ticketEl = document.getElementById('ticket');
    const btn = document.getElementById('analyze-btn');
    const statusEl = document.getElementById('status');
    const cardsEl = document.getElementById('cards');

    const DEPARTMENTS = [
      'returns and refunds',
      'shipping and delivery',
      'billing and payment',
      'technical support',
      'account management'
    ];
    const QA_QUERIES = [
      { label: 'Order number', question: 'What is the order number?' },
      { label: 'Issue', question: 'What is the main problem or complaint?' },
      { label: 'Request', question: 'What does the customer want?' }
    ];

    let sentimentPipe, zeroPipe, qaPipe;
    let modelsLoaded = 0;

    function onModelLoaded(name) {
      modelsLoaded++;
      statusEl.textContent = `Loading models: ${modelsLoaded}/3 ready (${name} loaded)`;
    }

    // Count a model as loaded when its pipeline promise resolves. The 'done'
    // progress status fires once per downloaded file, so it is not a reliable signal.
    async function load(name, task, model) {
      const pipe = await pipeline(task, model, { dtype: 'q8' });
      onModelLoaded(name);
      return pipe;
    }

    async function loadModels() {
      // Start all three downloads at once so they overlap.
      [sentimentPipe, zeroPipe, qaPipe] = await Promise.all([
        load('Sentiment', 'text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
        load('Routing', 'zero-shot-classification', 'Xenova/bart-large-mnli'),
        load('Q&A', 'question-answering', 'Xenova/distilbert-base-uncased-distilled-squad')
      ]);
      btn.disabled = false;
      btn.textContent = 'Analyze Ticket';
      statusEl.textContent = 'All models ready.';
    }

    async function analyzeTicket() {
      const text = ticketEl.value.trim();
      if (!text) return;
      btn.disabled = true;
      btn.textContent = 'Analyzing...';
      cardsEl.style.display = 'none';

      // Run the analyses one after another on the same ticket text.
      const sentResult = await sentimentPipe(text);
      const zeroResult = await zeroPipe(text, DEPARTMENTS, { multi_label: false });
      const qaResults = [];
      for (const { question } of QA_QUERIES) {
        qaResults.push(await qaPipe(question, text));
      }

      // Sentiment card
      const { label, score } = sentResult[0];
      const sentLabel = document.getElementById('sent-label');
      const sentScore = document.getElementById('sent-score');
      sentLabel.textContent = label;
      sentLabel.className = `value ${label === 'POSITIVE' ? 'positive' : 'negative'}`;
      sentScore.textContent = `${(score * 100).toFixed(1)}% confidence`;
      sentScore.style.color = ''; // reset any urgency color from a previous run
      if (label === 'NEGATIVE' && score > 0.85) {
        sentScore.textContent += ' -- HIGH URGENCY';
        sentScore.style.color = '#dc2626';
      }

      // Department card: winner plus the top three scores as bars
      const deptEl = document.getElementById('dept-results');
      deptEl.innerHTML = `<div class="value">${zeroResult.labels[0]}</div>`;
      zeroResult.labels.slice(0, 3).forEach((dept, i) => {
        const pct = (zeroResult.scores[i] * 100).toFixed(0);
        deptEl.innerHTML += `
          <div class="dept-bar">
            <span style="min-width:130px">${dept}</span>
            <div class="bar-bg"><div class="bar-fill" style="width:${pct}%"></div></div>
            <span>${pct}%</span>
          </div>`;
      });

      // Key info card: extractive answers, hidden when confidence is low
      const qaEl = document.getElementById('qa-results');
      qaEl.innerHTML = '';
      QA_QUERIES.forEach(({ label: qLabel }, i) => {
        const { answer, score: qScore } = qaResults[i];
        const found = qScore >= 0.1;
        qaEl.innerHTML += `
          <div class="qa-item">
            <span class="qa-label">${qLabel}: </span>
            <span class="${found ? 'qa-ans' : 'qa-low'}">${found ? answer : 'not found'}</span>
          </div>`;
      });

      cardsEl.style.display = 'grid';
      btn.disabled = false;
      btn.textContent = 'Analyze Ticket';
    }

    btn.addEventListener('click', analyzeTicket);
    loadModels().catch((err) => {
      statusEl.textContent = `Error loading models: ${err.message}`;
    });
  </script>
</body>
</html>
```

The three pipelines load in parallel via Promise.all. This is faster than loading them sequentially because the downloads overlap. A counter tracks how many have finished for the status line, and the button only enables once all three are ready. When the user submits a ticket, the three analyses run one after another on the same text; awaiting them in sequence is a conservative choice, since overlapping runs against the same runtime session can fail in some ONNX Runtime Web versions. The sentiment card checks whether the result is high-confidence negative and flags it as urgent, a practical routing signal that requires no additional model. The department card shows the top three candidates as score bars rather than just the winner, which gives the support team enough information to override the routing if the top score is close to the second.

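The "top score close to the second" check can also be automated so that ambiguous tickets are flagged for a human. A minimal sketch, assuming the zero-shot output shape shown earlier: `routeWithReview` is a hypothetical helper and the 0.15 margin is an assumed cutoff to tune against your own tickets.

```javascript
// Hypothetical helper: flags a routing decision for human review when the
// top two department scores are close. `minMargin` is an assumed cutoff.
function routeWithReview(zeroShotOutput, minMargin = 0.15) {
  const [top, second = 0] = zeroShotOutput.scores; // sorted high to low
  const margin = top - second;
  return {
    department: zeroShotOutput.labels[0],
    margin,
    needsReview: margin < minMargin,
  };
}

const clear = routeWithReview({
  labels: ['returns and refunds', 'shipping and delivery'],
  scores: [0.81, 0.12],
});
const close = routeWithReview({
  labels: ['returns and refunds', 'shipping and delivery'],
  scores: [0.46, 0.41],
});
console.log(clear.needsReview); // false
console.log(close.needsReview); // true
```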
The QA card runs three extractive queries against the ticket body and displays the results with a confidence threshold: answers below 0.1 show as "not found" rather than surfacing low-quality extractions.

Performance, Limitations, and When Not to Use It

Transformers.js removes the server but does not eliminate trade-offs. Knowing them up front saves you from unpleasant surprises in production.

- Download size. The sentiment analysis pipeline downloads around 111 MB on first load (https://www.raymondcamden.com/2024/12/03/using-transformersjs-for-ai-in-the-browser), not huge, but not invisible either. The zero-shot BART model is larger. For applications targeting mobile users or users on metered connections, use dtype: 'q4' to cut model sizes roughly in half, and treat the model as a progressive enhancement; do not block the user interface on model load.
- Inference speed. On a modern laptop, WASM inference for a short text classification takes 50–200ms. Zero-shot classification is slower because it runs multiple NLI passes, one per candidate label. A five-label zero-shot run typically takes 1–3 seconds on CPU. WebGPU reduces this significantly where supported.
- Inference only. Transformers.js cannot fine-tune or train models (https://transformersjs-for-developers.hashnode.dev/comprehensive-guide-to-using-transformersjs-for-developers-and-aiml-fans). If your use case requires a custom model (a classifier trained on your own labelled tickets, for example), training happens on a server (Python, cloud), and the ONNX export runs in the browser.
- Model availability. Not every model on Hugging Face Hub has an ONNX version available. To find compatible models, filter by the transformers.js library tag on the Hub (https://huggingface.co/models?library=transformers.js).
- When to prefer a server instead: bulk processing of hundreds of texts where latency per item matters; tasks that require the largest frontier models, which are too large for browser delivery; or simple applications where the development cost of browser-based inference outweighs its benefits.

A quick reference for choosing dtype by context:

| Context | Recommended dtype | Why |
|---|---|---|
| Browser, general use | q8 | WASM default, good balance of size and accuracy |
| Mobile or slow connection | q4 | Roughly half the file size, 1-3% accuracy cost |
| Node.js server-side | fp32 | Full precision, no download size concern |
| WebGPU enabled | fp16 | Fast, good quality on compatible GPU hardware |

Wrapping Up

Transformers.js puts production-quality NLP in the browser without a server, without an API key, and without user data leaving the device. The three pipelines in this tutorial (text classification, zero-shot labelling, and question answering) cover the analytical surface of a large share of real NLP use cases. The support ticket router shows how they combine into something genuinely useful in a single page of HTML and JavaScript.

The entry point is as low as it gets: one CDN import, one await pipeline() call, one inference call. Start with the simplest example in this article and run it. Modify the labels in the zero-shot demo. Point the QA model at a different document. The official Transformers.js documentation (https://huggingface.co/docs/transformers.js/en/index) and the examples repository (https://github.com/huggingface/transformers.js-examples) cover a much wider task range (summarization, translation, named entity recognition, and more), all following the same pipeline pattern.

Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on LinkedIn (https://www.linkedin.com/in/olumide-shittu/).