Real-Time Object Detection in the Browser

Roboflow has introduced a serverless streaming pipeline that enables real-time object detection in a web browser without requiring large client-side downloads or complex backend infrastructure. The system uses WebRTC to stream video frames from a browser to a cloud GPU, where a vision model processes them and returns annotated results with bounding boxes and labels. With this approach, developers can build applications such as a browser-based security camera monitor, which detects and highlights people in a live webcam feed, in under 20 minutes.

Running high-performance computer vision on the web has traditionally meant a tough compromise: either download large, multi-megabyte machine learning models into the client's browser, slowing initial load times and draining device batteries, or build a complex, expensive backend to process incoming video frames. Serverless streaming pipelines remove that trade-off. By streaming video frames over a web connection, you can run complex vision models in the cloud and render the results on the client with low latency.

To see how this works under the hood, we are going to build a lightweight browser-based security camera monitor. The application captures a live webcam feed, processes it through a cloud-hosted vision pipeline, and displays live bounding boxes whenever a person enters the frame. All of this runs in a standard web page and takes less than 20 minutes to set up.

How In-Browser Detection Works

Real-time browser vision relies on a shift in how we handle video data. Traditional web applications use standard HTTP requests, which introduce too much overhead for continuous, frame-by-frame media streaming. Instead, this article uses the following stack:

- Vite v5.x & React v19.x: Provide a quick frontend environment to initialize the user's webcam, manage the application UI state, and render the incoming video feed.
- The WebRTC Protocol: Rather than uploading individual images over slower protocols, the application opens an optimized WebRTC video streaming channel. WebRTC typically runs over UDP, which lets it stream video from your webcam to a remote cloud GPU instance with very little lag.
- Roboflow Inference SDK: A specialized TypeScript library that handles the complex networking setup, such as ICE configuration, connection handshakes, and session management, so developers can connect browser hardware to vision models with only a few lines of code.

Setting Up the Backend Vision Workflow

Before writing our frontend code, we need to define the intelligence that our browser will communicate with. We can build this entire pipeline visually using Roboflow Workflows (https://docs.roboflow.com/workflows?ref=blog.roboflow.com), which lets us orchestrate complex computer vision blocks without writing backend boilerplate.

Here is the simple prototype used for this article: https://app.roboflow.com/workflows/embed/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ3b3JrZmxvd0lkIjoiMUVCemJMdEw3NTFYM3k3d1JPdlciLCJ3b3Jrc3BhY2VJZCI6ImVHM1R4bXRjTUlOSFNiTXhOQVgwNUxKTEtreDEiLCJ1c2VySWQiOiJlRzNUeG10Y01JTkhTYk14TkFYMDVMSkxLa3gxIiwiaWF0IjoxNzc5NDA1MjY4fQ.S4mC-p71jJtnKx jpMOmjPkAkRznKYBB7rtGroeK-Xc?ref=blog.roboflow.com

For our security monitor use case, here is a breakdown of the process to create the workflow:

1. Create the Workflow: Open the platform dashboard and enter the Workflows builder. Starting from a clean template gives you an environment to place, configure, and link your functional blocks.
2. Object Detection Model Node: Receives the webcam frames from the browser and passes them through a pre-trained object detection model optimized for finding people (people-detection-o4rdr/7).
3. Visualization Blocks:
   - Bounding Box Visualization: Takes the detection coordinates and draws a box around each detected person on the video frame.
   - Label Visualization: Takes the detection class and confidence score data, then overlays a text label (e.g., "person") near the corresponding bounding box on the video frame.
4. Output Node: The final annotated frames are mapped to a custom response track named label_visualization_output, which our frontend web app listens for and renders.
5. Test It: Click the run button in the top right to check whether your workflow runs as planned and whether your model needs any improvements.

Code Implementation & Parameter Tuning

Running real-time object detection in a browser tab addresses two of the biggest headaches of web-based AI: it skips the large model downloads and avoids managing heavy backend server pipelines. Instead, WebRTC streaming gives frontend developers control directly from JavaScript. You can adjust remote GPU power, route traffic to the nearest server region to reduce latency, and pull a dual-stream feed (annotated video and raw JSON predictions) at the same time.

The real strength of browser-based detection is therefore the control you get over your streaming configuration. When you initialize your streaming session using webrtc.useStream, you can pass several parameter blocks to tune performance, adjust hardware allocation, and manage data feeds.

You can design and style the surrounding user interface however you like. To skip the tedious setup and get a working project out of the box, you can clone the full template directly from this GitHub repository: https://github.com/aarnavshah12/Real-Time-Browser-Object-Detection?ref=blog.roboflow.com. Since the rest of the project files just handle standard frontend boilerplate and styling, we are going to focus entirely on the core streaming engine inside the App.jsx file.
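One piece of project setup is worth seeing before the component itself. The connector in App.jsx targets a local route, /roboflow-api, which only works if the dev server forwards that route to the remote API. Below is a minimal sketch of what such a rule could look like in vite.config.js. The target URL and the path rewrite are assumptions made for illustration, not values taken from the repository, so check the template's own config before relying on them.

```javascript
// vite.config.js (sketch): forward the local /roboflow-api route to a remote API.
// ASSUMPTION: this target URL is a placeholder chosen for illustration.
const ROBOFLOW_API_TARGET = "https://api.roboflow.com";

// Strip the local prefix so "/roboflow-api/foo" is requested as "/foo" upstream.
// ASSUMPTION: the upstream API expects paths without the local prefix.
function stripProxyPrefix(path) {
  return path.replace(/^\/roboflow-api/, "");
}

const config = {
  server: {
    proxy: {
      "/roboflow-api": {
        target: ROBOFLOW_API_TARGET,
        changeOrigin: true, // set the Host header to match the target
        rewrite: stripProxyPrefix,
      },
    },
  },
};

// In a real vite.config.js, finish with `export default config;` and keep
// whatever plugins (for example the React plugin) the project already uses.
```

Because the browser only ever talks to its own origin, the request is not cross-origin from the page's point of view; the dev server makes the outbound call instead. This applies to local development only, so a deployed build needs an equivalent rule on its own host.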
Here is the React component powering our live browser-based app:

```js
import { useRef, useState } from "react";
import { connectors, webrtc, streams } from "@roboflow/inference-sdk";

const API_KEY = import.meta.env.VITE_ROBOFLOW_API_KEY;
const WORKSPACE = "aarnavs-space";
const WORKFLOW_ID = "custom-workflow-21";

export default function App() {
  const videoRef = useRef(null);      // the <video> element that shows the annotated feed
  const connectionRef = useRef(null); // the active WebRTC connection
  const [isLive, setIsLive] = useState(false);
  const [isConnecting, setIsConnecting] = useState(false);

  async function start() {
    setIsConnecting(true);
    try {
      const connector = connectors.withApiKey(API_KEY, { serverUrl: "/roboflow-api" });
      const stream = await streams.useCamera({ video: { width: 1280, height: 720 } });

      connectionRef.current = await webrtc.useStream({
        source: stream,
        connector,
        wrtcParams: {
          workspaceName: WORKSPACE,
          workflowId: WORKFLOW_ID,
          streamOutputNames: ["label_visualization_output"],
          processingTimeout: 3600,
          requestedPlan: "webrtc-gpu-medium",
          requestedRegion: "us"
        }
      });

      // Show the annotated video returned by the cloud workflow.
      videoRef.current.srcObject = await connectionRef.current.remoteStream();
      setIsLive(true);
    } catch (err) {
      console.error(err);
      alert("Error: " + err.message);
    } finally {
      setIsConnecting(false);
    }
  }

  function stop() {
    connectionRef.current?.cleanup();
    if (videoRef.current) videoRef.current.srcObject = null;
    setIsLive(false);
  }

  return (
    <div style={{ padding: "20px", fontFamily: "Arial, sans-serif", maxWidth: "1200px", margin: "0 auto" }}>
      <h1>Webcam Stream</h1>
      <div style={{ marginBottom: "20px" }}>
        <button
          onClick={isLive ? stop : start}
          disabled={isConnecting}
          style={{
            padding: "10px 20px",
            fontSize: "16px",
            cursor: isConnecting ? "not-allowed" : "pointer",
            background: isLive ? "#ff4444" : "#44aa44",
            color: "white",
            border: "none",
            borderRadius: "4px",
            opacity: isConnecting ? 0.6 : 1
          }}
        >
          {isConnecting ? "Connecting..." : isLive ? "Stop" : "Start"}
        </button>
      </div>
      <video
        ref={videoRef}
        autoPlay
        playsInline
        muted
        style={{ width: "100%", maxWidth: "800px", border: "1px solid #ccc", borderRadius: "4px" }}
      />
    </div>
  );
}
```

Setup, Environment Variables, and Refs

At the top of the file, the app pulls in the necessary React hooks and core modules from the Roboflow Inference SDK.

```js
const API_KEY = import.meta.env.VITE_ROBOFLOW_API_KEY;
const WORKSPACE = "aarnavs-space";
const WORKFLOW_ID = "custom-workflow-21";
```

- Environment Variables: Instead of hardcoding the sensitive API key, the code reads it from import.meta.env. This keeps your credentials out of Git history, provided the .env file itself is not committed. Keep in mind that Vite inlines VITE_-prefixed variables into the client bundle, so the key is still visible to anyone who inspects the built app. To learn more, see the documentation.
- Persistent References (useRef): The app uses videoRef to attach the incoming video stream to the HTML video element, and connectionRef to store the active WebRTC connection. Because updating a ref does not trigger a re-render, the connection stays alive and intact without causing the page to constantly re-render.

The Core Streaming Logic (start)

The start function handles the entire pipeline initialization. It follows a sequence to connect the browser to the cloud GPU instance:

```js
const connector = connectors.withApiKey(API_KEY, { serverUrl: "/roboflow-api" });
```

- Bypassing CORS: The connector initializes authentication, but points the serverUrl to a local route, /roboflow-api. This routes traffic through your Vite proxy config, so the browser sees a same-origin request and does not apply its cross-origin restrictions.

```js
const stream = await streams.useCamera({ video: { width: 1280, height: 720 } });
```

- Camera Capture: The SDK handles asking the user for webcam permissions and configures the hardware capture stream at 720p resolution.

```js
connectionRef.current = await webrtc.useStream({ ... });
```

- Establishing the Stream: This opens the real-time WebRTC tunnel. It sends the local webcam frames to a serverless GPU runner, passes them through your vision workflow, and returns the processed frames.
- Rendering the Output: The line videoRef.current.srcObject = await connectionRef.current.remoteStream(); takes the returned, cloud-annotated video track and binds it to your on-screen player.

Choosing your Preferred Configuration

The wrtcParams block inside the streaming function lets you control performance variables:

- streamOutputNames: Tells the SDK which visual output from your workflow should be encoded into the live video feed. In this code, it points to label_visualization_output to fetch frames that have bounding boxes and labels drawn over the detections.
- processingTimeout: Sets an automated guardrail (here, 3600 seconds, or 1 hour). If a user leaves the browser tab running or goes idle, the remote cloud server automatically spins down to save compute resources.
- requestedPlan: Chooses the underlying remote hardware strength. You can switch between webrtc-gpu-small, webrtc-gpu-medium, and webrtc-gpu-large depending on how heavy your pipeline gets.
- requestedRegion: Chooses the cloud datacenter closest to your end user (us, eu, or ap) to reduce the physical distance data has to travel, keeping latency low.

Additional Things You Can Choose: While not active in this specific UI layout, you can add two parameters for deeper app integration:

- dataOutputNames: Lets you pass a model node name like "predictions" to stream raw, structured JSON data back alongside the video.
- onData: A custom callback that runs every time a new JSON payload reaches the browser. You can use this to run native JavaScript actions, like playing an alarm when a person enters the camera view.
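To make the configuration options and the onData idea concrete, here is a minimal sketch in plain JavaScript. It is not part of the template: buildWrtcParams, extractDetections, and createPersonAlarm are hypothetical helper names, and the payload shape (a predictions array of objects with class and confidence fields) is an assumption that you should adapt to what your workflow actually returns.

```javascript
// Plan and region values named in this article.
const PLANS = ["webrtc-gpu-small", "webrtc-gpu-medium", "webrtc-gpu-large"];
const REGIONS = ["us", "eu", "ap"];

// Hypothetical helper: builds a wrtcParams object and fails early on a
// mistyped plan or region instead of failing later during connection.
function buildWrtcParams({
  workspaceName,
  workflowId,
  streamOutputNames,
  dataOutputNames = ["predictions"],
  plan = "webrtc-gpu-medium",
  region = "us",
  timeoutSeconds = 3600,
}) {
  if (!PLANS.includes(plan)) throw new Error(`Unknown plan: ${plan}`);
  if (!REGIONS.includes(region)) throw new Error(`Unknown region: ${region}`);
  return {
    workspaceName,
    workflowId,
    streamOutputNames,
    dataOutputNames,
    processingTimeout: timeoutSeconds,
    requestedPlan: plan,
    requestedRegion: region,
  };
}

// ASSUMPTION: detections arrive as payload.predictions, an array of
// { class, confidence } objects. Change this one function if the shape differs.
function extractDetections(payload) {
  const list = payload && payload.predictions;
  return Array.isArray(list) ? list : [];
}

// Hypothetical helper: returns an onData-style callback that calls onEnter
// only when a person appears after a payload without one (a rising edge),
// rather than on every payload that contains a person.
function createPersonAlarm({ minConfidence = 0.5, onEnter }) {
  let personPresent = false;
  return function handleData(payload) {
    const seen = extractDetections(payload).some(
      (d) => d.class === "person" && d.confidence >= minConfidence
    );
    if (seen && !personPresent) onEnter(payload);
    personPresent = seen;
  };
}
```

In the component, the result of buildWrtcParams would be passed as wrtcParams and the function returned by createPersonAlarm as onData, both inside the webrtc.useStream call. Triggering only on the transition from "no person" to "person" keeps an alarm from restarting on every frame while someone stays in view.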
Teardown and Cleanup

If a user stops the session, the application resets gracefully:

```js
function stop() {
  connectionRef.current?.cleanup();
  if (videoRef.current) videoRef.current.srcObject = null;
  setIsLive(false);
}
```

Calling .cleanup() stops the WebRTC video track, disconnects the live network channels, and signals the remote serverless instance to shut down. Clearing the srcObject detaches the stream from the video element; with the camera released, the webcam light turns off, which protects the user's privacy.

How to Test Locally

Getting your in-browser detection app running on your machine requires just a few standard steps:

1. Install Project Packages: Open your terminal in your project folder and install all required modules: npm install
2. Launch the Development Server: Start the local Vite dev server: npm run dev
3. Load the Application: Open your browser and navigate to the local URL printed in your terminal, typically http://localhost:5173.
4. Grant Hardware Permissions: Click the Start button. Your browser will show a security pop-up requesting access to your camera. Allow the permission, and your live, serverless-backed object detection app is fully operational.

If you wish to learn more, check out the official Roboflow Web Inference SDK Documentation (https://docs.roboflow.com/deploy/sdks/web-browser/web-inference-sdk?ref=blog.roboflow.com) for full API specs, code snippets, and deployment optimization strategies.

Real-Time Object Detection in Browser Conclusion

Bringing high-performance computer vision into a web application used to require complex backend servers or massive local model downloads that drained device batteries. By using serverless streaming pipelines and WebRTC, you can deploy responsive, low-latency object detection apps directly inside a standard browser tab. Whether you are building a lightweight home security monitor, creating custom interactive web tools, or prototyping quick vision projects, processing live video feeds on demand opens up a wide range of front-end possibilities. Head over to Roboflow (https://roboflow.com/?ref=blog.roboflow.com) to grab your API key and start experimenting with your own custom browser workflows today.

Cite this Post

Use the following entry to cite this post in your research:

Aarnav Shah. (May 26, 2026). Real-Time Object Detection in the Browser. Roboflow Blog: https://blog.roboflow.com/real-time-object-detection-in-the-browser/