Six Lines, Zero API Calls: Running LLMs On-Device in React Native

Software Mansion's react-native-executorch library, built on Meta's ExecuTorch runtime, enables running AI models on-device in React Native without API calls or network connectivity. A developer demonstrates building a local chat screen with just six lines of model code, highlighting that the library handles model execution while developers must build the surrounding UI and logic. The library requires React Native's New Architecture, Expo SDK 54+, and custom dev builds.

Every AI feature I've worked on has done the same quiet thing: collect the user's text, send it to someone else's server, pay per token, and pray the network holds. That's fine until it isn't:

- Your user is on a flight, with no network and a dead feature.
- It's a journaling app, where "we send your private thoughts to a third party" is a hard no.
- Finance notices the OpenAI bill climbing in a straight line with usage.

There's another option most React Native devs still treat as exotic: run the model on the device. No API call, no network, no per-token cost.

The first time I wired this into an offline text-enhancement tool with Expo, the surprise wasn't that it worked. It was that the actual model code was about six lines. The hard parts were everywhere except the model.

This is a walkthrough of `react-native-executorch` by Software Mansion (the Reanimated and Gesture Handler folks), built on Meta's ExecuTorch runtime. We'll build a working local chat screen, but more importantly I'll show you the traps. The ones that cost me an afternoon each. The ones an AI-generated tutorial will confidently get wrong because the API changed underneath it.

The project describes itself this way:

React Native ExecuTorch provides a declarative way to run AI models on-device using React Native, powered by ExecuTorch 🚀. It offers out-of-the-box support for a wide range of LLMs, computer vision models, and more.
Visit our [HuggingFace](https://huggingface.co/software-mansion) page to explore these models. [ExecuTorch](https://executorch.ai), developed by Meta, is a novel framework allowing AI model execution on devices like mobile phones or microcontrollers. React Native ExecuTorch bridges the gap between React Native and native platform capabilities, enabling developers to efficiently run local AI models on mobile devices. This can be achieved without the need for extensive expertise in native programming or machine learning.

Before any code, one idea that saves a lot of confusion: the LLM is one part of your app, not the app itself. The library gives you the model and nothing else. Everything around it is still your job.

It helps to picture three things working together:

- Your normal app code is predictable. Same input, same output, every time: the buttons, the list, the navigation.
- The model is not. Give it the same prompt twice and you'll get slightly different answers, because it generates text by predicting likely next words, not by looking facts up. That's also why it sometimes states wrong things with total confidence. It isn't a defect you can patch, it's how the thing works, so plan for it.
- The person reading the output is the final check. They decide what to trust and what to ignore.

`react-native-executorch` owns only the middle piece. It hands you a stream of words and a few status flags. It does not manage your chat UI, decide when to run the model, or judge whether the answer is any good. Those are yours to build.

Before installing, check the requirements:

- New Architecture only. The library does not support the old RN architecture. If your app is still on it, that's your first migration.
- Expo SDK 54+ if you're on Expo (which I'd recommend). Older SDKs break on the file-system APIs the library now depends on.
- A custom dev build, not Expo Go. This relies on native modules, so Expo Go will not load it. This trips up everyone the first time.
- A real iOS device for release builds.
Because ExecuTorch runs natively, you can't produce an iOS release build targeting the simulator. Debug on the sim is fine; release testing needs hardware.

That last pair isn't optional advice, it's the difference between "why won't this run" and a working build. Write them on a sticky note.

Installation is two steps: install the core package, then add a resource fetcher adapter.

```bash
npm install react-native-executorch
```

Then a resource fetcher adapter. These are platform-specific, so install the one that matches your project.

Expo projects:

```bash
npm install react-native-executorch-expo-resource-fetcher expo-file-system expo-asset
```

Bare React Native:

```bash
npm install react-native-executorch-bare-resource-fetcher @dr.pogodin/react-native-fs @kesha-antonov/react-native-background-downloader
```

Before you call any other API, you must initialize ExecuTorch with that adapter, once, at your app's entry point:

```js
// App.tsx (or index.js), top level, runs once
import { initExecutorch } from "react-native-executorch";
import { ExpoResourceFetcher } from "react-native-executorch-expo-resource-fetcher";

initExecutorch({ resourceFetcher: ExpoResourceFetcher });
```

Skip this and the first model you load throws `ResourceFetcherAdapterNotInitialized`. It's the most common setup mistake, and an easy one to miss because `initExecutorch` lives at your entry point, far from where you actually call `useLLM`.

One more step applies if you plan to bundle a model with the app via `require` instead of downloading it: add the binary extensions to Metro.

```js
// metro.config.js (Expo shown; bare RN takes getDefaultConfig from @react-native/metro-config)
const { getDefaultConfig } = require("expo/metro-config");
const defaultConfig = getDefaultConfig(__dirname);
defaultConfig.resolver.assetExts.push("pte"); // exported model
defaultConfig.resolver.assetExts.push("bin"); // tokenizer
module.exports = defaultConfig;
```

Here's the whole "load an LLM" surface:

```js
import { models, useLLM } from "react-native-executorch";

function Chat() {
  const llm = useLLM({ model: models.llm.lfm2_5_1_2b_instruct });
  // ...
}
```

`models.llm` is a factory of pre-exported, ready-to-run models.
One factory call gives the runtime everything it needs, already bundled, including the model in `.pte` format, already converted.

Software Mansion hosts the full lineup on [HuggingFace](https://huggingface.co/software-mansion), so you point at a model and the library handles fetching and wiring up the rest. No manual file juggling.

I'm using LFM2.5 1.2B here because it's the library's own default and small enough to behave on mid-range hardware. You've got real choices though. The bundled lineup includes:

- Text models: Qwen 3 (0.6B / 1.7B / 4B), Llama 3.2 (1B / 3B), Phi 4 Mini, SmolLM 2, Hammer 2.1
- Vision-capable: Gemma 4 and LFM2.5-VL

Why I'd start small: a 4B model is noticeably smarter and noticeably more likely to crash with an out-of-memory error on a budget Android. Pick the smallest model that clears your quality bar, then size up only if you must.

The hook gives you state to drive your UI:

- `llm.downloadProgress`: 0 to 1 while the model downloads on first launch
- `llm.isReady`: flips to `true` when the model is loaded and usable
- `llm.error`: populated if anything blows up
- `llm.isGenerating`: `true` while tokens are streaming
- `llm.response`: the generated text, updated token by token

There are two ways to use this hook, and the docs name them well: functional vs managed. The distinction matters, so don't skim it.

In the functional style, you pass the full message array every time, you keep the history, and you get a token stream back. Nothing is remembered for you.

```tsx
import { models, useLLM, type Message } from "react-native-executorch";
import { View, Text, Button } from "react-native";

function Chat() {
  const llm = useLLM({ model: models.llm.lfm2_5_1_2b_instruct });

  const handleGenerate = async () => {
    const chat: Message[] = [
      { role: "system", content: "You are a concise, helpful assistant." },
      { role: "user", content: "Explain a closure in one sentence." },
    ];
    // resolves to the full string; llm.response updates live as it streams
    const final = await llm.generate(chat);
    console.log("done:", final);
  };

  if (llm.isReady) {
    return
```