Language integrated LLMs as an OCaml function

Developer Anil Madhavapeddy released ocaml-deepseek, an OCaml library that integrates DeepSeek's open-weight LLM directly into applications via a native inference engine, enabling local, dependency-free AI agents. The package, submitted to the opam package manager, allows OCaml functions to be used as tools for the model, demonstrated with a web traffic analysis agent.

Fable cut out on me (https://www.theverge.com/ai-artificial-intelligence/949553/anthropic-fable-5-mythos-5-government-national-security) at 1am on Saturday while I was sweeping over the OCaml runtime looking for concurrency bugs. There have been excellent takes (https://x.com/rosstaylor90/status/2066067747738431504) on the sovereignty implications of this, and I figured I'd roll my sleeves up and get serious about using the open weights models. DeepSeek's models have been getting more capable since their first release (/notes/deepseek-r1-advances), and v4 Flash (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) is small enough to run on my Mac (admittedly, very high-end Macs with 128GB/512GB of RAM respectively for my laptop and desktop). The question is whether the agentic CLIs I've been using (/notes/aoah-2025) can be easily replaced.

The best way to learn how a system works is to build it unikernel style (/projects/unikernels), and so I aimed to expose the LLM as a normal OCaml library. This avoids routing via bloated CLIs (https://github.com/anthropics/claude-code/issues/8382), and lets the linking application drive the agentic loop according to its specific needs. What makes this practical is Antirez's (https://github.com/antirez) Dwarfstar (https://github.com/antirez/ds4), a self-contained native inference engine that supports Apple Metal (https://developer.apple.com/metal/) and portable(ish) C.
I bound this directly to OCaml 5 and Eio (https://github.com/ocaml-multicore/eio) as ocaml-deepseek, and now a plain function call on my laptop gets me an LLM in my application. For example, I can now embed DeepSeek inference directly into the OCaml webserver that drives this very site in order to look for suspicious bot activity, and because it's open weights and running locally, there's no dependency on external services.

A traffic-triage agent in-process in OCaml. The agent is handed two OCaml function tools and works out for itself how to combine them.

```ocaml
let agent =
  Agent.create engine
    ~system:"You are a web-traffic analyst."
    ~tools:[
      Toolbox.read ~dir:logs;  (* read-only sandboxed handle to the logs dir *)
      query_db ~conn;          (* a SELECT-only tool over the local database *)
    ]
in
Agent.send agent ~on_event
  "Cross-reference today's 404 spikes in the access log against the \
   client IPs in the requests table. Anything coordinated indicating a bad bot?"
```

The log reader and the database query are just two OCaml functions the model is allowed to call, each scoped and sandboxed using Eio to exactly what it needs. The model decides when and how to combine them.

1 Trying out Humpty the OCaml agent

I've submitted (https://github.com/ocaml/opam-repository/pull/30053) the package to opam, so opam install deepseek or opam pin add deepseek https://tangled.org/anil.recoil.org/ocaml-deepseek.git should work. The package also ships a binary called humpty [1] with two variants: humpty-metal for Apple Silicon and a portable humpty-cpu that should run anywhere (slowly).

There are four subcommands that we'll use to explain how to build an agent up in OCaml: first list the available models and download one, then chat with it statelessly, and then wrap that into an
agent.

1.1 Choose the right DeepSeek model

Before we can get started you'll first need the open model weights (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) downloaded. humpty list (https://tangled.org/anil.recoil.org/ocaml-deepseek/blob/1edcdad29a19924f988c7adef4343cb189dcb4a2/bin/humpty.ml#L101-134) prints the catalogue of available weights:

```bash
$ humpty-metal list
Models download dir: /Users/avsm/.local/share/ds4
TARGET          ALIASES  DESCRIPTION
q2-imatrix      q2       2-bit Flash routed experts (~81 GB); for 96-128 GB RAM.
q2-q4-imatrix   q2q4     Mixed Flash quant (~98 GB); higher quality for 128 GB.
q4-imatrix      q4       4-bit Flash routed experts (~153 GB); for 256 GB+ RAM.
pro-q2-imatrix  pro-q2   PRO q2 single file (~430 GB); for 512 GB RAM.
= present, = not downloaded
```

Pick one based on how much RAM you have; I use q2q4 on my laptop (with 128GB RAM), and the extremely beefy pro-q2 on my Mac Studio (with 512GB RAM). There are also split files for running the model distributed across several machines, which I'll skip here for now.

1.2 Grab the DeepSeek model weights

Once you've chosen, humpty download q4 (https://tangled.org/anil.recoil.org/ocaml-deepseek/blob/1edcdad29a19924f988c7adef4343cb189dcb4a2/bin/humpty.ml#L66-97) (or pro-q2, or whichever) shells out to the Hugging Face CLI to fetch the GGUF. You'll either need the Hugging Face CLI installed (https://huggingface.co/docs/huggingface_hub/en/guides/cli) or have uvx (https://github.com/astral-sh/uv) in your path. Once this gets going, go have a cup of tea while the gigabytes of LLM weights download, and then we'll start to build an agent from the camel up.

2 Building an agent from the ground up

I first want to pin down what an "agent" actually means, as the term seems to have accreted much mystique this year.
The whole OCaml DeepSeek stack is a small library you can read through quickly, so let me build an agent up from scratch. The code below links to the Tangled source (https://tangled.org/anil.recoil.org/ocaml-deepseek/commit/1edcdad29a19924f988c7adef4343cb189dcb4a2).

2.1 An LLM is a stateless request-reply function

A basic LLM takes in a text prompt, performs inference on some weights, and generates a text reply back. To illustrate this in our OCaml code, we need to load the model weights and spin up an engine with a cache directory for the compiled Metal kernels (if using the Apple GPU version):

```ocaml
let engine = Deepseek.V4.create ~cache ~model ~domain_mgr ~sw in
V4.generate engine "Explain monads in one sentence." ~on_token:print_string;
- : unit =
Monads are a design pattern that allows you to chain operations together while automatically handling extra behavior like error handling, state, or side effects, by wrapping values in a context and providing a way to transform and combine them.
```

Deepseek.V4.create (https://tangled.org/anil.recoil.org/ocaml-deepseek/blob/1edcdad29a19924f988c7adef4343cb189dcb4a2/lib/v4.mli#L41-47) opens the GGUF (https://huggingface.co/docs/hub/gguf) model file and, the first time it runs, materialises the embedded Metal shaders in the cache. Generating a reply is then a single call to V4.generate that encodes the supplied prompt, runs a prefill, and samples one token at a time into the on_token callback until the end-of-sequence marker. All the inference is done in the Metal library in a separate OCaml domain, so we can continue to use other Eio fibres in our main application.

You can try this single-shot request/response using humpty chat (https://tangled.org/anil.recoil.org/ocaml-deepseek/blob/1edcdad29a19924f988c7adef4343cb189dcb4a2/bin/humpty.ml#L44-62), which keeps no memory between runs and can't take any action beyond showing the reply.
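
To make the encode, prefill and sample-until-end-of-sequence shape of a generate call concrete, here is a minimal toy sketch in plain OCaml. It is not the ocaml-deepseek implementation: the four-word vocabulary, the next_token rule standing in for the model's forward pass, and every function name below are hypothetical, chosen only so the loop can run without any weights.

```ocaml
(* Toy stand-ins only: none of these names come from ocaml-deepseek. *)
type token = int

(* Token id 0 plays the role of the end-of-sequence marker. *)
let eos : token = 0
let vocab = [| "<eos>"; "hello"; "world"; "camel" |]

(* "Encode": map whitespace-separated words to token ids, dropping unknowns. *)
let encode (prompt : string) : token list =
  String.split_on_char ' ' prompt
  |> List.filter_map (fun w ->
         let rec find i =
           if i >= Array.length vocab then None
           else if vocab.(i) = w then Some i
           else find (i + 1)
         in
         (* Start at 1 so a word can never be mistaken for the eos token. *)
         if w = "" then None else find 1)

let decode (t : token) : string = vocab.(t)

(* Stand-in for inference: look at the most recent token in the context
   (head of the list) and emit the next vocabulary entry, or eos at the end. *)
let next_token (context : token list) : token =
  match context with
  | [] -> eos
  | last :: _ -> if last + 1 < Array.length vocab then last + 1 else eos

(* Stateless request-reply: everything the "model" knows is in [prompt]. *)
let generate ?(max_tokens = 16) ~on_token (prompt : string) : unit =
  (* "Prefill": load the prompt tokens into the context, newest first. *)
  let context = List.rev (encode prompt) in
  let rec loop context n =
    if n < max_tokens then begin
      let t = next_token context in
      if t <> eos then begin
        on_token (decode t ^ " ");
        loop (t :: context) (n + 1)
      end
    end
  in
  loop context 0

(* Stream the reply to stdout one token at a time; prints "world camel ". *)
let () =
  generate "hello" ~on_token:print_string;
  print_newline ()
```

Nothing survives between two calls to this toy generate, which is the property the rest of the post builds on. Running the real engine through humpty chat, as described above, looks like this: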
```bash
$ humpty-metal chat 'Explain algebraic effects in OCaml 5 in one sentence'
Algebraic effects in OCaml 5 allow functions to suspend execution and invoke user-defined handlers for operations like state, exceptions, or generators via a lightweight, type-safe mechanism that integrates with the language's existing type system and is used primarily for effectful programming, such as with the new Effect library for handling delimited continuations.
```

A "conversation" is therefore just a list of role-tagged messages (e.g. system, user, assistant, tool) that we concatenate in our library into a prompt string for the LLM.

3 How the stateless LLM asks for effects to the external world

The single-step text-to-text function from earlier emits text in an agreed "shape" so we can figure out what to do next based on its output. DeepSeek has trained their model to understand a little markup language called DSML (https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/encoding/encoding_dsv4.py), which looks something like this:

<|DSML|tool calls <|DSML|invoke name="edit" <|DSML|parameter name="path" string="true" /tmp/x.c