Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

Mochallama, a new Java library, runs llama.cpp inference directly inside a Java process using JDK 22's Foreign Function and Memory (FFM) API, eliminating the need for a separate daemon process or a native build. It ships with a CLI that bundles its own JDK runtime via npm, supports tool-calling local LLMs such as Qwen2.5-1.5B, and offers Spring Boot integration with OpenAI-compatible endpoints for chat completions and tool use. This fills a gap in JVM-based local LLM deployment by providing in-process inference without JNI's crash risks or the overhead of HTTP-based solutions like Ollama.

The 10-second hook

No Java install, no daemon, no native build: npx a tool-calling local LLM and start chatting. The CLI ships its own jlink JDK-22 runtime image via npm, so this needs no JDK on the host. qwen2.5-1.5b is the default tool-capable preset; the model downloads on first run into ~/.chatbot models.

Embed it: the smallest plain-Java snippet

Two dependencies are needed: the Java jar, plus the platform aggregator that resolves the right native classifier jar for your host.

JVM flags: JDK 22+ is required (FFM is GA there). Run with --enable-native-access=ALL-UNNAMED.

Or one Spring dependency

The starter autoconfigures a local model service and the OpenAI-compatible endpoints, with no spring-ai dependency required.

Tell it which model to load in src/main/resources/application.properties. A Hugging Face id is the simplest option (it resolves and caches the GGUF on first start).

Start the app (the model loads asynchronously, and endpoints return 503 until state: READY), then point any OpenAI client at it. POST /v1/chat/completions handles non-streaming requests, stream:true SSE, and tools / tool_choice; GET /v1/models lists the loaded model.

A real multi-turn CLI

mochallama chat is a stateful REPL: it keeps the full conversation history instead of running amnesiac single turns. Sessions persist at ~/.chatbot models/sessions/
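
As an illustration of the "point any OpenAI client at it" step for the Spring Boot endpoints, here is a minimal sketch using only the JDK's built-in java.net.http.HttpClient. It posts a non-streaming request to /v1/chat/completions and retries while the server answers 503, which is what the endpoints return until the model is ready. The base URL (Spring Boot's default port 8080), the model name in the request, the retry counts, and the class and method names are all assumptions made for this example; none of them come from Mochallama itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ChatCompletionClient {

    // Assumption: Spring Boot's default port; change it to match your server.port.
    static final String BASE_URL = "http://localhost:8080";

    /** Escapes a string so it can sit inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Builds a single-turn, non-streaming OpenAI-style chat request body. */
    static String buildRequestBody(String model, String userMessage) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\""
             + jsonEscape(userMessage) + "\"}],"
             + "\"stream\":false}";
    }

    /** Sends the request, retrying on 503 (model still loading). Returns the raw JSON response. */
    static String chat(HttpClient client, String baseUrl, String model,
                       String userMessage, int maxAttempts, Duration delay)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(model, userMessage)))
                .build();

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                return response.body();
            }
            if (response.statusCode() != 503) {
                throw new IllegalStateException(
                        "Unexpected status " + response.statusCode() + ": " + response.body());
            }
            Thread.sleep(delay.toMillis()); // 503: not ready yet, wait and retry
        }
        throw new IllegalStateException("Model not ready after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Assumption: the model name is illustrative; GET /v1/models lists the loaded one.
            String json = chat(HttpClient.newHttpClient(), BASE_URL,
                    "qwen2.5-1.5b", "Hello!", 30, Duration.ofSeconds(2));
            System.out.println(json);
        } catch (IOException e) {
            System.out.println("Could not reach " + BASE_URL + " (is the app running?): " + e);
        }
    }
}
```

The response is printed as raw JSON to keep the sketch dependency-free; a real client would parse it with a JSON library, or use an existing OpenAI SDK with its base URL pointed at the app.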