{"slug": "solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications", "title": "Solon 4.0 ChatModel: A Practical Guide to Building LLM-Powered Applications", "summary": "Solon 4.0 introduces ChatModel, a unified LLM client that abstracts away boilerplate code for integrating large language models into Java applications. The API supports multiple model providers including OpenAI, Ollama, Gemini, Anthropic, and DashScope through a dialect pattern, and offers both synchronous and streaming chat capabilities with a builder-oriented API.", "body_md": "If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's `ChatModel` abstracts all of that away behind a clean, builder-oriented API.\n\nIn this guide, I'll walk through building working AI features with `ChatModel`, from a simple chat call to a streaming chatbot with conversation memory.\n\n`ChatModel` (package `org.noear.solon.ai.chat`) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for each model provider, you use a single API that supports:\n\n- Synchronous calls (`call()`), which return a complete `ChatResponse`\n- Streaming calls (`stream()`), which return a Reactor `Flux<ChatResponse>`\n\nThe best part? 
It uses a **dialect pattern**: you point it at any compatible LLM endpoint, and it adapts to that provider's request and response format.\n\nAdd the dependency to your `pom.xml` (no parent POM needed; Solon works standalone):\n\n```xml\n<dependency>\n    <groupId>org.noear</groupId>\n    <artifactId>solon-ai</artifactId>\n    <version>${solon.version}</version>\n</dependency>\n```\n\nThis pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).\n\nNext, describe the model in your Solon configuration file (typically `app.yml`):\n\n```yaml\nsolon.ai.chat:\n  demo:\n    apiUrl: \"http://127.0.0.1:11434/api/chat\"   # Full URL, not baseUrl\n    provider: \"ollama\"                           # Dialect identifier\n    model: \"llama3.2\"                            # Model name\n    headers:\n      x-demo: \"demo1\"                            # Extra HTTP headers sent with each request\n```\n\nThen create a `@Bean` that builds a ready-to-use `ChatModel` from that config:\n\n```java\nimport org.noear.solon.ai.chat.ChatConfig;\nimport org.noear.solon.ai.chat.ChatModel;\nimport org.noear.solon.annotation.Bean;\nimport org.noear.solon.annotation.Configuration;\nimport org.noear.solon.annotation.Inject;\n\n@Configuration\npublic class AiConfig {\n    // Binds the solon.ai.chat.demo block to a ChatConfig and builds the model from it\n    @Bean\n    public ChatModel chatModel(@Inject(\"${solon.ai.chat.demo}\") ChatConfig config) {\n        return ChatModel.of(config).build();\n    }\n}\n```\n\nPrefer code over config? Use the builder directly:\n\n```java\nimport java.time.Duration;\nimport org.noear.solon.ai.chat.ChatModel;\nimport org.noear.solon.annotation.Bean;\nimport org.noear.solon.annotation.Configuration;\n\n@Configuration\npublic class AiConfig {\n    @Bean\n    public ChatModel chatModel() {\n        return ChatModel.of(\"http://127.0.0.1:11434/api/chat\")\n                .standard(\"ollama\")      // or .provider(\"ollama\") pre-4.0\n                .model(\"llama3.2\")\n                .timeout(Duration.ofSeconds(60))\n                .build();\n    }\n}\n```\n\nThe `standard` (or `provider`) field selects the dialect:\n\n| Standard | Example `apiUrl` | Models |\n|---|---|---|\n| `openai` (default) | `https://api.openai.com/v1/chat/completions` | GPT, plus OpenAI-compatible APIs such as DeepSeek, Qwen, GLM, Kimi, etc. |\n| `ollama` | `http://127.0.0.1:11434/api/chat` | Any local Ollama model |\n| `anthropic` | `https://api.anthropic.com/v1/messages` | Claude |\n| `gemini` | `https://generativelanguage.googleapis.com/v1beta/models/...` | Gemini |\n| `dashscope` | Aliyun DashScope endpoint | Qwen (DashScope native) |\n\nThe most basic use case: send a prompt and get the full response back in one call:\n\n```java\nimport java.io.IOException;\n\nimport org.noear.solon.ai.chat.ChatModel;\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.annotation.Component;\nimport org.noear.solon.annotation.Inject;\n\n@Component\npublic class ChatService {\n    @Inject\n    ChatModel chatModel;\n\n    public String ask(String question) throws IOException {\n        // Blocks until the model has produced its complete answer\n        ChatResponse resp = chatModel.prompt(question).call();\n        return resp.getMessage().getContent();\n    }\n}\n```\n\nThat's it: two lines of business code.\n\nFor chatbots and assistants, streaming is essential. Calling `stream()` instead of `call()` returns a Reactor `Flux<ChatResponse>` that emits chunks as the model generates them:\n\n```java\nimport java.io.IOException;\n\nimport org.noear.solon.ai.chat.ChatResponse;\nimport reactor.core.publisher.Flux;\n\n// Method on the ChatService above\npublic Flux<String> askStream(String question) throws IOException {\n    return chatModel.prompt(question)\n            .stream()\n            .filter(ChatResponse::hasContent)       // skip empty chunks\n            .map(resp -> resp.getMessage().getContent());\n}\n```\n\nYou can subscribe to the `Flux` yourself or, if you're using Solon Web Reactive, return it directly from an SSE endpoint:\n\n```java\nimport java.io.IOException;\n\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.annotation.Mapping;\nimport org.noear.solon.web.sse.SseEvent;\nimport reactor.core.publisher.Flux;\n\n// Handler method inside a @Controller that has ChatModel injected\n@Mapping(\"/chat/stream\")\npublic Flux<SseEvent> chatStream(String prompt) throws IOException {\n    return chatModel.prompt(prompt)\n            .stream()\n            .filter(ChatResponse::hasContent)\n            .map(resp -> new SseEvent()\n                    .data(resp.getMessage().getContent()));\n}\n```\n\nThe underlying streaming protocol is standard SSE (`text/event-stream`) or `x-ndjson`, depending on the
provider.\n\nLLMs are stateless. To maintain conversation context, you need to send the history with each request. `ChatSession` handles this automatically:\n\n```java\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.ai.chat.ChatSession;\nimport org.noear.solon.ai.chat.session.InMemoryChatSession;\n\n// Assumes an injected ChatModel named chatModel\nChatSession session = InMemoryChatSession.builder()\n        .sessionId(\"user-123\")\n        .maxMessages(10)     // keep at most the 10 most recent messages\n        .build();\n\n// First turn\nChatResponse resp1 = chatModel.prompt(\"Hello!\")\n        .session(session)\n        .call();\n\n// Second turn: the session replays the first exchange, so the model has the context\nChatResponse resp2 = chatModel.prompt(\"What did I just say?\")\n        .session(session)\n        .call();\n```\n\nIn a real web app, you'll want one session per user. Here's a controller that keeps a separate session for each `sessionId`:\n\n```java\nimport java.io.IOException;\nimport java.util.Map;\nimport java.util.concurrent.ConcurrentHashMap;\n\nimport org.noear.solon.ai.chat.ChatModel;\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.ai.chat.ChatSession;\nimport org.noear.solon.ai.chat.session.InMemoryChatSession;\nimport org.noear.solon.annotation.Controller;\nimport org.noear.solon.annotation.Inject;\nimport org.noear.solon.annotation.Mapping;\nimport org.noear.solon.web.sse.SseEvent;\nimport reactor.core.publisher.Flux;\n\n@Controller\npublic class ChatController {\n    @Inject\n    ChatModel chatModel;\n\n    // One session per sessionId; ConcurrentHashMap keeps lookups safe across request threads\n    final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();\n\n    @Mapping(\"/chat\")\n    public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {\n        ChatSession session = sessionMap.computeIfAbsent(sessionId,\n                k -> InMemoryChatSession.builder().sessionId(k).build());\n\n        return chatModel.prompt(prompt)\n                .session(session)\n                .options(o -> o.systemPrompt(\"You are a helpful and friendly assistant.\"))\n                .stream()\n                .filter(ChatResponse::hasContent)\n                .map(resp -> new SseEvent().data(resp.getMessage().getContent()));\n    }\n}\n```\n\nChoose a `ChatSession` implementation based on where the history should live:\n\n| Implementation | Storage | Use Case |\n|---|---|---|\n| `InMemoryChatSession` | Local map | Development, single node |\n| `FileChatSession` | File system | CLI tools, desktop apps |\n| `RedisChatSession` | Redis | Production, distributed |\n\nControl model behavior per request with `ChatOptions`:\n\n```java\nchatModel.prompt(\"Write a poem about Java\")\n        .options(o -> o\n            .temperature(0.8)     // more varied wording\n            .max_tokens(500)      // cap the length of the answer\n            .top_p(0.9)\n            .systemPrompt(\"You are a creative poet.\"))\n        .call();\n```\n\nCommon options include:\n\n| Method | Description |\n|---|---|\n| `temperature(val)` | Sampling temperature (commonly 0.0–2.0; the valid range depends on the provider) |\n| `max_tokens(val)` | Maximum number of output tokens |\n| `top_p(val)` | Nucleus sampling |\n| `top_k(val)` | Top-K sampling |\n| `frequency_penalty(val)` | Reduce repetition |\n| `presence_penalty(val)` | Encourage new topics |\n| `tool_choice(val)` | Control tool use: `none`, `auto`, `required`, or a specific tool name |\n| `systemPrompt(val)` | System message for this request |\n| `role(val)` | Agent role (v3.9.1+) |\n| `instruction(val)` | Agent instruction (v3.9.1+) |\n\nSometimes a single string isn't enough, for example when you want to supply few-shot examples or replay earlier turns yourself. Build a `Prompt` from `ChatMessage` objects:\n\n```java\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.ai.chat.Prompt;\nimport org.noear.solon.ai.chat.message.ChatMessage;\n\n// System instruction, one example exchange, then the real question\nPrompt prompt = Prompt.of(\n    ChatMessage.ofSystem(\"You translate English to French.\"),\n    ChatMessage.ofUser(\"Hello, how are you?\"),\n    ChatMessage.ofAssistant(\"Bonjour, comment allez-vous?\"),\n    ChatMessage.ofUser(\"What is your name?\")\n);\n\nChatResponse resp = chatModel.prompt(prompt).call();\n```\n\nLet's build a simple **knowledge-aware chatbot**, the kind of RAG-lite pattern you see in real projects.
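\n\n`KnowledgeChatbot` (below) receives its `referenceContext` as a plain string and leaves retrieval to the caller. To experiment before wiring up a vector store, you could produce that string with a minimal keyword-overlap retriever. This is a hypothetical sketch in plain Java, not part of Solon AI; the `KeywordRetriever` class and its `retrieve` method are illustrative names:\n\n```java\nimport java.util.Arrays;\nimport java.util.Comparator;\nimport java.util.List;\nimport java.util.Locale;\nimport java.util.Set;\nimport java.util.stream.Collectors;\n\n// Hypothetical helper, not a Solon AI API: ranks snippets by how many words\n// they share with the question and joins the best ones into one context string.\npublic class KeywordRetriever {\n\n    // Lower-cases the text and splits it on anything that is not an ASCII letter or digit.\n    static Set<String> words(String text) {\n        return Arrays.stream(text.toLowerCase(Locale.ROOT).split(\"[^a-z0-9]+\"))\n                .filter(w -> !w.isEmpty())\n                .collect(Collectors.toSet());\n    }\n\n    // Number of distinct snippet words that also occur in the question.\n    static long score(Set<String> questionWords, String snippet) {\n        return words(snippet).stream().filter(questionWords::contains).count();\n    }\n\n    // Up to topK matching snippets, highest score first (ties keep input order), one per line.\n    public static String retrieve(String question, List<String> snippets, int topK) {\n        Set<String> q = words(question);\n        return snippets.stream()\n                .filter(s -> score(q, s) > 0)\n                .sorted(Comparator.comparingLong((String s) -> score(q, s)).reversed())\n                .limit(topK)\n                .collect(Collectors.joining(System.lineSeparator()));\n    }\n\n    public static void main(String[] args) {\n        List<String> docs = List.of(\n                \"Solon apps read settings from app.yml.\",\n                \"Ollama serves local models on port 11434.\",\n                \"Redis is an in-memory data store.\");\n        // Prints the Ollama snippet: it is the only one sharing words with the question\n        System.out.println(retrieve(\"Which port does Ollama use?\", docs, 2));\n    }\n}\n```\n\nThe returned string is what you would pass as `referenceContext` to `answer()`. Counting shared words is only a stand-in. Real RAG pipelines usually rank snippets by embedding similarity, and that is the part to replace first.\n\n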
The `KnowledgeChatbot` component below uses `ChatMessage.ofUserAugment()` to attach reference context to the user's question:\n\n```java\nimport org.noear.solon.ai.chat.ChatModel;\nimport org.noear.solon.ai.chat.ChatResponse;\nimport org.noear.solon.ai.chat.message.ChatMessage;\nimport org.noear.solon.annotation.Component;\nimport org.noear.solon.annotation.Inject;\n\n@Component\npublic class KnowledgeChatbot {\n    @Inject\n    ChatModel chatModel;\n\n    public String answer(String question, String referenceContext) throws Exception {\n        // Augment the user message with reference context\n        ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);\n\n        ChatResponse resp = chatModel.prompt(augmented)\n                .options(o -> o\n                    .temperature(0.3)   // low temperature makes the output more deterministic\n                    .systemPrompt(\"You are a knowledgeable assistant. Answer based on the provided references.\"))\n                .call();\n\n        return resp.getMessage().getContent();\n    }\n}\n```\n\nThis pattern (augment the user input with retrieved context, then call the model) is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.\n\n`ChatModel` is just the entry point. Solon AI also offers:\n\n- `@ToolMapping` methods the LLM can invoke\n- `ReActAgent` and `TeamAgent` for multi-step reasoning\n\nFor the full documentation, check out the official Solon AI guide:\n\n👉 [https://solon.noear.org/article/918](https://solon.noear.org/article/918) (Model construction)\n\n👉 [https://solon.noear.org/article/920](https://solon.noear.org/article/920) (API reference)\n\n*Have you tried integrating LLMs in Java? What's your biggest pain point? 
Let me know in the comments — I might cover it in the next post.*", "url": "https://wpnews.pro/news/solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications", "canonical_source": "https://dev.to/solonjava/solon-40-chatmodel-a-practical-guide-to-building-llm-powered-applications-c7e", "published_at": "2026-07-04 00:39:48+00:00", "updated_at": "2026-07-04 01:18:54.833578+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-tools"], "entities": ["Solon", "ChatModel", "OpenAI", "Ollama", "Gemini", "Anthropic", "DashScope"], "alternates": {"html": "https://wpnews.pro/news/solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications", "markdown": "https://wpnews.pro/news/solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications.md", "text": "https://wpnews.pro/news/solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications.txt", "jsonld": "https://wpnews.pro/news/solon-4-0-chatmodel-a-practical-guide-to-building-llm-powered-applications.jsonld"}}