If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's `ChatModel` abstracts all of that away with a clean, builder-oriented API.
In this guide, I'll walk through building real, working AI features using `ChatModel`, from a simple chat call to a streaming chatbot with conversation memory.
`ChatModel` (package `org.noear.solon.ai.chat`) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports synchronous calls, streaming responses (`Flux<ChatResponse>`), session-based conversation memory, and per-request options.
The best part? It uses a dialect pattern: you point it at any compatible LLM endpoint, and it adapts automatically.
Add the dependency to your `pom.xml` (no parent POM needed; Solon works standalone):
    <dependency>
        <groupId>org.noear</groupId>
        <artifactId>solon-ai</artifactId>
        <version>${solon.version}</version>
    </dependency>
This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).
Next, describe the model in your application's YAML config:
    solon.ai.chat:
      demo:
        apiUrl: "http://127.0.0.1:11434/api/chat" # Full URL, not baseUrl
        provider: "ollama"                        # Dialect identifier
        model: "llama3.2"                         # Model name
        headers:
          x-demo: "demo1"
Then create a `@Bean` to get a ready-to-use `ChatModel`:
    import org.noear.solon.ai.chat.ChatConfig;
    import org.noear.solon.ai.chat.ChatModel;
    import org.noear.solon.annotation.Bean;
    import org.noear.solon.annotation.Configuration;
    import org.noear.solon.annotation.Inject;

    @Configuration
    public class AiConfig {
        // Binds the solon.ai.chat.demo config node to a ChatConfig and builds the client from it
        @Bean
        public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
            return ChatModel.of(config).build();
        }
    }
Prefer code over config? Use the builder directly:
    import java.time.Duration;

    @Bean
    public ChatModel chatModel() {
        return ChatModel.of("http://127.0.0.1:11434/api/chat")
                .standard("ollama") // or .provider("ollama") pre-4.0
                .model("llama3.2")
                .timeout(Duration.ofSeconds(60))
                .build();
    }
The `standard` (or `provider`) field selects the dialect:
| Standard | Example `apiUrl` | Models |
|---|---|---|
| `openai` (default) | `https://api.openai.com/v1/chat/completions` | GPT, DeepSeek, Qwen, GLM, Kimi, etc. |
| `ollama` | `http://127.0.0.1:11434/api/chat` | Any local Ollama model |
| `anthropic` | `https://api.anthropic.com/v1/messages` | Claude |
| `gemini` | `https://generativelanguage.googleapis.com/v1beta/models/...` | Gemini |
| `dashscope` | Aliyun DashScope endpoint | Qwen (DashScope native) |
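To make the dialect idea a bit more concrete, here is a minimal sketch of how a client could pick a wire format from the standard name. Everything in it (`Dialect`, `SimpleDialect`, `DialectRegistry`, `contentPath`) is a hypothetical name I made up for illustration; it is not how Solon implements dialects internally. The only provider facts it relies on are where OpenAI-style and Ollama chat responses place the reply text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Solon's internals: one object per provider wire format.
interface Dialect {
    String name();
    // Dotted path to the reply text in the provider's JSON response.
    String contentPath();
}

final class SimpleDialect implements Dialect {
    private final String name;
    private final String contentPath;

    SimpleDialect(String name, String contentPath) {
        this.name = name;
        this.contentPath = contentPath;
    }

    @Override public String name() { return name; }
    @Override public String contentPath() { return contentPath; }
}

final class DialectRegistry {
    private final Map<String, Dialect> byName = new HashMap<>();

    void register(Dialect dialect) {
        byName.put(dialect.name(), dialect);
    }

    // A missing standard falls back to "openai", mirroring the "(default)" row in the table.
    Dialect resolve(String standard) {
        String key = (standard == null || standard.isEmpty())
                ? "openai"
                : standard.toLowerCase(Locale.ROOT);
        Dialect dialect = byName.get(key);
        if (dialect == null) {
            throw new IllegalArgumentException("Unknown dialect: " + standard);
        }
        return dialect;
    }

    static DialectRegistry withDefaults() {
        DialectRegistry registry = new DialectRegistry();
        // OpenAI-style replies nest the text under choices[0].message.content.
        registry.register(new SimpleDialect("openai", "choices[0].message.content"));
        // Ollama's /api/chat replies put it under message.content.
        registry.register(new SimpleDialect("ollama", "message.content"));
        return registry;
    }
}
```

The point of the pattern is that calling code only ever asks the registry for a dialect by name; adding a provider means registering one more object, not touching the call sites.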
The most basic use case is to send a prompt and get a full response:
    import java.io.IOException;

    import org.noear.solon.ai.chat.ChatModel;
    import org.noear.solon.ai.chat.ChatResponse;
    import org.noear.solon.annotation.Inject;
    import org.noear.solon.annotation.Component;

    @Component
    public class ChatService {
        @Inject
        ChatModel chatModel;

        // Blocks until the model has produced the complete reply
        public String ask(String question) throws IOException {
            ChatResponse resp = chatModel.prompt(question).call();
            return resp.getMessage().getContent();
        }
    }
That's it. Three lines of business code.
For chatbots and assistants, streaming is essential. `ChatModel` returns a Reactor `Flux<ChatResponse>`:
    import reactor.core.publisher.Flux;

    public Flux<String> askStream(String question) throws IOException {
        return chatModel.prompt(question)
                .stream()
                .filter(ChatResponse::hasContent) // skip empty chunks
                .map(resp -> resp.getMessage().getContent());
    }
You can then subscribe, or, if you're using Solon Web Reactive, return the `Flux` directly to an SSE endpoint:
    import org.noear.solon.web.sse.SseEvent;
    import org.noear.solon.annotation.Mapping;
    import reactor.core.publisher.Flux;

    @Mapping("/chat/stream")
    public Flux<SseEvent> chatStream(String prompt) throws IOException {
        return chatModel.prompt(prompt)
                .stream()
                .filter(ChatResponse::hasContent)
                .map(resp -> new SseEvent()
                        .data(resp.getMessage().getContent()));
    }
The streaming protocol uses standard SSE (`text/event-stream`) or `x-ndjson` depending on the provider.
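If you're curious what that difference looks like on the wire, here is a small sketch that pulls the JSON payloads out of either framing. `StreamFraming` and `payloads` are hypothetical names for illustration, not part of Solon, and the parser is deliberately simplified: it ignores multi-line SSE `data:` fields and treats `[DONE]` as the end-of-stream marker the way OpenAI-style endpoints send it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a raw streaming body into one JSON string per chunk.
final class StreamFraming {
    static List<String> payloads(String contentType, String body) {
        boolean sse = contentType != null && contentType.startsWith("text/event-stream");
        List<String> out = new ArrayList<>();
        for (String line : body.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines only separate SSE events
            }
            if (sse) {
                if (!trimmed.startsWith("data:")) {
                    continue; // skip event:, id:, retry: and comment lines
                }
                String data = trimmed.substring("data:".length()).trim();
                if (data.equals("[DONE]")) {
                    break; // OpenAI-style end-of-stream sentinel
                }
                out.add(data);
            } else {
                out.add(trimmed); // NDJSON: every non-empty line is a complete JSON object
            }
        }
        return out;
    }
}
```

With SSE each chunk arrives as a `data: {...}` line followed by a blank line, while NDJSON is simply one JSON object per line. Either way the client ends up with the same list of chunk payloads, which is why the calling code above doesn't need to care which one the provider speaks.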
LLMs are stateless. To maintain conversation context, you need to pass history with each request. `ChatSession` handles this automatically.
    import org.noear.solon.ai.chat.ChatSession;
    import org.noear.solon.ai.chat.session.InMemoryChatSession;

    ChatSession session = InMemoryChatSession.builder()
            .sessionId("user-123")
            .maxMessages(10) // keep the last 10 messages
            .build();

    // First turn
    ChatResponse resp1 = chatModel.prompt("Hello!")
            .session(session)
            .call();

    // Second turn: the model sees the earlier messages through the session
    ChatResponse resp2 = chatModel.prompt("What did I just say?")
            .session(session)
            .call();
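Conceptually, a capped session is just a sliding window over the transcript. The sketch below shows that idea with plain collections; `WindowedHistory` is a hypothetical class written for this post, and I'm not claiming `InMemoryChatSession` is implemented this way (it may, for example, treat system messages specially).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a bounded conversation history.
final class WindowedHistory {
    private final int maxMessages;
    private final Deque<String[]> messages = new ArrayDeque<>(); // each entry: {role, content}

    WindowedHistory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    void add(String role, String content) {
        messages.addLast(new String[] {role, content});
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // evict the oldest message first
        }
    }

    // The history that would accompany the next request, oldest first.
    List<String> snapshot() {
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {
            out.add(m[0] + ": " + m[1]);
        }
        return out;
    }
}
```

Note that the cap counts messages, not exchanges: a limit of 10 holds roughly five user/assistant round trips, so pick the number with that in mind.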
In a real web app, you'll want one session per user. Here's a controller that does exactly that:
    import org.noear.solon.ai.chat.ChatModel;
    import org.noear.solon.ai.chat.ChatResponse;
    import org.noear.solon.ai.chat.ChatSession;
    import org.noear.solon.ai.chat.session.InMemoryChatSession;
    import org.noear.solon.annotation.Controller;
    import org.noear.solon.annotation.Inject;
    import org.noear.solon.annotation.Mapping;
    import org.noear.solon.web.sse.SseEvent;
    import reactor.core.publisher.Flux;

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    @Controller
    public class ChatController {
        @Inject
        ChatModel chatModel;

        // One session per sessionId; ConcurrentHashMap keeps creation thread-safe
        final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();

        @Mapping("/chat")
        public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
            ChatSession session = sessionMap.computeIfAbsent(sessionId,
                    k -> InMemoryChatSession.builder().sessionId(k).build());

            return chatModel.prompt(prompt)
                    .session(session)
                    .options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
                    .stream()
                    .filter(ChatResponse::hasContent)
                    .map(resp -> new SseEvent().data(resp.getMessage().getContent()));
        }
    }
| Implementation | Storage | Use Case |
|---|---|---|
| `InMemoryChatSession` | Local Map | Dev, single-node |
| `FileChatSession` | File system | CLI tools, desktop apps |
| `RedisChatSession` | Redis | Production, distributed |
Control model behavior per-request with `ChatOptions`:
    chatModel.prompt("Write a poem about Java")
            .options(o -> o
                    .temperature(0.8)
                    .max_tokens(500)
                    .top_p(0.9)
                    .systemPrompt("You are a creative poet."))
            .call();
Common options include:
| Method | Description |
|---|---|
| `temperature(val)` | Sampling temperature (0.0 to 2.0) |
| `max_tokens(val)` | Max output tokens |
| `top_p(val)` | Nucleus sampling |
| `top_k(val)` | Top-K sampling |
| `frequency_penalty(val)` | Reduce repetition |
| `presence_penalty(val)` | Encourage new topics |
| `tool_choice(val)` | Control tool use: `none`, `auto`, `required`, or tool name |
| `systemPrompt(val)` | System message for this request |
| `role(val)` | Agent role (v3.9.1+) |
| `instruction(val)` | Agent instruction (v3.9.1+) |
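If `temperature` and `top_p` feel like magic numbers, the math behind them is short. The sketch below is the textbook formulation (softmax over temperature-scaled scores, then a cumulative-probability cutoff), not any provider's actual sampler; `SamplingMath` and its methods are hypothetical names. It also rejects a temperature of 0, which real APIs usually accept and treat as greedy decoding.

```java
import java.util.Arrays;

// Hypothetical sketch of temperature scaling and nucleus (top_p) filtering.
final class SamplingMath {
    // Softmax over logits divided by the temperature; lower values sharpen the distribution.
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0 in this sketch");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit / temperature);
        }
        double sum = 0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature - max); // subtract max for numerical stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Number of most-likely tokens needed for their cumulative probability to reach topP.
    static int nucleusSize(double[] probs, double topP) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted); // ascending, so walk from the end
        double cumulative = 0;
        int count = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            count++;
            if (cumulative >= topP) {
                break;
            }
        }
        return count;
    }
}
```

In practice this means a low temperature such as the 0.3 used for factual answers concentrates probability on the top candidate, while `top_p` trims the long tail of unlikely tokens before one is drawn.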
Sometimes you need more than a simple string. Use `Prompt` and `ChatMessage`:
    import org.noear.solon.ai.chat.Prompt;
    import org.noear.solon.ai.chat.message.ChatMessage;

    Prompt prompt = Prompt.of(
            ChatMessage.ofSystem("You translate English to French."),
            ChatMessage.ofUser("Hello, how are you?"),
            ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
            ChatMessage.ofUser("What is your name?")
    );

    ChatResponse resp = chatModel.prompt(prompt).call();
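For intuition, a message list like this ends up as an ordered array of role/content objects on OpenAI-style endpoints. The sketch below builds that array by hand; `WireMessages` is a hypothetical helper, the escaping is intentionally minimal, and I haven't verified that Solon serializes it byte for byte like this.

```java
// Hypothetical sketch: turns {role, content} pairs into an OpenAI-style "messages" array.
final class WireMessages {
    // Minimal JSON string escaping: backslash, double quote, and newline only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Order is preserved because the model reads the array as a transcript.
    static String toJson(String[][] messages) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < messages.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"role\":\"").append(escape(messages[i][0]))
              .append("\",\"content\":\"").append(escape(messages[i][1]))
              .append("\"}");
        }
        return sb.append(']').toString();
    }
}
```

Other providers shape this differently. Anthropic's Messages API, for instance, takes the system prompt as a top-level field rather than as a message in the array, which is exactly the kind of difference a dialect is there to hide.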
Let's build a simple knowledge-aware chatbot, the kind of RAG-lite pattern you see in real projects. This example uses `ChatMessage.ofUserAugment()` to inject context into the prompt:
    import org.noear.solon.ai.chat.ChatModel;
    import org.noear.solon.ai.chat.ChatResponse;
    import org.noear.solon.ai.chat.message.ChatMessage;
    import org.noear.solon.annotation.Component;
    import org.noear.solon.annotation.Inject;

    @Component
    public class KnowledgeChatbot {
        @Inject
        ChatModel chatModel;

        public String answer(String question, String referenceContext) throws Exception {
            // Augment the user message with reference context
            ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);

            ChatResponse resp = chatModel.prompt(augmented)
                    .options(o -> o
                            .temperature(0.3)
                            .systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
                    .call();

            return resp.getMessage().getContent();
        }
    }
This pattern, augmenting user input with context and then calling the model, is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
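To show what "augmenting" amounts to, here is a bare-bones sketch that folds reference snippets into the user's question. The template and the `AugmentSketch` class are my own illustration; the text that `ChatMessage.ofUserAugment()` actually generates may look quite different.

```java
// Hypothetical sketch of prompt augmentation; not the template Solon uses.
final class AugmentSketch {
    // Appends numbered reference snippets after the question so the model can ground its answer.
    static String augment(String question, String... references) {
        StringBuilder sb = new StringBuilder();
        sb.append(question.trim()).append("\n\nReferences:\n");
        for (int i = 0; i < references.length; i++) {
            sb.append('[').append(i + 1).append("] ")
              .append(references[i].trim()).append('\n');
        }
        return sb.toString();
    }
}
```

In a fuller RAG setup the `references` would come from a retrieval step, such as a vector or keyword search over your documents, instead of being passed in by the caller.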
`ChatModel` is just the entry point. Solon AI also offers:
- `@ToolMapping` methods the LLM can invoke
- `ReActAgent` and `TeamAgent` for multi-step reasoning
For the full documentation, check out the official Solon AI guide:
https://solon.noear.org/article/918 (Model construction)
https://solon.noear.org/article/920 (API reference)
Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments; I might cover it in the next post.