Solon 4.0 ChatModel: A Practical Guide to Building LLM-Powered Applications

wpnews.pro

If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's `ChatModel` abstracts all of that away with a clean, builder-oriented API.

In this guide, I'll walk through building real, working AI features using `ChatModel`, from a simple chat call to a streaming chatbot with conversation memory.

`ChatModel` (package `org.noear.solon.ai.chat`) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports, among other things, streaming responses (`Flux<ChatResponse>`).

The best part? It uses a dialect pattern: you point it at any compatible LLM endpoint, name the dialect, and it adapts the request and response format for you.

Add the dependency to your `pom.xml` (no parent POM needed; Solon works standalone):

<dependency>
    <groupId>org.noear</groupId>
    <artifactId>solon-ai</artifactId>
    <version>${solon.version}</version>
</dependency>

This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).

Next, define a model in your application's YAML configuration:

solon.ai.chat:
  demo:
    apiUrl: "http://127.0.0.1:11434/api/chat"   # Full URL, not baseUrl
    provider: "ollama"                           # Dialect identifier
    model: "llama3.2"                            # Model name
    headers:
      x-demo: "demo1"

Then create a `@Bean` to get a ready-to-use `ChatModel`:

import org.noear.solon.ai.chat.ChatConfig;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.annotation.Bean;
import org.noear.solon.annotation.Configuration;
import org.noear.solon.annotation.Inject;

@Configuration
public class AiConfig {
    @Bean
    public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
        return ChatModel.of(config).build();
    }
}

Prefer code over config? Use the builder directly:

import java.time.Duration;

@Bean
public ChatModel chatModel() {
    return ChatModel.of("http://127.0.0.1:11434/api/chat")
            .standard("ollama")      // or .provider("ollama") pre-4.0
            .model("llama3.2")
            .timeout(Duration.ofSeconds(60))
            .build();
}

The `standard` (or `provider`) field selects the dialect:

| Standard | Example apiUrl | Models |
|---|---|---|
| openai (default) | https://api.openai.com/v1/chat/completions | GPT, DeepSeek, Qwen, GLM, Kimi, etc. |
| ollama | http://127.0.0.1:11434/api/chat | Any local Ollama model |
| anthropic | https://api.anthropic.com/v1/messages | Claude |
| gemini | https://generativelanguage.googleapis.com/v1beta/models/... | Gemini |
| dashscope | Aliyun DashScope endpoint | Qwen (DashScope native) |
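
To make the dialect idea concrete, here is a minimal, library-free sketch of how a registry keyed by the `standard` name could choose a request shape. The `DialectSketch` class, the `Dialect` interface, and the heavily simplified JSON bodies are illustrative assumptions, not Solon's internals; the real dialects also handle headers, tools, and response parsing.

```java
import java.util.Map;

/** Hypothetical illustration of a dialect registry; not Solon's implementation. */
final class DialectSketch {

    /** A dialect turns a model name and user text into a provider-specific JSON body. */
    interface Dialect {
        String buildBody(String model, String userText);
    }

    // Minimal JSON string escaping, enough for this sketch only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    private static String messages(String userText) {
        return "\"messages\":[{\"role\":\"user\",\"content\":\"" + escape(userText) + "\"}]";
    }

    // Simplified bodies: Ollama takes a "stream" flag, Anthropic requires "max_tokens".
    static final Map<String, Dialect> DIALECTS = Map.of(
        "openai", (model, text) ->
            "{\"model\":\"" + model + "\"," + messages(text) + "}",
        "ollama", (model, text) ->
            "{\"model\":\"" + model + "\"," + messages(text) + ",\"stream\":false}",
        "anthropic", (model, text) ->
            "{\"model\":\"" + model + "\",\"max_tokens\":1024," + messages(text) + "}"
    );

    /** Looks up the dialect by name; a missing name falls back to "openai". */
    static String buildBody(String standard, String model, String userText) {
        String key = (standard == null) ? "openai" : standard;
        Dialect dialect = DIALECTS.get(key);
        if (dialect == null) {
            throw new IllegalArgumentException("Unknown standard: " + key);
        }
        return dialect.buildBody(model, userText);
    }

    public static void main(String[] args) {
        System.out.println(buildBody("ollama", "llama3.2", "Hello"));
        System.out.println(buildBody(null, "gpt-4o", "Hello"));
    }
}
```

The calling code stays the same whichever entry is selected, which is the property the builder's `standard(...)` setting relies on.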

The most basic use case — send a prompt and get a full response:

import java.io.IOException;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Component;

@Component
public class ChatService {
    @Inject
    ChatModel chatModel;

    public String ask(String question) throws IOException {
        ChatResponse resp = chatModel.prompt(question).call();
        return resp.getMessage().getContent();
    }
}

That's it. Three lines of business code.

For chatbots and assistants, streaming is essential. Calling `stream()` instead of `call()` returns a Reactor `Flux<ChatResponse>`:

import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import reactor.core.publisher.Flux;

public Flux<String> askStream(String question) throws IOException {
    return chatModel.prompt(question)
            .stream()
            .filter(ChatResponse::hasContent)       // skip empty chunks
            .map(resp -> resp.getMessage().getContent());
}

You can then subscribe, or, if you're using Solon Web Reactive, return the `Flux` directly from an SSE endpoint:

import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.web.sse.SseEvent;
import org.noear.solon.annotation.Mapping;
import reactor.core.publisher.Flux;

@Mapping("/chat/stream")
public Flux<SseEvent> chatStream(String prompt) throws IOException {
    return chatModel.prompt(prompt)
            .stream()
            .filter(ChatResponse::hasContent)
            .map(resp -> new SseEvent()
                    .data(resp.getMessage().getContent()));
}

The streaming protocol uses standard SSE (`text/event-stream`) or `x-ndjson`, depending on the provider.
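
The two wire formats differ only in framing. The sketch below, which assumes nothing about Solon's own parser, shows the difference on already-received text: SSE carries each payload on a `data:` line, while NDJSON puts one JSON document on each line. `StreamFramingSketch` and the `delta` field in the sample payloads are hypothetical, and multi-line SSE events are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative framing parsers; simplified, and not Solon's stream handling. */
final class StreamFramingSketch {

    /** SSE: payloads are carried on lines that start with "data:". */
    static List<String> parseSse(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.startsWith("data:")) {
                continue; // blank separators, comments, "event:" lines
            }
            String payload = line.substring("data:".length()).trim();
            if (payload.equals("[DONE]")) {
                break; // end-of-stream sentinel used by OpenAI-style APIs
            }
            payloads.add(payload);
        }
        return payloads;
    }

    /** NDJSON: every non-blank line is one complete JSON document. */
    static List<String> parseNdjson(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.isBlank()) {
                payloads.add(line.trim());
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        String sse = "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n";
        String ndjson = "{\"delta\":\"Hel\"}\n{\"delta\":\"lo\"}\n";
        System.out.println(parseSse(sse));
        System.out.println(parseNdjson(ndjson));
    }
}
```

Both parsers yield the same list of payloads for the sample input, which is why the application code above can ignore the transport and work with `ChatResponse` chunks only.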

LLMs are stateless. To maintain conversation context, you need to pass the history with each request. `ChatSession` handles this automatically.

import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;

ChatSession session = InMemoryChatSession.builder()
        .sessionId("user-123")
        .maxMessages(10)     // keep the last 10 messages
        .build();

// First turn
ChatResponse resp1 = chatModel.prompt("Hello!")
        .session(session)
        .call();

// Second turn — model remembers context
ChatResponse resp2 = chatModel.prompt("What did I just say?")
        .session(session)
        .call();
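
What a message cap means in practice can be sketched without the library. The class below keeps a sliding window of the most recent messages and drops the oldest once the cap is exceeded. `WindowedHistorySketch` is a hypothetical stand-in; how `InMemoryChatSession` actually trims (for example, whether it preserves system messages) is not covered by this guide and is assumed here only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sliding-window history; not Solon's InMemoryChatSession. */
final class WindowedHistorySketch {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowedHistorySketch(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones beyond the cap. */
    void add(String role, String content) {
        messages.addLast(role + ": " + content);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** The history that would be sent along with the next request. */
    List<String> snapshot() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        WindowedHistorySketch history = new WindowedHistorySketch(3);
        history.add("user", "Hello!");
        history.add("assistant", "Hi! How can I help?");
        history.add("user", "What did I just say?");
        history.add("assistant", "You said hello.");
        System.out.println(history.snapshot());
    }
}
```

With a cap of 3, the original "Hello!" has already been dropped after the fourth message, so a cap that is too small makes the model forget early context. Note that one conversational turn is two messages (user plus assistant).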

In a real web app, you'll want one session per user. Here's a controller that does exactly that:

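import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
import org.noear.solon.annotation.Controller;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;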

@Controller
public class ChatController {
    @Inject
    ChatModel chatModel;

    final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();

    @Mapping("/chat")
    public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
        ChatSession session = sessionMap.computeIfAbsent(sessionId,
                k -> InMemoryChatSession.builder().sessionId(k).build());

        return chatModel.prompt(prompt)
                .session(session)
                .options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
                .stream()
                .filter(ChatResponse::hasContent)
                .map(resp -> new SseEvent().data(resp.getMessage().getContent()));
    }
}
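
A plain `ConcurrentHashMap` keeps every session forever, so a long-running server may want a cap. As one possible approach (an assumption on my part, not something Solon prescribes), the sketch below bounds the registry with least-recently-used eviction using only the JDK. `BoundedSessionsSketch` is a hypothetical helper and is generic so it does not depend on `ChatSession`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU-bounded session registry built on the JDK only. */
final class BoundedSessionsSketch<S> {
    private final Map<String, S> sessions;

    BoundedSessionsSketch(int maxSessions) {
        // accessOrder=true orders entries from least to most recently used;
        // synchronizedMap makes each individual call thread-safe.
        this.sessions = Collections.synchronizedMap(
            new LinkedHashMap<String, S>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                    return size() > maxSessions;
                }
            });
    }

    /** Returns the existing session for the id, or creates one with the factory. */
    S getOrCreate(String sessionId, Function<String, S> factory) {
        return sessions.computeIfAbsent(sessionId, factory);
    }

    boolean contains(String sessionId) {
        return sessions.containsKey(sessionId);
    }

    int size() {
        return sessions.size();
    }

    public static void main(String[] args) {
        BoundedSessionsSketch<StringBuilder> registry = new BoundedSessionsSketch<>(2);
        registry.getOrCreate("a", StringBuilder::new);
        registry.getOrCreate("b", StringBuilder::new);
        registry.getOrCreate("a", StringBuilder::new); // touch "a"
        registry.getOrCreate("c", StringBuilder::new); // evicts "b"
        System.out.println(registry.contains("b"));
    }
}
```

In the controller, the same shape would apply with `S` bound to `ChatSession` and the factory creating an `InMemoryChatSession`. For multi-node deployments, an external store is the more natural fit.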

| Implementation | Storage | Use Case |
|---|---|---|
| `InMemoryChatSession` | Local Map | Dev, single-node |
| `FileChatSession` | File system | CLI tools, desktop apps |
| `RedisChatSession` | Redis | Production, distributed |

Control model behavior per-request with `ChatOptions`:

chatModel.prompt("Write a poem about Java")
        .options(o -> o
            .temperature(0.8)
            .max_tokens(500)
            .top_p(0.9)
            .systemPrompt("You are a creative poet."))
        .call();

Common options include:

| Method | Description |
|---|---|
| `temperature(val)` | Sampling temperature (0.0–2.0) |
| `max_tokens(val)` | Max output tokens |
| `top_p(val)` | Nucleus sampling |
| `top_k(val)` | Top-K sampling |
| `frequency_penalty(val)` | Reduce repetition |
| `presence_penalty(val)` | Encourage new topics |
| `tool_choice(val)` | Force tool use: `none`, `auto`, `required`, or tool name |
| `systemPrompt(val)` | System message for this request |
| `role(val)` | Agent role (v3.9.1+) |
| `instruction(val)` | Agent instruction (v3.9.1+) |
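
For intuition about `temperature` and `top_p`, here is a toy sketch of what those knobs conventionally do to a next-token distribution. Sampling happens on the provider's side, so this is not code Solon runs; `SamplingSketch` and the three-token logits are made up for illustration.

```java
import java.util.Arrays;

/** Toy illustration of temperature scaling and nucleus (top-p) truncation. */
final class SamplingSketch {

    /** Softmax over logits divided by temperature; lower values sharpen the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0.0) {
            throw new IllegalArgumentException("temperature must be > 0");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit / temperature);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature - max); // subtract max for stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Size of the smallest set of most likely tokens whose total probability reaches topP. */
    static int nucleusSize(double[] probs, double topP) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted); // ascending, so walk from the end
        double cumulative = 0.0;
        int count = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            count++;
            if (cumulative >= topP) {
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.0};
        System.out.println(Arrays.toString(softmax(logits, 1.0)));
        System.out.println(Arrays.toString(softmax(logits, 0.5)));
        System.out.println(nucleusSize(new double[]{0.5, 0.3, 0.2}, 0.7));
    }
}
```

Lowering the temperature concentrates probability on the top token, and a smaller `top_p` shrinks the candidate set. That is why factual tasks tend to use low values and creative tasks higher ones.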

Sometimes you need more than a simple string. Use `Prompt` and `ChatMessage`:

import org.noear.solon.ai.chat.Prompt;
import org.noear.solon.ai.chat.message.ChatMessage;

Prompt prompt = Prompt.of(
    ChatMessage.ofSystem("You translate English to French."),
    ChatMessage.ofUser("Hello, how are you?"),
    ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
    ChatMessage.ofUser("What is your name?")
);

ChatResponse resp = chatModel.prompt(prompt).call();

Let's build a simple knowledge-aware chatbot, the kind of RAG-lite pattern you see in real projects. This example uses `ChatMessage.ofUserAugment()` to inject context into the prompt:

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.message.ChatMessage;
import org.noear.solon.annotation.Component;
import org.noear.solon.annotation.Inject;

@Component
public class KnowledgeChatbot {
    @Inject
    ChatModel chatModel;

    public String answer(String question, String referenceContext) throws Exception {
        // Augment the user message with reference context
        ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);

        ChatResponse resp = chatModel.prompt(augmented)
                .options(o -> o
                    .temperature(0.3)
                    .systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
                .call();

        return resp.getMessage().getContent();
    }
}

This pattern — augment user input with context, then call the model — is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
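
Conceptually, augmentation is just prompt assembly: retrieved text is placed next to the question before the model sees it. The sketch below shows one plausible template in plain Java. The exact text that `ChatMessage.ofUserAugment()` produces is not shown in this guide, so `AugmentSketch` and its wording are assumptions for illustration only.

```java
/** Hypothetical prompt template for context augmentation; Solon's actual format may differ. */
final class AugmentSketch {

    /** Builds a single user message that carries numbered references plus the question. */
    static String augment(String question, String... references) {
        StringBuilder sb = new StringBuilder();
        sb.append("Use the references below to answer the question.\n\n");
        sb.append("References:\n");
        for (int i = 0; i < references.length; i++) {
            sb.append('[').append(i + 1).append("] ").append(references[i]).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(augment(
            "What is Solon?",
            "Solon is a Java application framework.",
            "Solon AI provides ChatModel for LLM access."));
    }
}
```

In a full RAG setup the `references` would come from a retrieval step such as a vector search. Here they are passed in by the caller, just as `referenceContext` is in `KnowledgeChatbot`.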

`ChatModel` is just the entry point. Solon AI also offers:

- `@ToolMapping` methods the LLM can invoke
- `ReActAgent` and `TeamAgent` for multi-step reasoning

For the full documentation, check out the official Solon AI guide:

👉 https://solon.noear.org/article/918 (Model construction)

👉 https://solon.noear.org/article/920 (API reference)

Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments — I might cover it in the next post.

source & further reading

dev.to: original article
