# Solon 4.0 ChatModel: A Practical Guide to Building LLM-Powered Applications

> Source: <https://dev.to/solonjava/solon-40-chatmodel-a-practical-guide-to-building-llm-powered-applications-c7e>
> Published: 2026-07-04 00:39:48+00:00

If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's `ChatModel` abstracts all of that away with a clean, builder-oriented API.

In this guide, I'll walk through building real, working AI features using `ChatModel`, from a simple chat call to a streaming chatbot with conversation memory.

`ChatModel` (package `org.noear.solon.ai.chat`) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports, among other things, streaming responses (`Flux<ChatResponse>`).

The best part? It uses a **dialect pattern**: you point it at any compatible LLM endpoint, name the provider, and it adapts the request and response format for you.
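To make the dialect idea concrete, here is a minimal, JDK-only sketch of a provider registry. The names `Dialect` and `DialectRegistry` are hypothetical and are not Solon's actual classes; the only provider facts assumed are that OpenAI-style APIs expect an `Authorization: Bearer` header while Anthropic expects `x-api-key`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a dialect registry; not Solon's real types.
interface Dialect {
    /** Returns the HTTP header line this provider expects for the API key. */
    String authHeader(String apiKey);
}

class DialectRegistry {
    private final Map<String, Dialect> dialects = new HashMap<>();
    private final String defaultName;

    DialectRegistry(String defaultName) {
        this.defaultName = defaultName;
    }

    void register(String name, Dialect dialect) {
        dialects.put(name, dialect);
    }

    /** Falls back to the default dialect when the name is null or unknown. */
    Dialect select(String name) {
        Dialect found = (name == null) ? null : dialects.get(name);
        return (found != null) ? found : dialects.get(defaultName);
    }

    /** Builds a small registry with two providers, using "openai" as the default. */
    static DialectRegistry demo() {
        DialectRegistry registry = new DialectRegistry("openai");
        registry.register("openai", key -> "Authorization: Bearer " + key);
        registry.register("anthropic", key -> "x-api-key: " + key);
        return registry;
    }
}
```

The calling code only ever talks to `Dialect`, so adding a provider means registering one more implementation rather than touching every call site.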

Add the dependency to your `pom.xml` (no parent POM needed; Solon works standalone):

```xml
<dependency>
    <groupId>org.noear</groupId>
    <artifactId>solon-ai</artifactId>
    <version>${solon.version}</version>
</dependency>
```

This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).

Next, declare the model in your application config (`app.yml` in a typical Solon project):

```yaml
solon.ai.chat:
  demo:
    apiUrl: "http://127.0.0.1:11434/api/chat"   # Full URL, not baseUrl
    provider: "ollama"                           # Dialect identifier
    model: "llama3.2"                            # Model name
    headers:
      x-demo: "demo1"
```

Then create a `@Bean` to get a ready-to-use `ChatModel`:

```java
import org.noear.solon.ai.chat.ChatConfig;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.annotation.Bean;
import org.noear.solon.annotation.Configuration;
import org.noear.solon.annotation.Inject;

@Configuration
public class AiConfig {
    @Bean
    public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
        return ChatModel.of(config).build();
    }
}
```

Prefer code over config? Use the builder directly:

```java
import java.time.Duration;

@Bean
public ChatModel chatModel() {
    return ChatModel.of("http://127.0.0.1:11434/api/chat")
            .standard("ollama")      // or .provider("ollama") pre-4.0
            .model("llama3.2")
            .timeout(Duration.ofSeconds(60))
            .build();
}
```

The `standard` (or `provider`) field selects the dialect:

| Standard | Example `apiUrl` | Models |
|---|---|---|
| `openai` (default) | `https://api.openai.com/v1/chat/completions` | GPT, DeepSeek, Qwen, GLM, Kimi, etc. |
| `ollama` | `http://127.0.0.1:11434/api/chat` | Any local Ollama model |
| `anthropic` | `https://api.anthropic.com/v1/messages` | Claude |
| `gemini` | `https://generativelanguage.googleapis.com/v1beta/models/...` | Gemini |
| `dashscope` | Aliyun DashScope endpoint | Qwen (DashScope native) |

The most basic use case — send a prompt and get a full response:

```java
import java.io.IOException;

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Component;

@Component
public class ChatService {
    @Inject
    ChatModel chatModel;

    public String ask(String question) throws IOException {
        ChatResponse resp = chatModel.prompt(question).call();
        return resp.getMessage().getContent();
    }
}
```

That's it. Three lines of business code.

For chatbots and assistants, streaming is essential. `ChatModel` returns a Reactor `Flux<ChatResponse>`:

```java
import java.io.IOException;

import org.noear.solon.ai.chat.ChatResponse;
import reactor.core.publisher.Flux;

public Flux<String> askStream(String question) throws IOException {
    return chatModel.prompt(question)
            .stream()
            .filter(ChatResponse::hasContent)       // skip empty chunks
            .map(resp -> resp.getMessage().getContent());
}
```

You can then subscribe, or, if you're using Solon Web Reactive, return the `Flux` directly to an SSE endpoint:

```java
import java.io.IOException;

import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.web.sse.SseEvent;
import org.noear.solon.annotation.Mapping;
import reactor.core.publisher.Flux;

@Mapping("/chat/stream")
public Flux<SseEvent> chatStream(String prompt) throws IOException {
    return chatModel.prompt(prompt)
            .stream()
            .filter(ChatResponse::hasContent)
            .map(resp -> new SseEvent()
                    .data(resp.getMessage().getContent()));
}
```

The streaming protocol uses standard SSE (`text/event-stream`) or `x-ndjson`, depending on the provider.
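For intuition about how the two wire formats differ, here is a simplified, JDK-only sketch of the framing. `StreamFraming` is a hypothetical helper, not part of Solon; it assumes one `data:` line per SSE event and the OpenAI-style `[DONE]` sentinel, and it ignores multi-line SSE events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how SSE and NDJSON bodies are framed.
class StreamFraming {
    /** SSE: payloads sit on "data:" lines, and events are separated by blank lines. */
    static List<String> fromSse(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other fields
            }
            String payload = line.substring("data:".length()).trim();
            if (payload.equals("[DONE]")) {
                break; // OpenAI-style end-of-stream marker
            }
            payloads.add(payload);
        }
        return payloads;
    }

    /** NDJSON: every non-empty line is one complete JSON document. */
    static List<String> fromNdjson(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                payloads.add(trimmed);
            }
        }
        return payloads;
    }
}
```

Either way, each extracted payload is one JSON chunk, which is what a client like `ChatModel` would turn into one streamed response element.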

LLMs are stateless. To maintain conversation context, you need to pass history with each request. `ChatSession` handles this automatically.

```java
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;

ChatSession session = InMemoryChatSession.builder()
        .sessionId("user-123")
        .maxMessages(10)     // keep at most the 10 most recent messages
        .build();

// First turn
ChatResponse resp1 = chatModel.prompt("Hello!")
        .session(session)
        .call();

// Second turn — model remembers context
ChatResponse resp2 = chatModel.prompt("What did I just say?")
        .session(session)
        .call();
```
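To see what a bounded history means in practice, here is a minimal, JDK-only sketch of a sliding window over messages. `BoundedHistory` is a hypothetical stand-in, and dropping the oldest message first is an assumption; how `InMemoryChatSession` actually trims is not described in this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a chat session; not Solon's InMemoryChatSession.
class BoundedHistory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    BoundedHistory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones once the cap is exceeded. */
    void add(String role, String content) {
        messages.addLast(role + ": " + content);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** Snapshot of the history that would accompany the next request. */
    List<String> snapshot() {
        return new ArrayList<>(messages);
    }
}
```

Note that a cap counts messages, not turns: one user message plus one assistant reply consumes two slots.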

In a real web app, you'll want one session per user. Here's a controller that does exactly that:

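```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
import org.noear.solon.annotation.Controller;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;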

@Controller
public class ChatController {
    @Inject
    ChatModel chatModel;

    final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();

    @Mapping("/chat")
    public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
        ChatSession session = sessionMap.computeIfAbsent(sessionId,
                k -> InMemoryChatSession.builder().sessionId(k).build());

        return chatModel.prompt(prompt)
                .session(session)
                .options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
                .stream()
                .filter(ChatResponse::hasContent)
                .map(resp -> new SseEvent().data(resp.getMessage().getContent()));
    }
}
```

| Implementation | Storage | Use Case |
|---|---|---|
| `InMemoryChatSession` | Local Map | Dev, single-node |
| `FileChatSession` | File system | CLI tools, desktop apps |
| `RedisChatSession` | Redis | Production, distributed |

Control model behavior per-request with `ChatOptions`:

```java
chatModel.prompt("Write a poem about Java")
        .options(o -> o
            .temperature(0.8)
            .max_tokens(500)
            .top_p(0.9)
            .systemPrompt("You are a creative poet."))
        .call();
```

Common options include:

| Method | Description |
|---|---|
| `temperature(val)` | Sampling temperature (0.0–2.0) |
| `max_tokens(val)` | Max output tokens |
| `top_p(val)` | Nucleus sampling |
| `top_k(val)` | Top-K sampling |
| `frequency_penalty(val)` | Reduce repetition |
| `presence_penalty(val)` | Encourage new topics |
| `tool_choice(val)` | Force tool use: `none`, `auto`, `required`, or tool name |
| `systemPrompt(val)` | System message for this request |
| `role(val)` | Agent role (v3.9.1+) |
| `instruction(val)` | Agent instruction (v3.9.1+) |
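If `temperature` and `top_p` feel abstract, the textbook definitions are easy to sketch. The JDK-only code below is an illustration of the math, not any provider's actual sampler; `SamplingMath` is a hypothetical name, and it assumes `temperature` is greater than zero.

```java
import java.util.Arrays;

// Hypothetical illustration of temperature scaling and nucleus (top-p) cutoff.
class SamplingMath {
    /** Softmax over logits divided by temperature; lower values sharpen the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElse(0.0); // subtract max for numerical stability
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Number of most-likely tokens needed to reach a cumulative probability of topP. */
    static int nucleusSize(double[] probs, double topP) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted); // ascending, so walk from the end
        double cumulative = 0.0;
        int count = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            count++;
            if (cumulative >= topP) {
                break;
            }
        }
        return count;
    }
}
```

With logits `{2, 1, 0}`, lowering the temperature from 1.0 to 0.5 concentrates more probability on the top token, and a smaller `top_p` shrinks the set of candidate tokens the sampler may pick from.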

Sometimes you need more than a simple string. Use `Prompt` and `ChatMessage`:

```java
import org.noear.solon.ai.chat.Prompt;
import org.noear.solon.ai.chat.message.ChatMessage;

Prompt prompt = Prompt.of(
    ChatMessage.ofSystem("You translate English to French."),
    ChatMessage.ofUser("Hello, how are you?"),
    ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
    ChatMessage.ofUser("What is your name?")
);

ChatResponse resp = chatModel.prompt(prompt).call();
```
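For intuition about what a multi-message prompt becomes on the wire, here is a JDK-only sketch that renders role-tagged messages as an OpenAI-style `messages` array. `WireMessages` and `Msg` are hypothetical names, the escaping is deliberately minimal, and Solon's real serialization is not shown in this post, so treat this as an illustration only.

```java
import java.util.List;

// Hypothetical sketch of an OpenAI-style request body; not Solon's serializer.
class WireMessages {
    /** One role-tagged message, e.g. role "system", "user", or "assistant". */
    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Minimal JSON string escaping: backslash, double quote, and newline only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Renders {"model": ..., "messages": [{"role": ..., "content": ...}, ...]}. */
    static String toJson(String model, List<Msg> messages) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"model\":\"").append(escape(model)).append("\",\"messages\":[");
        for (int i = 0; i < messages.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Msg m = messages.get(i);
            sb.append("{\"role\":\"").append(escape(m.role))
              .append("\",\"content\":\"").append(escape(m.content)).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

The earlier assistant message in a prompt acts as a worked example for the model, which is why message order in the array matters.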

Let's build a simple **knowledge-aware chatbot**, the kind of RAG-lite pattern you see in real projects. This example uses `ChatMessage.ofUserAugment()` to inject context into the prompt:

```java
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.message.ChatMessage;
import org.noear.solon.annotation.Component;
import org.noear.solon.annotation.Inject;

@Component
public class KnowledgeChatbot {
    @Inject
    ChatModel chatModel;

    public String answer(String question, String referenceContext) throws Exception {
        // Augment the user message with reference context
        ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);

        ChatResponse resp = chatModel.prompt(augmented)
                .options(o -> o
                    .temperature(0.3)
                    .systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
                .call();

        return resp.getMessage().getContent();
    }
}
```

This pattern — augment user input with context, then call the model — is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
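Conceptually, augmenting a user message just means folding retrieved text into the prompt. Here is a JDK-only sketch of one possible template; `PromptAugmenter` and its output format are hypothetical, and the text that `ChatMessage.ofUserAugment()` actually produces may look different.

```java
import java.util.List;

// Hypothetical prompt template; not the format Solon's ofUserAugment() uses.
class PromptAugmenter {
    /** Numbers each reference snippet and places the question after them. */
    static String augment(String question, List<String> references) {
        StringBuilder sb = new StringBuilder("Answer using only the references below.\n\nReferences:\n");
        for (int i = 0; i < references.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(references.get(i)).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }
}
```

Numbering the snippets makes it possible to ask the model to cite which reference it relied on.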

`ChatModel` is just the entry point. Solon AI also offers:

- `@ToolMapping` methods the LLM can invoke
- `ReActAgent` and `TeamAgent` for multi-step reasoning

For the full documentation, check out the official Solon AI guide:

👉 [https://solon.noear.org/article/918](https://solon.noear.org/article/918) (Model construction)

👉 [https://solon.noear.org/article/920](https://solon.noear.org/article/920) (API reference)

*Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments — I might cover it in the next post.*