Most AI chatbot tutorials reach for Python. FastAPI, LangChain, a quick requests.post, and you're done in 20 minutes. That's fine for prototyping. But when I wanted to build something I'd actually put behind a real API, something with proper async concurrency, typed errors, and zero GC pauses, I reached for Rust instead.
This is a writeup of **chatbot**, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface, with a Web UI, CLI mode, and Docker support baked in.
Why Rust for an LLM chatbot? It's a fair question. LLM API calls are network-bound, so why does the backend language even matter?
A few reasons:
For LLM apps specifically: yes, 95% of your wall-clock time is spent waiting for the model to respond. But the other 5% (routing, state management, provider selection, connection handling) is all yours to control, and Rust makes that part predictable and hard to get wrong.
The app serves the Web UI at http://localhost:8080 by default, or runs in CLI mode in the terminal.
User (Browser or CLI)
        │
        ▼
Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
            ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Claude   OpenAI   Ollama
         API      API      API
The project has a clean five-module layout in src/:
src/
├── main.rs # Startup routing + CLI loop
├── config.rs # Provider enum + env/runtime config
├── client.rs # Provider-specific HTTP clients
├── conversation.rs # In-memory chat state model
└── web.rs # Axum routes, connect flow, chat API
Each module has exactly one responsibility. No god objects, no tangled imports.
The heart of the project is client.rs. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single ChatClient that dispatches based on the active provider.
The key insight: Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema. Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.
// Simplified concept from client.rs
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

pub struct ChatClient {
    pub provider: Provider,
    pub model: String,
    pub base_url: String,
    pub api_key: Option<String>,
    pub max_tokens: u32,
    pub system_prompt: String,
    pub http: reqwest::Client,
}
When you send a message, the client picks the right HTTP contract:
pub async fn send(&self, messages: &[Message]) -> Result<String> {
    match self.provider {
        Provider::Claude => self.send_claude(messages).await,
        Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
    }
}
This means adding a new provider (Gemini, Cohere, etc.) in the future is a matter of adding one arm and one method — the rest of the application stays untouched.
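To make the two code paths concrete, here is a minimal std-only sketch of what differs between them on the wire. The names `RequestPlan` and `plan_request` are hypothetical and not taken from the repository; the paths and header names follow the public Anthropic Messages and OpenAI Chat Completions APIs, and the real client presumably builds JSON bodies with serde and sends them with reqwest.

```rust
/// Providers as listed in the article.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

/// Hypothetical summary of what differs between the two wire contracts.
#[derive(Debug, PartialEq)]
pub struct RequestPlan {
    /// Path appended to the provider's base URL.
    pub path: &'static str,
    /// Header name and value carrying the credential, if a key was supplied.
    pub auth_header: Option<(&'static str, String)>,
    /// Extra headers the contract requires.
    pub extra_headers: Vec<(&'static str, &'static str)>,
    /// True when the system prompt is a top-level field rather than a message.
    pub system_is_top_level: bool,
}

pub fn plan_request(provider: Provider, api_key: Option<&str>) -> RequestPlan {
    match provider {
        // Anthropic native format: POST /v1/messages, `x-api-key` auth,
        // an `anthropic-version` header, and a top-level `system` field.
        Provider::Claude => RequestPlan {
            path: "/v1/messages",
            auth_header: api_key.map(|k| ("x-api-key", k.to_string())),
            extra_headers: vec![("anthropic-version", "2023-06-01")],
            system_is_top_level: true,
        },
        // OpenAI-compatible schema: POST /v1/chat/completions, bearer auth,
        // and the system prompt sent as a message with role "system".
        Provider::OpenAI | Provider::Ollama => RequestPlan {
            path: "/v1/chat/completions",
            auth_header: api_key.map(|k| ("authorization", format!("Bearer {k}"))),
            extra_headers: Vec::new(),
            system_is_top_level: false,
        },
    }
}
```

A local Ollama server needs no credential, so calling `plan_request(Provider::Ollama, None)` yields no auth header. A new provider would add one enum variant and one match arm here, mirroring the one-arm, one-method change described above.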
Multi-turn chat requires persistent message history. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with Arc<Mutex<...>>, Rust's standard pattern for shared mutable state:
// conversation.rs - shared across all handlers
pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String, // "user" or "assistant"
    pub content: String,
}
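The Arc lets every Axum handler hold a cheap clone of a pointer to the same conversation (each handler runs in its own async task), and the Mutex ensures only one handler touches the history at a time. The compiler rejects unsynchronized access, so data races on the history are ruled out at compile time. Higher-level ordering, such as two requests interleaving their turns, still depends on how long each handler holds the lock.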
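The pattern can be tried out without a web server. The sketch below is an illustration under assumptions: it uses OS threads and `std::sync::Mutex` as a stand-in for Axum handler tasks (the project may use a different mutex type), the `Message` struct omits the serde derives to stay std-only, and the helper names `push_message` and `concurrent_appends` are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Append one message while holding the lock; the guard is released on return.
pub fn push_message(conv: &SharedConversation, role: &str, content: &str) {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: role.to_string(),
        content: content.to_string(),
    });
}

/// Spawn `n` writers that each append one message, then return the final length.
pub fn concurrent_appends(n: usize) -> usize {
    let conv: SharedConversation = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Cloning the Arc copies a pointer; every writer sees the same Vec.
            let conv = Arc::clone(&conv);
            thread::spawn(move || push_message(&conv, "user", &format!("message {i}")))
        })
        .collect();
    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    let len = conv.lock().expect("conversation mutex poisoned").len();
    len
}
```

Every append lands regardless of how the writers are scheduled, because each one takes the lock before touching the vector. The order of the messages is not fixed, which is the kind of interleaving the lock alone does not control.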
The app launches in Web UI mode by default, but also supports a terminal workflow:
cargo run
cargo run -- cli
PORT=3000 cargo run -- web
This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
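The startup routing behind these commands can be sketched as a small pure function. This is an assumption about the shape of the logic, not the code in main.rs: `Mode`, `select_mode`, and `DEFAULT_PORT` are hypothetical names, and the default of 8080 comes from the address given earlier.

```rust
/// Hypothetical run modes matching `cargo run`, `cargo run -- cli`, and `cargo run -- web`.
#[derive(Debug, PartialEq)]
pub enum Mode {
    Web { port: u16 },
    Cli,
}

/// Default port for the Web UI (http://localhost:8080).
const DEFAULT_PORT: u16 = 8080;

/// `args` are the arguments after the program name; `port_var` is the PORT value, if set.
pub fn select_mode(args: &[&str], port_var: Option<&str>) -> Result<Mode, String> {
    match args.first().copied() {
        // No argument means Web UI mode, the default.
        None | Some("web") => {
            let port = match port_var {
                Some(raw) => raw
                    .parse::<u16>()
                    .map_err(|e| format!("invalid PORT {raw:?}: {e}"))?,
                None => DEFAULT_PORT,
            };
            Ok(Mode::Web { port })
        }
        Some("cli") => Ok(Mode::Cli),
        Some(other) => Err(format!("unknown mode {other:?}, expected `web` or `cli`")),
    }
}
```

In a real `main`, the inputs would come from `std::env::args().skip(1)` and `std::env::var("PORT").ok()`. Keeping the decision in a function that takes plain values makes it easy to unit test without touching the process environment.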
Download the prebuilt Windows executable (v1.0.1) and run it:
.\chatbot.exe
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env
cargo run
Set your provider in .env:
# Claude
PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
PROVIDER=openai
OPENAI_API_KEY=sk-...

# Ollama (run `ollama pull llama3.2` first)
PROVIDER=ollama
MODEL=llama3.2

# OpenRouter (OpenAI-compatible endpoint)
PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4
All environment variables:
| Variable | Default | Description |
|---|---|---|
| PROVIDER | claude | claude, openai, ollama |
| ANTHROPIC_API_KEY | (none) | Required for Claude |
| OPENAI_API_KEY | (none) | Required for OpenAI |
| MODEL | provider default | Override the model name |
| BASE_URL | provider default | Override the API endpoint |
| MAX_TOKENS | 1024 | Response token cap |
| SYSTEM_PROMPT | built-in | Custom assistant behavior |
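As an illustration of how these variables and defaults might be resolved, here is a hedged std-only sketch. It is not the code in config.rs: `Config` and `load_config` are hypothetical names, the lookup is passed in as a closure so the logic can be tested without real environment variables, and treating Ollama as needing no key is an assumption based on the examples above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

/// Hypothetical resolved configuration; `None` means "use the provider default".
#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: Provider,
    pub api_key: Option<String>,
    pub model: Option<String>,
    pub base_url: Option<String>,
    pub system_prompt: Option<String>,
    pub max_tokens: u32,
}

/// `get` looks up one variable by name, e.g. `|k| std::env::var(k).ok()`.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // PROVIDER defaults to "claude" when unset.
    let provider = match get("PROVIDER").as_deref().unwrap_or("claude") {
        "claude" => Provider::Claude,
        "openai" => Provider::OpenAI,
        "ollama" => Provider::Ollama,
        other => return Err(format!("unknown PROVIDER {other:?}")),
    };
    // Only the hosted providers require a key.
    let api_key = match provider {
        Provider::Claude => {
            Some(get("ANTHROPIC_API_KEY").ok_or("ANTHROPIC_API_KEY is required for Claude")?)
        }
        Provider::OpenAI => {
            Some(get("OPENAI_API_KEY").ok_or("OPENAI_API_KEY is required for OpenAI")?)
        }
        Provider::Ollama => None,
    };
    // MAX_TOKENS defaults to 1024 and must parse as an unsigned integer.
    let max_tokens = match get("MAX_TOKENS") {
        Some(raw) => raw
            .parse::<u32>()
            .map_err(|e| format!("invalid MAX_TOKENS {raw:?}: {e}"))?,
        None => 1024,
    };
    Ok(Config {
        provider,
        api_key,
        model: get("MODEL"),
        base_url: get("BASE_URL"),
        system_prompt: get("SYSTEM_PROMPT"),
        max_tokens,
    })
}
```

At startup this would be called as `load_config(|k| std::env::var(k).ok())`. Failing early on a missing key or a malformed number keeps configuration mistakes out of the request path.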
docker build -t chatbot .
docker run -it --rm -v "$(pwd)/.env:/data/.env" chatbot
For Ollama with host networking:
docker run -it --rm --network host -v "$(pwd)/.env:/data/.env" chatbot
For backend workloads like this (concurrent HTTP + JSON + state management), Rust generally outperforms Python on the metrics that matter in production. The comparison below is qualitative rather than a measured benchmark:
| Metric | Rust | Python | Why It Matters |
|---|---|---|---|
| Throughput (req/sec) | Higher | Lower | More concurrent users per instance |
| P95/P99 latency | Lower under load | Higher under load | More stable response times |
| Memory per worker | Lower | Higher | Better infra cost and density |
| CPU efficiency | Higher | Lower | More headroom before scaling out |
Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency — which directly impacts cost and reliability at scale.
tracing crate + OpenTelemetry integration
The project is open source and MIT licensed:
👉 github.com/MihirMohapatra/chatbot
If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.
Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.