I Built a Production-Oriented Multi-Provider AI Chatbot in Rust — Here's How


Most AI chatbot tutorials reach for Python: FastAPI, LangChain, a quick requests.post, done in 20 minutes. And that's fine for prototyping. But when I wanted to build something I'd actually put behind a real API, something with proper async concurrency, typed errors, and zero GC pauses, I reached for Rust instead.

This is a writeup of **chatbot**, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface, with a Web UI, CLI mode, and Docker support baked in.

It's a fair question. LLM API calls are network-bound, so why does the backend language even matter?

A few reasons:

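For LLM apps specifically: yes, roughly 95% of your wall-clock time is waiting for the model to respond. But the other 5% (routing, state management, provider selection, connection handling) is all yours to control. Rust makes that part predictable: access to shared state is checked at compile time, and failures surface as typed errors instead of runtime surprises.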

The app serves the Web UI at http://localhost:8080 by default, or runs in CLI mode.

User (Browser or CLI)
        │
        ▼
  Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
            ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Claude   OpenAI   Ollama
         API      API      API

The project has a clean five-module layout in src/:

src/
├── main.rs          # Startup routing + CLI loop
├── config.rs        # Provider enum + env/runtime config
├── client.rs        # Provider-specific HTTP clients
├── conversation.rs  # In-memory chat state model
└── web.rs           # Axum routes, connect flow, chat API

Each module has exactly one responsibility. No god objects, no tangled imports.

The heart of the project is client.rs. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single ChatClient that dispatches based on the active provider.

The key insight: Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema. Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.

// Simplified concept from client.rs
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

pub struct ChatClient {
    pub provider: Provider,
    pub model: String,
    pub base_url: String,
    pub api_key: Option<String>,
    pub max_tokens: u32,
    pub system_prompt: String,
    pub http: reqwest::Client,
}

When you send a message, the client picks the right HTTP contract:

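impl ChatClient {
    /// Routes the conversation to the HTTP contract the active provider speaks.
    /// `Result` is assumed to be `anyhow::Result`, since the project uses anyhow.
    pub async fn send(&self, messages: &[Message]) -> Result<String> {
        match self.provider {
            // Anthropic's native Messages API
            Provider::Claude => self.send_claude(messages).await,
            // OpenAI and Ollama share the OpenAI-compatible schema
            Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
        }
    }
}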

This means adding a new provider (Gemini, Cohere, etc.) in the future is a matter of adding one arm and one method — the rest of the application stays untouched.
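To make the split between the two HTTP contracts concrete, here is a minimal sketch of how the request could be shaped per provider. It is not the project's code: `RequestPlan` and `plan_request` are hypothetical names, and I'm assuming the configured base URL is the API root so that the path is appended to it. What it relies on is public API behavior: Anthropic's Messages API takes the system prompt as a top-level `system` field, while OpenAI-compatible servers (including Ollama's compatibility endpoint) expect it as a leading message with the `system` role.

```rust
// Sketch only: `RequestPlan` and `plan_request` are hypothetical names,
// not taken from the project's client.rs.

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Everything that differs between the two HTTP contracts.
#[derive(Debug, PartialEq)]
pub struct RequestPlan {
    /// Path appended to the configured base URL (assumed to be the API root).
    pub path: &'static str,
    /// Set only for Anthropic, which takes the system prompt as a top-level field.
    pub system_field: Option<String>,
    /// Messages to serialize into the `messages` array.
    pub messages: Vec<Message>,
}

pub fn plan_request(provider: Provider, system_prompt: &str, history: &[Message]) -> RequestPlan {
    match provider {
        Provider::Claude => RequestPlan {
            path: "/v1/messages",
            system_field: Some(system_prompt.to_string()),
            messages: history.to_vec(),
        },
        // OpenAI-compatible servers expect the system prompt as the first message.
        Provider::OpenAI | Provider::Ollama => {
            let mut messages = Vec::with_capacity(history.len() + 1);
            messages.push(Message {
                role: "system".to_string(),
                content: system_prompt.to_string(),
            });
            messages.extend_from_slice(history);
            RequestPlan {
                path: "/v1/chat/completions",
                system_field: None,
                messages,
            }
        }
    }
}
```

Keeping the differences in one plain data structure means the code that actually performs the HTTP call only has to serialize a `RequestPlan`, and a new provider that reuses either contract needs just one more pattern in the `match`.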

Multi-turn chat requires persistent message history. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with Arc<Mutex<...>>, Rust's standard pattern for shared mutable state:

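// conversation.rs - shared across all handlers
// Imports assumed for this excerpt; tokio::sync::Mutex is a common alternative in async code.
use serde::{Deserialize, Serialize};
use std::sync::{Arc, Mutex};

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String,    // "user" or "assistant"
    pub content: String,
}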

The Arc lets every Axum handler hold its own cheap clone of a pointer to the same conversation (each handler runs in its own async task), and the Mutex ensures only one handler touches the history at a time. No data races, guaranteed by the type system.
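The sharing pattern can be tried without a web server. The sketch below is an illustration under my own assumptions, not the project's code: it uses `std::sync::Mutex` and plain OS threads as a stand-in for concurrent Axum handlers, and `push_message` and `simulate_handlers` are hypothetical helper names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Debug)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Appends one message, holding the lock only for the duration of the push.
pub fn push_message(conv: &SharedConversation, role: &str, content: &str) {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: role.to_string(),
        content: content.to_string(),
    });
    // `history` (the guard) is dropped here, which releases the lock.
}

/// Hypothetical stand-in for concurrent handlers: each thread receives its
/// own `Arc` clone that points at the same underlying `Vec`.
pub fn simulate_handlers(n: usize) -> usize {
    let conv: SharedConversation = Arc::new(Mutex::new(Vec::new()));
    let workers: Vec<_> = (0..n)
        .map(|i| {
            let conv = Arc::clone(&conv);
            thread::spawn(move || push_message(&conv, "user", &format!("message {i}")))
        })
        .collect();
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    // Bind the length first so the lock guard is released before `conv` is dropped.
    let len = conv.lock().expect("conversation mutex poisoned").len();
    len
}
```

Every push lands in the history no matter how the threads interleave, although the order of the messages is not fixed. That is the distinction worth remembering: the Mutex rules out data races, while ordering between concurrent requests is still an application-level concern.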

The app launches in Web UI mode by default, but also supports a terminal workflow:

cargo run

cargo run -- cli

PORT=3000 cargo run -- web

This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
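A rough sketch of how that startup routing could look, assuming the first argument selects the mode and the PORT variable only matters in web mode. `Mode` and `parse_mode` are hypothetical names; the real main.rs may be structured differently. The default port of 8080 comes from the default URL mentioned earlier.

```rust
#[derive(Debug, PartialEq)]
pub enum Mode {
    Web { port: u16 },
    Cli,
}

/// `args` are the arguments after the binary name; `port_env` is the raw
/// value of the PORT variable, if it is set.
pub fn parse_mode(args: &[String], port_env: Option<&str>) -> Result<Mode, String> {
    match args.first().map(String::as_str) {
        Some("cli") => Ok(Mode::Cli),
        // No argument at all falls back to the Web UI.
        None | Some("web") => {
            let port = match port_env {
                Some(raw) => raw
                    .parse::<u16>()
                    .map_err(|e| format!("invalid PORT {raw:?}: {e}"))?,
                None => 8080,
            };
            Ok(Mode::Web { port })
        }
        Some(other) => Err(format!("unknown mode {other:?}, expected `web` or `cli`")),
    }
}
```

In a real binary the inputs would come from `std::env::args().skip(1)` and `std::env::var("PORT")`. Passing them in as parameters keeps the routing decision a pure function that is easy to test.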

Download the prebuilt Windows executable (v1.0.1) and run it:

.\chatbot.exe

To build from source instead, install Rust via rustup, then clone the repository and run it:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env

cargo run

Set your provider in .env:

Claude:

PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...

OpenAI:

PROVIDER=openai
OPENAI_API_KEY=sk-...

Ollama (local; pull the model first):

ollama pull llama3.2

PROVIDER=ollama
MODEL=llama3.2

OpenRouter, through the OpenAI-compatible path:

PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4

All environment variables:

Variable	Default	Description
`PROVIDER`	`claude`	`claude`, `openai`, `ollama`
`ANTHROPIC_API_KEY`	—	Required for Claude
`OPENAI_API_KEY`	—	Required for OpenAI
`MODEL`	provider default	Override the model name
`BASE_URL`	provider default	Override the API endpoint
`MAX_TOKENS`	`1024`	Response token cap
`SYSTEM_PROMPT`	built-in	Custom assistant behavior
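The defaults in the table can be expressed as a small loader. This is a sketch under stated assumptions rather than the project's config.rs: `Config` and `load_config` are hypothetical names, values are read from a plain map instead of the process environment so the logic stays testable, and I'm assuming Ollama needs no API key because the table only lists keys as required for Claude and OpenAI.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: Provider,
    pub api_key: Option<String>,
    pub model: Option<String>,         // None means "use the provider default"
    pub base_url: Option<String>,      // None means "use the provider default"
    pub system_prompt: Option<String>, // None means "use the built-in prompt"
    pub max_tokens: u32,
}

pub fn load_config(vars: &HashMap<String, String>) -> Result<Config, String> {
    // Treat missing and blank values the same way.
    let get = |key: &str| {
        vars.get(key)
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty())
    };

    let provider = match get("PROVIDER").map(|p| p.to_lowercase()).as_deref() {
        None | Some("claude") => Provider::Claude,
        Some("openai") => Provider::OpenAI,
        Some("ollama") => Provider::Ollama,
        Some(other) => return Err(format!("unknown PROVIDER {other:?}")),
    };

    // Only the hosted providers need a key (assumption based on the table).
    let api_key = match provider {
        Provider::Claude => Some(
            get("ANTHROPIC_API_KEY").ok_or("ANTHROPIC_API_KEY is required for Claude")?,
        ),
        Provider::OpenAI => Some(
            get("OPENAI_API_KEY").ok_or("OPENAI_API_KEY is required for OpenAI")?,
        ),
        Provider::Ollama => None,
    };

    let max_tokens = match get("MAX_TOKENS") {
        Some(raw) => raw
            .parse::<u32>()
            .map_err(|e| format!("invalid MAX_TOKENS {raw:?}: {e}"))?,
        None => 1024,
    };

    Ok(Config {
        provider,
        api_key,
        model: get("MODEL"),
        base_url: get("BASE_URL"),
        system_prompt: get("SYSTEM_PROMPT"),
        max_tokens,
    })
}
```

In the application itself, the map could be built with `std::env::vars().collect()` after the .env file has been loaded. Failing at startup with a clear message for a missing key or a malformed number is usually friendlier than discovering the problem on the first chat request.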

docker build -t chatbot .
docker run -it --rm -v "$(pwd)/.env:/data/.env" chatbot

For Ollama with host networking:

docker run -it --rm --network host -v "$(pwd)/.env:/data/.env" chatbot

For backend workloads like this (concurrent HTTP + JSON + state management), Rust generally outperforms Python on the metrics that matter in production. The comparison here is qualitative:

Metric	Rust	Python	Why It Matters
Throughput (req/sec)	Higher	Lower	More concurrent users per instance
P95/P99 latency	Lower under load	Higher under load	More stable response times
Memory per worker	Lower	Higher	Better infra cost and density
CPU efficiency	Higher	Lower	More headroom before scaling out

Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency, which directly impact cost and reliability at scale.

tracing crate + OpenTelemetry integration

The project is open source and MIT licensed:

👉 github.com/MihirMohapatra/chatbot

If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.

Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.

Source: dev.to (original article)