# I Built a Production-Oriented Multi-Provider AI Chatbot in Rust — Here's How

> Source: <https://dev.to/mihir_mohapatra/i-built-a-production-oriented-multi-provider-ai-chatbot-in-rust-heres-how-1i44>
> Published: 2026-05-31 05:49:53+00:00

Most AI chatbot tutorials reach for Python. FastAPI, LangChain, a quick `requests.post`: done in 20 minutes. And that's fine for prototyping. But when I wanted to build something I'd actually put behind a real API, something with proper async concurrency, typed errors, and zero GC pauses, I reached for Rust instead.

This is a writeup of **chatbot**, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface, with a Web UI, CLI mode, and Docker support baked in.

It's fair to ask why Rust at all: LLM API calls are network-bound, so does the backend language even matter?

A few reasons:

For LLM apps specifically: yes, roughly 95% of your wall-clock time is spent waiting for the model to respond. But the other 5% (routing, state management, provider selection, connection handling) is all yours to control, and Rust's type system and compile-time checks make that part much harder to get wrong.

The app serves a Web UI at `http://localhost:8080` by default, or runs as an interactive CLI. At a high level, it looks like this:

```
User (Browser or CLI)
        │
        ▼
  Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
            ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Claude   OpenAI   Ollama
         API      API      API
```

The project has a clean five-module layout in `src/`:

```
src/
├── main.rs          # Startup routing + CLI loop
├── config.rs        # Provider enum + env/runtime config
├── client.rs        # Provider-specific HTTP clients
├── conversation.rs  # In-memory chat state model
└── web.rs           # Axum routes, connect flow, chat API
```

Each module has exactly one responsibility. No god objects, no tangled imports.

The heart of the project is `client.rs`. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single `ChatClient` that dispatches based on the active provider.

The key insight: **Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema.** Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.
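To make the split concrete, here is a minimal std-only sketch of what actually differs between the two contracts. It is based on the public Anthropic Messages API and OpenAI Chat Completions API, not on the project's source: Claude takes the system prompt as a top-level `system` field and authenticates with an `x-api-key` header, while OpenAI-compatible servers expect the system prompt as the first message (role `"system"`) and a `Bearer` token. The names `Wire`, `RequestShape`, and `build_request` are hypothetical, and JSON encoding is left out so the example runs without `serde` or `reqwest`.

```rust
// Hypothetical helper names; field names and paths follow the public APIs.

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String, // "user" or "assistant"
    pub content: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Wire {
    Anthropic,        // Claude
    OpenAiCompatible, // OpenAI and Ollama
}

/// Everything a request needs, minus the JSON encoding.
#[derive(Debug)]
pub struct RequestShape {
    pub path: &'static str,
    pub auth_header: Option<(&'static str, String)>,
    pub system: Option<String>, // top-level field, Anthropic only
    pub messages: Vec<Message>,
}

pub fn build_request(
    wire: Wire,
    api_key: Option<&str>,
    system_prompt: &str,
    history: &[Message],
) -> RequestShape {
    match wire {
        // A real Claude request also needs an `anthropic-version` header
        // and a `max_tokens` field in the body.
        Wire::Anthropic => RequestShape {
            path: "/v1/messages",
            auth_header: api_key.map(|k| ("x-api-key", k.to_string())),
            system: Some(system_prompt.to_string()),
            messages: history.to_vec(), // only user/assistant turns
        },
        Wire::OpenAiCompatible => {
            // The system prompt travels as the first message instead.
            let mut messages = Vec::with_capacity(history.len() + 1);
            messages.push(Message { role: "system".into(), content: system_prompt.into() });
            messages.extend_from_slice(history);
            RequestShape {
                path: "/v1/chat/completions",
                // Ollama runs without a key, so no header is sent.
                auth_header: api_key.map(|k| ("Authorization", format!("Bearer {k}"))),
                system: None,
                messages,
            }
        }
    }
}

fn main() {
    let history = vec![Message { role: "user".into(), content: "Hi".into() }];
    let claude = build_request(Wire::Anthropic, Some("sk-ant-test"), "Be brief.", &history);
    let ollama = build_request(Wire::OpenAiCompatible, None, "Be brief.", &history);
    println!("claude: {} message(s), system = {:?}", claude.messages.len(), claude.system);
    println!("ollama: {} message(s), first role = {}", ollama.messages.len(), ollama.messages[0].role);
}
```

The Ollama case carries no auth header at all, which fits `api_key` being an `Option` in `ChatClient`.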

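```rust
// Simplified concept from client.rs
pub enum Provider {
    Claude, // Anthropic Messages API
    OpenAI, // OpenAI Chat Completions API
    Ollama, // local server speaking the OpenAI-compatible schema
}

pub struct ChatClient {
    pub provider: Provider,      // selects which wire format to speak
    pub model: String,           // e.g. "llama3.2" for Ollama
    pub base_url: String,        // overridable via BASE_URL
    pub api_key: Option<String>, // not needed for a local Ollama server
    pub max_tokens: u32,         // MAX_TOKENS, default 1024
    pub system_prompt: String,   // SYSTEM_PROMPT or the built-in default
    pub http: reqwest::Client,   // pooled connections; cheap to clone
}
```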

When you send a message, the client picks the right HTTP contract:

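```rust
// `Result` here is presumably `anyhow::Result`, given the anyhow dependency.
pub async fn send(&self, messages: &[Message]) -> Result<String> {
    match self.provider {
        // Claude gets its own Anthropic-format request...
        Provider::Claude => self.send_claude(messages).await,
        // ...while OpenAI and Ollama share the OpenAI-compatible path.
        Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
    }
}
```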

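This means adding a new provider (Gemini, Cohere, etc.) in the future is mostly a matter of adding one enum variant, one match arm, and one method; the rest of the application stays untouched. Because `match` must be exhaustive, the compiler points out every place that still needs the new arm.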
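The two paths also diverge on the way back. Here is a hedged, std-only sketch whose structs mirror just the fields needed from each documented response format (in a real client they would derive `serde::Deserialize`). Claude returns a list of content blocks, while OpenAI-compatible servers put the reply in `choices[0].message.content`. The function names `claude_text` and `openai_text` are illustrative, not taken from the repository.

```rust
// Simplified mirrors of the JSON response bodies; only the fields used here.

#[derive(Debug)]
pub struct ContentBlock {
    pub kind: String, // JSON field "type"; "text" for plain text blocks
    pub text: String,
}

#[derive(Debug)]
pub struct AnthropicResponse {
    pub content: Vec<ContentBlock>,
}

#[derive(Debug)]
pub struct ChoiceMessage {
    pub content: Option<String>, // can be null in the OpenAI schema
}

#[derive(Debug)]
pub struct Choice {
    pub message: ChoiceMessage,
}

#[derive(Debug)]
pub struct ChatCompletion {
    pub choices: Vec<Choice>,
}

/// Claude may return several blocks; concatenate the text ones.
pub fn claude_text(resp: &AnthropicResponse) -> Result<String, String> {
    let text: String = resp
        .content
        .iter()
        .filter(|b| b.kind == "text")
        .map(|b| b.text.as_str())
        .collect();
    if text.is_empty() {
        Err("no text blocks in Claude response".into())
    } else {
        Ok(text)
    }
}

/// OpenAI and Ollama put the reply in choices[0].message.content.
pub fn openai_text(resp: &ChatCompletion) -> Result<String, String> {
    resp.choices
        .first()
        .and_then(|c| c.message.content.clone())
        .ok_or_else(|| "empty choices in chat completion".to_string())
}

fn main() {
    let claude = AnthropicResponse {
        content: vec![ContentBlock { kind: "text".into(), text: "Hello from Claude".into() }],
    };
    let openai = ChatCompletion {
        choices: vec![Choice { message: ChoiceMessage { content: Some("Hello from OpenAI".into()) } }],
    };
    println!("{:?}", claude_text(&claude));
    println!("{:?}", openai_text(&openai));
}
```

Returning an error instead of an empty string keeps a malformed upstream response from silently becoming a blank chat reply.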

Multi-turn chat requires message history that survives between requests. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with `Arc<Mutex<...>>`, Rust's standard pattern for shared mutable state:

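```rust
// conversation.rs - shared across all handlers
use serde::{Deserialize, Serialize};
use std::sync::{Arc, Mutex}; // or tokio::sync::Mutex if the lock is held across .await

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String,    // "user" or "assistant"
    pub content: String,
}
```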

The `Arc` gives every Axum handler a cheap, reference-counted handle to the same conversation (each handler runs in its own async task), and the `Mutex` ensures only one handler touches the history at a time. No data races, guaranteed at compile time: the compiler rejects unsynchronized mutation of state shared between tasks.
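The same pattern can be exercised with plain threads standing in for Axum's async tasks. This std-only sketch (`record_exchange` is a hypothetical helper, not from the repository) appends a user/assistant pair under a single lock acquisition, so concurrent writers can never interleave inside a pair. One caveat for the async version: a `std::sync::Mutex` guard should not be held across an `.await`; when that is unavoidable, `tokio::sync::Mutex` is the usual choice.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Debug)]
pub struct Message {
    pub role: String, // "user" or "assistant"
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Hypothetical helper: records one exchange while holding the lock once,
/// so no other writer can slip in between the user and assistant turns.
pub fn record_exchange(conv: &SharedConversation, user: &str, assistant: &str) {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message { role: "user".into(), content: user.into() });
    history.push(Message { role: "assistant".into(), content: assistant.into() });
} // the guard `history` is dropped here, releasing the lock

fn main() {
    let conv: SharedConversation = Arc::new(Mutex::new(Vec::new()));

    // Each thread stands in for an Axum handler running in its own task.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let conv = Arc::clone(&conv); // bumps the reference count; no deep copy
            thread::spawn(move || record_exchange(&conv, &format!("q{i}"), &format!("a{i}")))
        })
        .collect();

    for h in handles {
        h.join().expect("worker panicked");
    }

    let history = conv.lock().expect("conversation mutex poisoned");
    println!("{} messages", history.len()); // 8 messages: 4 exchanges x 2 turns
}
```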

The app launches in Web UI mode by default, but also supports a terminal workflow:

```
# Default: serves Web UI at http://localhost:8080
cargo run

# CLI mode: interactive terminal chat
cargo run -- cli

# Explicit web mode on a custom port
PORT=3000 cargo run -- web
```

This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
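Startup routing like this fits in a small pure function. The sketch below is an assumption about how `main.rs` might handle it, not its actual code: the first argument selects `cli` or `web`, no argument means web, and `PORT` (defaulting to 8080) only matters in web mode. `Mode` and `parse_mode` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    Web { port: u16 },
    Cli,
}

/// `arg` is the first CLI argument (e.g. from `std::env::args().nth(1)`),
/// `port_env` is the value of the PORT variable, if set.
pub fn parse_mode(arg: Option<&str>, port_env: Option<&str>) -> Result<Mode, String> {
    match arg {
        Some("cli") => Ok(Mode::Cli),
        // No argument falls back to the Web UI, like a plain `cargo run`.
        None | Some("web") => {
            let port = match port_env {
                Some(p) => p
                    .parse::<u16>()
                    .map_err(|e| format!("invalid PORT {p:?}: {e}"))?,
                None => 8080,
            };
            Ok(Mode::Web { port })
        }
        Some(other) => Err(format!("unknown mode {other:?} (expected `cli` or `web`)")),
    }
}

fn main() {
    let arg = std::env::args().nth(1);
    let port = std::env::var("PORT").ok();
    match parse_mode(arg.as_deref(), port.as_deref()) {
        Ok(mode) => println!("starting in {mode:?}"),
        Err(e) => {
            eprintln!("{e}");
            std::process::exit(2);
        }
    }
}
```

Keeping the parsing separate from `std::env` makes it easy to unit-test without touching the real process environment.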

Download the [prebuilt Windows executable (v1.0.1)](https://github.com/MihirMohapatra/chatbot/releases/download/v1.0.1/chatbot.exe) and run it:

```
.\chatbot.exe
# Opens http://localhost:8080
```

To build from source instead:

```
# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# 2. Clone and configure
git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env

# 3. Run
cargo run
```

Set your provider in `.env`. For Claude:

```
PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...
```

For OpenAI:

```
PROVIDER=openai
OPENAI_API_KEY=sk-...
```

For Ollama, pull a model first:

```
ollama pull llama3.2
```

then set:

```
PROVIDER=ollama
MODEL=llama3.2
```

OpenRouter works through the `openai` provider with a custom `BASE_URL`:

```
PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4
```

All environment variables:

| Variable | Default | Description |
|---|---|---|
| `PROVIDER` | `claude` | `claude`, `openai`, or `ollama` |
| `ANTHROPIC_API_KEY` | (none) | Required for Claude |
| `OPENAI_API_KEY` | (none) | Required for OpenAI |
| `MODEL` | provider default | Override the model name |
| `BASE_URL` | provider default | Override the API endpoint |
| `MAX_TOKENS` | `1024` | Response token cap |
| `SYSTEM_PROMPT` | built-in | Custom assistant behavior |
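The table maps naturally onto a config loader. This is a hypothetical sketch of what `config.rs` could do, reading from a `HashMap` so it can be tested without touching the process environment. The base URLs are the public defaults of each service (the article only says "provider default"), assuming the client appends the `/v1/...` path, which is consistent with the OpenRouter `BASE_URL` example. `MODEL` stays an `Option` because the project's default model names aren't listed. `load_config` and `required` are illustrative names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

impl Provider {
    pub fn parse(s: &str) -> Result<Self, String> {
        match s.trim().to_ascii_lowercase().as_str() {
            "claude" => Ok(Provider::Claude),
            "openai" => Ok(Provider::OpenAI),
            "ollama" => Ok(Provider::Ollama),
            other => Err(format!("unknown PROVIDER {other:?}")),
        }
    }

    /// Assumed defaults: each service's public endpoint.
    pub fn default_base_url(self) -> &'static str {
        match self {
            Provider::Claude => "https://api.anthropic.com",
            Provider::OpenAI => "https://api.openai.com",
            Provider::Ollama => "http://localhost:11434",
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: Provider,
    pub model: Option<String>, // None = let the client pick its default
    pub base_url: String,
    pub api_key: Option<String>,
    pub max_tokens: u32,
}

fn required(env: &HashMap<&str, &str>, key: &str) -> Result<String, String> {
    match env.get(key) {
        Some(v) if !v.is_empty() => Ok(v.to_string()),
        _ => Err(format!("{key} is required for this provider")),
    }
}

pub fn load_config(env: &HashMap<&str, &str>) -> Result<Config, String> {
    let provider = Provider::parse(env.get("PROVIDER").copied().unwrap_or("claude"))?;
    // Only the hosted providers need a key.
    let api_key = match provider {
        Provider::Claude => Some(required(env, "ANTHROPIC_API_KEY")?),
        Provider::OpenAI => Some(required(env, "OPENAI_API_KEY")?),
        Provider::Ollama => None,
    };
    let max_tokens = match env.get("MAX_TOKENS") {
        Some(v) => v.parse::<u32>().map_err(|e| format!("invalid MAX_TOKENS {v:?}: {e}"))?,
        None => 1024,
    };
    let base_url = env
        .get("BASE_URL")
        .copied()
        .unwrap_or(provider.default_base_url())
        .trim_end_matches('/') // avoid "//v1/..." when joining paths
        .to_string();
    Ok(Config { provider, model: env.get("MODEL").map(|s| s.to_string()), base_url, api_key, max_tokens })
}

fn main() {
    let owned: Vec<(String, String)> = std::env::vars().collect();
    let env: HashMap<&str, &str> = owned.iter().map(|(k, v)| (k.as_str(), v.as_str())).collect();
    match load_config(&env) {
        // Print only non-secret fields.
        Ok(c) => println!("provider={:?} base_url={}", c.provider, c.base_url),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```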

To run it in Docker:

```
docker build -t chatbot .
docker run -it --rm -v .env:/data/.env chatbot
```

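For Ollama, use host networking so the container can reach the Ollama server running on the host (with the default bridge network, `localhost` inside the container refers to the container itself):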

```
docker run -it --rm --network host -v .env:/data/.env chatbot
```

For backend workloads like this (concurrent HTTP, JSON handling, and state management), Rust generally outperforms Python on the metrics that matter in production:

| Metric | Rust | Python | Why It Matters |
|---|---|---|---|
| Throughput (req/sec) | Higher | Lower | More concurrent users per instance |
| P95/P99 latency | Lower under load | Higher under load | More stable response times |
| Memory per worker | Lower | Higher | Better infra cost and density |
| CPU efficiency | Higher | Lower | More headroom before scaling out |

Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency — which directly impacts cost and reliability at scale.

`tracing` crate + OpenTelemetry integration

The project is open source and MIT licensed:

👉 [github.com/MihirMohapatra/chatbot](https://github.com/MihirMohapatra/chatbot)

If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.

*Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.*