Run Hermes Agent on Any Model — Free, Local, and Cost-Routed

To integrate Hermes, an open-source AI agent by Nous Research, with Lynkr, a self-hosted Node.js proxy that routes requests to various LLM providers. Lynkr enables automatic cost-tier routing based on prompt complexity, allowing simple tasks to use free local models like Ollama while complex reasoning goes to premium cloud models, potentially saving 60-80% on AI costs. The integration requires no code changes and works by having Hermes point at Lynkr's single OpenAI-compatible URL, which then handles provider selection, spend tracking, and telemetry across multiple AI tools.

If you've spent any time wrestling with AI coding tools and agents in 2026, you've hit two walls: - Provider lock-in. Claude Code expects Anthropic. Codex expects OpenAI. Your shiny new agent framework wants whatever its README assumes. - Agent amnesia. Every session starts from zero. Your "AI assistant" doesn't actually learn anything about you, your codebase, or the work you did yesterday. Two open-source projects address those problems head-on — and they pair beautifully together. - by Nous Research — a self-improving AI agent with a built-in learning loop, multi-platform presence, and a serious tool ecosystem. Hermes Agent https://github.com/NousResearch/hermes-agent - — a self-hosted universal LLM proxy that lets any AI tool talk to any model provider. Lynkr https://github.com/Fast-Editor/Lynkr This post explains what each one is, why they exist, and shows you the exact steps to run Hermes through Lynkr so you can route Hermes to Databricks, Bedrock, Ollama, llama.cpp, Azure, OpenRouter — or all of them with automatic cost-tier routing. What Is Hermes Agent? Hermes is an open-source AI agent MIT-licensed, built by Nous Research https://nousresearch.com that you actually live inside, not just call. What makes it different from "yet another agent": - A closed learning loop. Hermes curates its own memory, autonomously creates skills procedural memory after complex tasks succeed, improves them during use, and searches its own past conversations via SQLite FTS5. It's the only agent I've seen that gets meaningfully better the longer you use it. - Lives where you do. A single gateway process plugs into Telegram, Discord, Slack, WhatsApp, Signal, Email, and a real terminal TUI. Send a voice memo from your phone, get a transcribed answer back, continue the same thread from your laptop later. - Runs anywhere. Seven terminal backends — local, Docker, SSH, Singularity, Modal, Daytona, Vercel Sandbox. Run it on a $5 VPS or a GPU cluster. Modal/Daytona give you serverless persistence — hibernates when idle, wakes on demand. - Built-in cron. "Every weekday at 8am, summarize my GitHub notifications and send to Telegram." That's a one-line cron job in natural language. - Delegates and parallelizes. Spawns isolated subagents for parallel workstreams; results come back without flooding your context. - Provider-agnostic by design. OpenRouter, Nous Portal, NovitaAI, NVIDIA NIM, Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, OpenAI, or your own endpoint. Switch with hermes model — no code changes. Architecture in one paragraph The core is AIAgent in run agent.py — a synchronous tool-calling loop over OpenAI-format messages. model tools.py orchestrates ~40 built-in tools auto-discovered from tools/ . The CLI cli.py , ~11k LOC handles slash commands, prompt toolkit input, Rich rendering, and a data-driven skin engine. Provider profiles live under plugins/model-providers/<name / and contribute base url , env vars , api mode , and fallback models — the runtime resolver merges those with custom providers from config.yaml to figure out where to send each request. That last detail is what makes Lynkr integration trivial. Install Hermes in one line curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash Then hermes to start chatting. What Is Lynkr? Lynkr is a self-hosted Node.js proxy that sits between any AI coding tool and any LLM provider . One environment variable change, and your tool works with whatever backend you want. Claude Code / Cursor / Codex / Cline / Continue / Hermes / Vercel AI SDK | Lynkr http://localhost:8081 | Ollama | Bedrock | Databricks | OpenRouter | Azure | OpenAI | llama.cpp | LM Studio | z.ai | Vertex | Moonshot What's actually inside I went through the source. Lynkr is more than a "translate request, forward, translate response" proxy: - Format conversion. Anthropic ↔ OpenAI ↔ Codex Responses API ↔ Databricks ↔ Bedrock — handled in src/clients/ openai-format.js , responses-format.js , databricks.js , bedrock-utils.js , etc. . - Tier-based routing. src/routing/ analyzes prompt complexity, agentic intent, risk, and latency, then routes to a TIER SIMPLE / TIER STANDARD / TIER COMPLEX model. Cheap stuff goes to Ollama; gnarly stuff goes to a frontier cloud model. This is where the headline "60–80% cost savings" comes from. - Resilience. Circuit breaker cockatiel , retries, DNS logging, prompt cache injection. - MCP integration + Code Mode. Auto-discovers MCP servers and can collapse 100+ MCP tool definitions into 4 meta-tools ~96% token reduction . - Observability built in. Telemetry, latency tracking, usage reporting lynkr usage shows AI spend and tier savings , trajectory export as JSONL for training lynkr trajectory . - 699 passing tests. Routing, format conversion, streaming, error resilience, memory store, prompt cache — it's seriously tested for a side-project proxy. Install Lynkr in one line curl -fsSL https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash Or via npm: npm install -g pino-pretty && npm install -g lynkr . Why Use Them Together? Hermes already supports a long list of providers natively. Why bolt Lynkr in front? Three concrete reasons: 1. Unify your enterprise creds Your company has a Databricks endpoint serving Claude, an AWS Bedrock account with cross-region inference profiles, an Azure OpenAI deployment, and a private Ollama box. With Lynkr, all of those live behind one OpenAI-compatible URL. Hermes points at that URL and stops caring which backend is serving the request. 2. Automatic cost-tier routing This is the killer feature. Hermes can switch models with /model , but Lynkr will switch per request based on complexity. Simple tool calls and short prompts go to free local Ollama. Heavy reasoning goes to your premium cloud model. You don't think about it — Lynkr's complexity-analyzer.js and risk-analyzer.js decide. Run lynkr usage afterward to see the actual savings. 3. Centralized observability for every agent + tool If you run Hermes + Claude Code + Cursor + Codex all on the same machine — and a lot of us do — Lynkr becomes a single chokepoint for spend, telemetry, prompt caching, and trajectory capture across all of them. You get one usage report instead of four dashboards. How to Use Lynkr With Hermes The integration is genuinely 3 minutes of work because both tools speak OpenAI-compatible HTTP. Step 1: Start Lynkr with a backend Pick whatever provider you want Lynkr to route to. For a local-first setup: .env in your Lynkr directory or just exports export MODEL PROVIDER=ollama export OLLAMA MODEL=qwen2.5-coder:latest export OLLAMA ENDPOINT=http://localhost:11434 lynkr start Or for tier routing across providers: export TIER SIMPLE=ollama:qwen2.5-coder:latest export TIER STANDARD=openrouter:anthropic/claude-3.5-haiku export TIER COMPLEX=bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0 export OPENROUTER API KEY=sk-or-... export AWS BEDROCK API KEY=... lynkr start Lynkr now listens on http://localhost:8081 OpenAI-compatible and http://localhost:8081/v1/messages Anthropic-compatible . Step 2: Register Lynkr as a custom provider in Hermes Hermes resolves providers through plugins/model-providers/<name / profiles plus a custom providers list in your ~/.hermes/config.yaml . Add an entry: custom providers: - name: lynkr base url: http://localhost:8081/v1 api mode: chat completions env var: LYNKR API KEY any string works — Lynkr doesn't validate models: - auto Lynkr's tier router picks the actual model - qwen2.5-coder:latest - anthropic/claude-3.5-sonnet Then set the key any value : hermes config set env.LYNKR API KEY sk-lynkr Step 3: Point Hermes at Lynkr hermes model custom:lynkr/auto Or interactively: run hermes model , pick custom:lynkr , choose auto . That's it. Every Hermes turn now flows through Lynkr, which routes to the right backend based on tier and complexity. Run a few turns, then: lynkr usage …and you'll see the per-tier spend breakdown and dollars saved versus a single-frontier-model baseline. Bonus: voice memo → Hermes → Lynkr → cheapest model Because Hermes already has Telegram and voice memo transcription wired in, this whole stack means: Record a voice memo on your phone → Hermes transcribes it → routes the request through Lynkr → Lynkr picks Ollama for the "what time is it in Tokyo" parts and Sonnet for the "refactor this function" parts → reply comes back to your phone. You built that in 5 minutes with two npm / bash installers and a YAML edit. When NOT to Use Lynkr With Hermes Being honest: - You only use one provider. Hermes already supports it natively. Adding Lynkr is extra latency and another process to babysit. - You need streaming reasoning tokens from a specific model. Make sure Lynkr's format converter for that provider preserves what you need — it does for most cases, but verify before betting on it. - You're on a constrained environment. Lynkr is Node 20+. Hermes is Python 3.11. That's two runtimes on a Raspberry Pi. For everything else — multi-provider workflows, enterprise creds, cost optimization, observability — the combination is hard to beat. TL;DR | Need | Tool | |---|---| | A real AI agent that learns, remembers, and lives across Telegram/Discord/CLI | Hermes | | Route any AI tool to any LLM provider with automatic cost tiers | Lynkr | | Both | Point Hermes at Lynkr via custom providers in config.yaml | Links - Hermes Agent: https://github.com/NousResearch/hermes-agent https://github.com/NousResearch/hermes-agent - Hermes docs: https://hermes-agent.nousresearch.com/docs https://hermes-agent.nousresearch.com/docs - Lynkr: https://github.com/Fast-Editor/Lynkr https://github.com/Fast-Editor/Lynkr - Lynkr docs: https://fast-editor.github.io/Lynkr/ https://fast-editor.github.io/Lynkr/ If you build something with this combo, drop a comment — I'd love to see what stacks people are putting together.