{"slug": "run-hermes-agent-on-any-model-free-local-and-cost-routed", "title": "Run Hermes Agent on Any Model — Free, Local, and Cost-Routed", "summary": "A guide to integrating Hermes, an open-source AI agent by Nous Research, with Lynkr, a self-hosted Node.js proxy that routes requests to various LLM providers. Lynkr enables automatic cost-tier routing based on prompt complexity: simple tasks go to free local models served by Ollama, while complex reasoning goes to premium cloud models, potentially saving 60-80% on AI costs. The integration requires no code changes: Hermes points at Lynkr's single OpenAI-compatible URL, and Lynkr handles provider selection, spend tracking, and telemetry across multiple AI tools.", "body_md": "If you've spent any time wrestling with AI coding tools and agents in 2026, you've hit two walls:\n\n- **Provider lock-in.** Claude Code expects Anthropic. Codex expects OpenAI. Your shiny new agent framework wants whatever its README assumes.\n- **Agent amnesia.** Every session starts from zero. 
Your \"AI assistant\" doesn't actually learn anything about you, your codebase, or the work you did yesterday.\n\nTwo open-source projects address those problems head-on, and they pair beautifully together:\n\n- [Hermes Agent](https://github.com/NousResearch/hermes-agent) (by Nous Research): a self-improving AI agent with a built-in learning loop, multi-platform presence, and a serious tool ecosystem.\n- [Lynkr](https://github.com/Fast-Editor/Lynkr): a self-hosted universal LLM proxy that lets any AI tool talk to any model provider.\n\nThis post explains what each one is and why it exists. It then walks through the exact steps to run **Hermes through Lynkr**, so you can route Hermes to Databricks, Bedrock, Ollama, llama.cpp, Azure, or OpenRouter, or to all of them with automatic cost-tier routing.\n\n## What Is Hermes Agent?\n\nHermes is an open-source AI agent (MIT-licensed, built by [Nous Research](https://nousresearch.com)) that you actually live inside, not just call.\n\nWhat makes it different from \"yet another agent\":\n\n- **A closed learning loop.** Hermes curates its own memory, autonomously creates *skills* (procedural memory) after complex tasks succeed, improves them during use, and searches its own past conversations via SQLite FTS5. It's the only agent I've seen that gets meaningfully better the longer you use it.\n- **Lives where you do.** A single gateway process plugs into Telegram, Discord, Slack, WhatsApp, Signal, Email, and a real terminal TUI. Send a voice memo from your phone, get a transcribed answer back, and continue the same thread from your laptop later.\n- **Runs anywhere.** Seven terminal backends: local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. Run it on a $5 VPS or a GPU cluster. Modal and Daytona give you serverless persistence: it hibernates when idle and wakes on demand.\n- **Built-in cron.** \"Every weekday at 8am, summarize my GitHub notifications and send to Telegram.\" That's a one-line cron job written in natural language. 
\n- **Delegates and parallelizes.** Spawns isolated subagents for parallel workstreams; results come back without flooding your context.\n- **Provider-agnostic by design.** OpenRouter, Nous Portal, NovitaAI, NVIDIA NIM, Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, OpenAI, or your own endpoint. Switch with `hermes model`, no code changes required.\n\n### Architecture in one paragraph\n\nThe core is `AIAgent` in `run_agent.py`, a synchronous tool-calling loop over OpenAI-format messages. `model_tools.py` orchestrates ~40 built-in tools auto-discovered from `tools/`. The CLI (`cli.py`, ~11k LOC) handles slash commands, prompt_toolkit input, Rich rendering, and a data-driven skin engine. Provider profiles live under `plugins/model-providers/<name>/` and contribute `base_url`, `env_vars`, `api_mode`, and `fallback_models`. The runtime resolver merges those with `custom_providers` from `config.yaml` to decide where to send each request. That last detail is what makes Lynkr integration trivial.\n\n### Install Hermes in one line\n\n```\ncurl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash\n```\n\nThen run `hermes` to start chatting.\n\n## What Is Lynkr?\n\nLynkr is a self-hosted Node.js proxy that sits **between any AI coding tool and any LLM provider**. Change one environment variable, and your tool works with whatever backend you want.\n\n```\nClaude Code / Cursor / Codex / Cline / Continue / Hermes / Vercel AI SDK\n                                |\n                              Lynkr  (http://localhost:8081)\n                                |\n   Ollama | Bedrock | Databricks | OpenRouter | Azure | OpenAI | llama.cpp | LM Studio | z.ai | Vertex | Moonshot\n```\n\n### What's actually inside\n\nI went through the source. 
Lynkr is more than a \"translate request, forward, translate response\" proxy:\n\n- **Format conversion.** Anthropic ↔ OpenAI ↔ Codex Responses API ↔ Databricks ↔ Bedrock, handled in `src/clients/` (`openai-format.js`, `responses-format.js`, `databricks.js`, `bedrock-utils.js`, etc.).\n- **Tier-based routing.** `src/routing/` analyzes prompt complexity, agentic intent, risk, and latency, then routes to a `TIER_SIMPLE`/`TIER_STANDARD`/`TIER_COMPLEX` model. Cheap stuff goes to Ollama; gnarly stuff goes to a frontier cloud model. This is where the headline \"60–80% cost savings\" figure comes from.\n- **Resilience.** Circuit breaker (cockatiel), retries, DNS logging, and prompt cache injection.\n- **MCP integration + Code Mode.** Auto-discovers MCP servers and can collapse 100+ MCP tool definitions into 4 meta-tools (~96% token reduction).\n- **Observability built in.** Telemetry, latency tracking, usage reporting (`lynkr usage` shows AI spend and tier savings), and trajectory export as JSONL for training (`lynkr trajectory`).\n- **699 passing tests.** Routing, format conversion, streaming, error resilience, memory store, and prompt cache. It's seriously tested for a side-project proxy.\n\n### Install Lynkr in one line\n\n```\ncurl -fsSL https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash\n```\n\nOr via npm: `npm install -g pino-pretty && npm install -g lynkr`.\n\n## Why Use Them Together?\n\nHermes already supports a long list of providers natively. Why bolt Lynkr in front?\n\nThree concrete reasons:\n\n### 1. Unify your enterprise creds\n\nYour company has a Databricks endpoint serving Claude, an AWS Bedrock account with cross-region inference profiles, an Azure OpenAI deployment, *and* a private Ollama box. With Lynkr, all of those live behind **one** OpenAI-compatible URL. Hermes points at that URL and stops caring which backend is serving the request.\n\n### 2. Automatic cost-tier routing\n\nThis is the killer feature. 
Hermes can switch models with `/model`, but Lynkr switches *per request* based on complexity. Simple tool calls and short prompts go to a free local Ollama model. Heavy reasoning goes to your premium cloud model. You don't think about it: Lynkr's `complexity-analyzer.js` and `risk-analyzer.js` decide.\n\nRun `lynkr usage` afterward to see the actual savings.\n\n### 3. Centralized observability for every agent + tool\n\nIf you run Hermes, Claude Code, Cursor, and Codex on the same machine (and a lot of us do), Lynkr becomes a single chokepoint for spend, telemetry, prompt caching, and trajectory capture across all of them. You get one usage report instead of four dashboards.\n\n## How to Use Lynkr With Hermes\n\nThe integration is genuinely 3 minutes of work because both tools speak OpenAI-compatible HTTP.\n\n### Step 1: Start Lynkr with a backend\n\nPick whatever provider you want Lynkr to route to. For a local-first setup:\n\n```\n# .env in your Lynkr directory (or just exports)\nexport MODEL_PROVIDER=ollama\nexport OLLAMA_MODEL=qwen2.5-coder:latest\nexport OLLAMA_ENDPOINT=http://localhost:11434\n\nlynkr start\n```\n\nOr for tier routing across providers:\n\n```\nexport TIER_SIMPLE=ollama:qwen2.5-coder:latest\nexport TIER_STANDARD=openrouter:anthropic/claude-3.5-haiku\nexport TIER_COMPLEX=bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0\nexport OPENROUTER_API_KEY=sk-or-...\nexport AWS_BEDROCK_API_KEY=...\nlynkr start\n```\n\nLynkr now listens on `http://localhost:8081` (OpenAI-compatible) and `http://localhost:8081/v1/messages` (Anthropic-compatible).\n\n### Step 2: Register Lynkr as a custom provider in Hermes\n\nHermes resolves providers through `plugins/model-providers/<name>/` profiles **plus** a `custom_providers` list in your `~/.hermes/config.yaml`. 
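\n\nTo make that lookup concrete, here is a minimal sketch of how a selector such as `custom:lynkr/auto` could be matched against a `custom_providers` list. It is a hypothetical illustration, not Hermes's actual resolver. The `resolve` function, the `Target` dataclass, and the `chat_completions` default are assumptions. Only the field names (`name`, `base_url`, `api_mode`, `env_var`, `models`) mirror the config keys used in this guide.\n\n```python\n# Hypothetical sketch, NOT Hermes's real resolver: maps a selector\n# like 'custom:<name>/<model>' to a request target using entries\n# shaped like the custom_providers list in config.yaml.\nimport os\nfrom dataclasses import dataclass\n\n\n@dataclass\nclass Target:\n    base_url: str\n    api_mode: str\n    model: str\n    api_key: str\n\n\ndef resolve(selector, custom_providers, env=None):\n    env = os.environ if env is None else env\n    prefix = 'custom:'\n    if not selector.startswith(prefix):\n        raise ValueError('this sketch only handles custom:<name>/<model>')\n    # partition() splits on the FIRST '/', so model IDs may contain '/'.\n    name, _, model = selector[len(prefix):].partition('/')\n    for entry in custom_providers:\n        if entry['name'] != name:\n            continue\n        if model not in entry.get('models', []):\n            raise ValueError(f'{model!r} is not listed for {name!r}')\n        return Target(\n            base_url=entry['base_url'],\n            api_mode=entry.get('api_mode', 'chat_completions'),  # assumed default\n            model=model,\n            # Read the key from the env var the entry names (Lynkr accepts any value).\n            api_key=env.get(entry.get('env_var', ''), ''),\n        )\n    raise KeyError(f'no custom provider named {name!r}')\n\n\n# Example entry mirroring the Step 2 YAML.\nproviders = [{\n    'name': 'lynkr',\n    'base_url': 'http://localhost:8081/v1',\n    'api_mode': 'chat_completions',\n    'env_var': 'LYNKR_API_KEY',\n    'models': ['auto', 'qwen2.5-coder:latest', 'anthropic/claude-3.5-sonnet'],\n}]\n\ntarget = resolve('custom:lynkr/auto', providers, env={'LYNKR_API_KEY': 'sk-lynkr'})\nprint(target.base_url, target.model)  # http://localhost:8081/v1 auto\n```\n\nSplitting on the first `/` keeps slash-containing model IDs such as `anthropic/claude-3.5-sonnet` intact. As a result, `custom:lynkr/anthropic/claude-3.5-sonnet` still resolves to the same Lynkr entry.\n\n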
Add a `lynkr` entry to the `custom_providers` list in `~/.hermes/config.yaml`:\n\n```\ncustom_providers:\n  - name: lynkr\n    base_url: http://localhost:8081/v1\n    api_mode: chat_completions\n    env_var: LYNKR_API_KEY      # the key's value can be any string; Lynkr doesn't validate it\n    models:\n      - auto                    # Lynkr's tier router picks the actual model\n      - qwen2.5-coder:latest\n      - anthropic/claude-3.5-sonnet\n```\n\nThen set the key (any value works):\n\n```\nhermes config set env.LYNKR_API_KEY sk-lynkr\n```\n\n### Step 3: Point Hermes at Lynkr\n\n```\nhermes model custom:lynkr/auto\n```\n\nOr interactively: run `hermes model`, pick `custom:lynkr`, and choose `auto`.\n\nThat's it. Every Hermes turn now flows through Lynkr, which routes to the right backend based on tier and complexity. Run a few turns, then:\n\n```\nlynkr usage\n```\n\n…and you'll see the per-tier spend breakdown and dollars saved versus a single-frontier-model baseline.\n\n### Bonus: voice memo → Hermes → Lynkr → cheapest model\n\nBecause Hermes already has Telegram and voice memo transcription wired in, this whole stack means:\n\nRecord a voice memo on your phone → Hermes transcribes it → routes the request through Lynkr → Lynkr picks Ollama for the \"what time is it in Tokyo\" parts and Sonnet for the \"refactor this function\" parts → reply comes back to your phone.\n\nYou built that in 5 minutes with two `npm`/`bash` installers and a YAML edit.\n\n## When NOT to Use Lynkr With Hermes\n\nBeing honest:\n\n- **You only use one provider.** Hermes already supports it natively. Adding Lynkr means extra latency and another process to babysit.\n- **You need streaming reasoning tokens from a specific model.** Make sure Lynkr's format converter for that provider preserves what you need. It does in most cases, but verify before betting on it.\n- **You're in a constrained environment.** Lynkr runs on Node 20+. Hermes runs on Python 3.11. 
That's two runtimes on a Raspberry Pi.\n\nFor everything else (multi-provider workflows, enterprise creds, cost optimization, observability), the combination is hard to beat.\n\n## TL;DR\n\n| Need | Tool |\n|---|---|\n| A real AI agent that learns, remembers, and lives across Telegram/Discord/CLI | Hermes |\n| Route any AI tool to any LLM provider with automatic cost tiers | Lynkr |\n| Both | Point Hermes at Lynkr via `custom_providers` in `config.yaml` |\n\n### Links\n\n- Hermes Agent: [https://github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent)\n- Hermes docs: [https://hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs)\n- Lynkr: [https://github.com/Fast-Editor/Lynkr](https://github.com/Fast-Editor/Lynkr)\n- Lynkr docs: [https://fast-editor.github.io/Lynkr/](https://fast-editor.github.io/Lynkr/)\n\nIf you build something with this combo, drop a comment; I'd love to see what stacks people are putting together.", "url": "https://wpnews.pro/news/run-hermes-agent-on-any-model-free-local-and-cost-routed", "canonical_source": "https://dev.to/vishal_veerareddy_9cdd17d/hermes-lynkr-the-self-improving-agent-meets-the-universal-llm-proxy-3n11", "published_at": "2026-05-22 05:22:50+00:00", "updated_at": "2026-05-22 05:34:08.672115+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "open-source", "developer-tools"], "entities": ["Hermes Agent", "Nous Research", "Lynkr", "Databricks", "Bedrock", "Ollama", "llama.cpp", "Azure"], "alternates": {"html": "https://wpnews.pro/news/run-hermes-agent-on-any-model-free-local-and-cost-routed", "markdown": "https://wpnews.pro/news/run-hermes-agent-on-any-model-free-local-and-cost-routed.md", "text": "https://wpnews.pro/news/run-hermes-agent-on-any-model-free-local-and-cost-routed.txt", "jsonld": "https://wpnews.pro/news/run-hermes-agent-on-any-model-free-local-and-cost-routed.jsonld"}}