Run Hermes Agent on Any Model: Free, Local, and Cost-Routed

This guide shows how to integrate Hermes, an open-source AI agent by Nous Research, with Lynkr, a self-hosted Node.js proxy that routes requests to various LLM providers. Lynkr enables automatic cost-tier routing based on prompt complexity: simple tasks go to free local models served by Ollama, while complex reasoning goes to premium cloud models, potentially saving 60-80% on AI costs. The integration requires no code changes. Hermes points at Lynkr's single OpenAI-compatible URL, and Lynkr then handles provider selection, spend tracking, and telemetry across multiple AI tools.

If you've spent any time wrestling with AI coding tools and agents in 2026, you've hit two walls:

- Provider lock-in. Claude Code expects Anthropic. Codex expects OpenAI. Your shiny new agent framework wants whatever its README assumes.
- Agent amnesia. Every session starts from zero. Your "AI assistant" doesn't actually learn anything about you, your codebase, or the work you did yesterday.

Two open-source projects address those problems head-on, and they pair beautifully together.

- Hermes Agent (https://github.com/NousResearch/hermes-agent) by Nous Research: a self-improving AI agent with a built-in learning loop, multi-platform presence, and a serious tool ecosystem.
- Lynkr (https://github.com/Fast-Editor/Lynkr): a self-hosted universal LLM proxy that lets any AI tool talk to any model provider.

This post explains what each one is and why it exists, then shows you the exact steps to run Hermes through Lynkr so you can route Hermes to Databricks, Bedrock, Ollama, llama.cpp, Azure, OpenRouter, or all of them with automatic cost-tier routing.

What Is Hermes Agent?

Hermes is an open-source AI agent (MIT-licensed, built by Nous Research, https://nousresearch.com) that you actually live inside, not just call. What makes it different from "yet another agent":

- A closed learning loop. Hermes curates its own memory, autonomously creates skills (procedural memory) after complex tasks succeed, improves them during use, and searches its own past conversations via SQLite FTS5. It's the only agent I've seen that gets meaningfully better the longer you use it.
- Lives where you do. A single gateway process plugs into Telegram, Discord, Slack, WhatsApp, Signal, Email, and a real terminal TUI. Send a voice memo from your phone, get a transcribed answer back, and continue the same thread from your laptop later.
- Runs anywhere. Seven terminal backends: local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. Run it on a $5 VPS or a GPU cluster. Modal and Daytona give you serverless persistence: the environment hibernates when idle and wakes on demand.
- Built-in cron. "Every weekday at 8am, summarize my GitHub notifications and send to Telegram." That's a one-line cron job in natural language.
- Delegates and parallelizes. Hermes spawns isolated subagents for parallel workstreams; results come back without flooding your context.
- Provider-agnostic by design. OpenRouter, Nous Portal, NovitaAI, NVIDIA NIM, Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, OpenAI, or your own endpoint. Switch with hermes model, with no code changes.

Architecture in one paragraph

The core is AIAgent in run_agent.py, a synchronous tool-calling loop over OpenAI-format messages. model_tools.py orchestrates ~40 built-in tools auto-discovered from tools/. The CLI (cli.py, ~11k LOC) handles slash commands, prompt_toolkit input, Rich rendering, and a data-driven skin engine. Provider profiles live under plugins/model-providers/