{"slug": "how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency", "title": "How I Built AegisDesk: A Zero-Token Semantic IT Agent with <5ms Latency", "summary": "The article describes the development of AegisDesk, an open-source, multi-agent IT helpdesk platform that replaces traditional LLM-based intent routing with a deterministic, zero-token semantic routing system. By using local sentence embeddings and cosine similarity comparisons against an offline vocabulary, the system routes queries to specialized sub-agents in approximately 4.5 milliseconds with no API costs, while incorporating dynamic few-shot learning via SQLite and robust security measures like RCE and SSRF defenses.", "body_md": "If you’ve built AI agents recently, you know the standard playbook: you take a user's prompt, feed it into GPT-4 or Claude alongside a massive JSON schema of available tools, and ask the LLM to figure out which tool to use.\nThis works for prototypes. But in an enterprise IT environment, it’s a disaster.\nUsing an LLM for intent routing takes anywhere from 800ms to 2,000ms. It burns API tokens on every single \"hello\" or \"my laptop is broken\" message. Worse, LLMs hallucinate: if a user asks to \"Provision an Azure SQL database,\" an overly helpful LLM might invent a non-existent tool call and crash your pipeline.\nI wanted to build an autonomous IT helpdesk agent that was deterministic, instant, and practically free to run. That led me to build AegisDesk, an open-source, multi-agent IT platform powered by LangGraph, SQLite, and Zero-Token Semantic Routing.\nThe Architecture: Zero-Token Routing\nInstead of relying on a monolithic prompt, AegisDesk abandons LLM-based routing entirely.\nWhen a query enters AegisDesk, the routing step never hits the cloud. 
Instead, the local pipeline intercepts the query and embeds it using the BAAI/bge-small-en-v1.5 sentence-transformer model via ONNX (fastembed).\nThis local vector is then compared, by cosine similarity, against an offline vocabulary of IT intents:\nnetwork_diagnostics: (ping, traceroute, nmap, tcp, udp)\ncloud_integrations: (okta, jira, aws, azure, cyberark)\nweb_scraping: (wiki, internal docs, cve lookup)\nThe result? The query is routed to the correct, highly specialized LangGraph sub-agent in ~4.5 milliseconds for $0.00.\nTIP\nEnterprise Safety Net: If the semantic match confidence falls below 0.55, AegisDesk refuses to guess. It falls back to a generalized, read-only RAG (Retrieval-Augmented Generation) agent, so a low-confidence match cannot trigger a destructive command by mistake.\nDynamic Few-Shot Learning via SQLite\nStatic keywords are great, but IT environments evolve. What happens when a user types an obscure proprietary software name that isn't in our offline vocabulary?\nTo solve this, I integrated Dynamic Few-Shot Learning directly into the routing layer using SQLite Graph Memory.\nWhen AegisDesk initializes, it queries a routing_examples table inside an ACID-compliant SQLite database. It extracts historical, successfully resolved IT tickets and embeds them into the routing corpus.\nIf an administrator notices the agent struggling with a query like \"Run a traceroute to internal-git.corp\", they can add it as a routing example directly via the CLI:\nbash\naegisdesk teach-router \"Run a traceroute to internal-git.corp\" it_support network_diagnostics\nThe next time the router boots, it embeds that exact phrase. 
The system effectively \"fine-tunes\" its routing logic on every restart, achieving >90% strict-match routing accuracy without a single line of Python code being altered.\nZero-Trust Security Boundaries\nBuilding an autonomous agent that can execute ipconfig, ping, or scrape internal HR wikis is inherently dangerous. AegisDesk implements two critical security mitigations at the tool execution layer:\nRCE Defense (Remote Code Execution): Subprocess execution explicitly enforces shell=False. Before any command touches the OS, inputs are scrubbed with a strict regex, [^a-zA-Z0-9._-], to eliminate shell metacharacters (&, |, ;, $). The hyphen sits last in the character class so it is a literal; written as .-_ it would form a range from . to _ that lets ; through.\nSSRF Defense (Server-Side Request Forgery): The Web Scraping agent is hardened against TOCTOU (Time-Of-Check to Time-Of-Use) attacks. Outbound HTTP requests undergo pre-flight DNS checks. Any resolution attempting to hit loopback (127.0.0.1) or the link-local cloud metadata endpoint (169.254.169.254) is aborted at the socket level.\nEven with these defenses, AegisDesk uses LangGraph's interrupt_before functionality to trigger Human-in-the-Loop (HITL) confirmations before executing any terminal command.\nTry It Out\nAegisDesk shows that you don't need massive, bloated monolithic LLMs to build intelligent enterprise agents. 
By pairing fast deterministic routing with specialized LangGraph swarms, you can build systems that are safer, cheaper, and orders of magnitude faster at the routing step.\nYou can install the CLI directly from PyPI today:\nbash\npip install aegisdesk\nCheck out the full source code and documentation on GitHub: github.com/sitanshukr08/Aegisdesk\nIf you’re building multi-agent swarms or semantic routers, I’d love to hear your thoughts in the comments!", "url": "https://wpnews.pro/news/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency", "canonical_source": "https://dev.to/sitanshukr08/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency-3p6p", "published_at": "2026-05-22 16:34:00+00:00", "updated_at": "2026-05-22 17:07:28.750451+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "developer-tools"], "entities": ["AegisDesk", "LangGraph", "SQLite", "BAAI", "bge-small-en-v1.5", "ONNX", "fastembed", "Azure"], "alternates": {"html": "https://wpnews.pro/news/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency", "markdown": "https://wpnews.pro/news/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency.md", "text": "https://wpnews.pro/news/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency.txt", "jsonld": "https://wpnews.pro/news/how-i-built-aegisdesk-a-zero-token-semantic-it-agent-with-5ms-latency.jsonld"}}