Show HN: Morph Reflexes – Multi-head classifiers for agent traces

[ARTICLE · art-47807] src=news.ycombinator.com ↗ pub=2026-06-30T20:52Z topic=ai-agents verified=true sentiment=↑ positive

Morph Reflexes launches a multi-head classifier API that analyzes agent traces for behavioral failures like looping and user frustration. The system uses a shared LLM backbone with reused KV cache to run dozens of semantic signals in under 30ms, solving the cost and latency issues of frontier model judges. Built on a custom inference engine forked from vLLM, it targets production AI agents at scale.

The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and too slow to run at scale. To solve this, we built Reflexes: semantic signals from agent traces, served fast and cheap over an API. It is built on custom kernels and a custom inference engine forked from vLLM.

Under the hood, it is a small LLM architected around multi-head inference. Small models need to be trained for specific tasks, but running 50 separate small models on the same input for 50 tasks makes no sense.

How it works: We use a modern LLM with hybrid attention and remove the decode step. One shared backbone reads the trace once, then many heads classify different signals. Our inference engine reuses the KV cache across inputs and shares compute across all reflexes, so about 99% of prefill compute is reused from reflex to reflex and each additional reflex adds less than 0.1% overhead. This is similar in spirit to 2019-era BERT/HYDRA and other older multi-head techniques. We took the same high-level idea and did the hard work to make it work with a modern architecture and attention. On it, we can run inference in under 30ms and serve the full request in under 90ms. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.

Why does optimizing this matter? If you're even a medium-sized startup, you're dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale. I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like is_camera_obfuscated=true, along with 200 other things, they needed to 1) spin those signals up quickly and 2) run them efficiently at scale.

What it is not: A dashboard. 99% of dashboards go unused. This is 100% API-first and made for devs who want to use it to trigger their own stuff. You can vibetrain a custom reflex in our dashboard, and/or let it self-improve in production.

Docs:

I'd love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns but can't right now?

TLDR: semantic signals from agent traces, super fast and cheap via API.
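To make the "one shared backbone, many heads" idea from the post concrete, here is a minimal toy sketch in plain Python. It is not Morph's implementation: the class names (`SharedBackbone`, `ReflexHeads`), the mean-pooled embedding that stands in for LLM prefill, the random weights, and the reflex names are all assumptions made for illustration. The only point it shows is that the expensive encoding step runs once per trace, while each extra head costs a single dot product.

```python
import math
import random

HIDDEN = 32  # hypothetical hidden size, chosen only for this sketch


class SharedBackbone:
    """Toy stand-in for the shared prefill pass: encodes a trace once."""

    def __init__(self, vocab_size=1000, seed=0):
        rng = random.Random(seed)
        # Random embeddings; a real system would use a trained LLM here.
        self.embed = [[rng.gauss(0, 1) for _ in range(HIDDEN)]
                      for _ in range(vocab_size)]
        self.calls = 0  # counts how often the expensive pass runs

    def encode(self, token_ids):
        if not token_ids:
            raise ValueError("trace must contain at least one token")
        self.calls += 1
        n = len(token_ids)
        # Mean-pool the token embeddings into one trace representation.
        pooled = [sum(self.embed[t][d] for t in token_ids) / n
                  for d in range(HIDDEN)]
        return [math.tanh(x) for x in pooled]


class ReflexHeads:
    """Many independent binary classifier heads over one hidden vector."""

    def __init__(self, names, seed=1):
        rng = random.Random(seed)
        self.weights = {name: [rng.gauss(0, 1) for _ in range(HIDDEN)]
                        for name in names}

    def score(self, hidden):
        scores = {}
        for name, w in self.weights.items():
            logit = sum(wi * hi for wi, hi in zip(w, hidden))
            scores[name] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        return scores


def classify_trace(backbone, heads, token_ids):
    hidden = backbone.encode(token_ids)  # expensive step, done once
    return heads.score(hidden)           # cheap step, once per head


backbone = SharedBackbone()
heads = ReflexHeads(["looping", "reasoning_leakage", "user_frustration"])
scores = classify_trace(backbone, heads, [3, 17, 42, 42, 42, 7])
print(sorted(scores))
print(backbone.calls)  # 1: the trace was encoded once for all three heads
```

Going from 3 heads to 100 in this sketch changes only the size of the `names` list; `encode` is still called once per trace, which is the property the post attributes to reusing prefill compute across reflexes.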

Read original on news.ycombinator.com → news.ycombinator.com/item?id=48739038
