23:03
2026-05-30
verkyyi.github.io
ai-agents
Show HN: HermesBench - workflow reliability evals for personal AI agents
HermesBench, a new benchmark for evaluating complete personal AI agent configurations rather than just models, launched with a public baseline score of 78.2 across 27 personal-agent recipes. The bench…