FinanceBench

mentions 2 · type Organization

// recent coverage 2 mentions

22:10 · 2026-06-30 · kinesthetic.dev · ai-agents

Improving agents from trajectories, in token space, with no weight updates

Sierra Research introduces the Context Engine and meta-distillation to address specification failure in enterprise agents, improving retrieval and action-check pass rates on benchmarks like τ³-bench, …

19:00 · 2026-06-04 · dev.to · large-language-models

From 10% to 57% Accuracy on FinanceBench: What Actually Moved the Needle

A developer built a RAG system for financial document Q&A that improved accuracy from 10% to 57% on the FinanceBench benchmark, validated against 150 expert-annotated question-answer pairs from SEC fi…

// co-occurs with top 8 entities

GPT-4o (1) · Patronus AI (1) · Qdrant (1) · LangGraph (1) · BAAI/bge-reranker-base (1) · SEC (1) · João Paulo Trindade (1) · Sierra Research (1)

// top 6 topics

large language models (2) · natural language processing (2) · ai research (2) · ai tools (1) · ai infrastructure (1) · ai agents (1)