A benchmark for tool-using agents talking to a simulated user — and the reliability cliff at pass^k.
Yao et al. · arXiv 2024 · Reasoning & RL.
A free, interactive, animated visual explainer of τ-bench: Tool-Agent-User Interaction in Real-World Domains, with every exhibit computed from the real formulas and verbatim quotes from the source.
Questions
- What is τ-bench: Tool-Agent-User Interaction in Real-World Domains?
  - A benchmark for tool-using agents talking to a simulated user, and the reliability cliff at pass^k.
- Who published τ-bench: Tool-Agent-User Interaction in Real-World Domains, and where?
  - Yao et al., arXiv 2024 (arXiv:2406.12045).
- Where can I find a visual explainer of τ-bench: Tool-Agent-User Interaction in Real-World Domains?
  - Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.
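The pass^k metric named in the first answer above is defined in the τ-bench paper as the chance that all k independent trials of the same task succeed, averaged over tasks. With n trials and c successes on a task, it is estimated as C(c, k) / C(n, k). A minimal Python sketch of that estimator follows; the function names and the per-task success counts are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the chance that k trials drawn from num_trials all succeed."""
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    # comb(c, k) is 0 whenever k > c, so a task with too few successes scores 0.
    return comb(num_successes, k) / comb(num_trials, k)


def benchmark_pass_hat_k(successes_per_task, num_trials: int, k: int) -> float:
    """Average the per-task estimate over every task in the benchmark."""
    per_task = [pass_hat_k(num_trials, c, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Hypothetical data: four tasks, eight trials each, number of successes per task.
successes = [8, 6, 4, 8]
for k in (1, 2, 4, 8):
    print(k, round(benchmark_pass_hat_k(successes, num_trials=8, k=k), 4))
```

With these made-up counts the estimate falls from 0.8125 at k = 1 to 0.5 at k = 8, because only the tasks solved in every single trial still count. That drop is the "reliability cliff" the tagline refers to.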
Related explainers
- DeepSeek-R1
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Training language models to follow instructions with human feedback
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Constitutional AI: Harmlessness from AI Feedback
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale