07:10
2026-06-16
github.com
large-language-models
Show HN: Kitchen Rush, Overcooked inspired LLM tool calling benchmark
Kitchen Rush, a new benchmark for evaluating large language model tool-calling, measures both accuracy and latency by simulating an Overcooked-style kitchen where thinking time directly impacts game p…