04:00
2026-06-17
arxiv.org
large-language-models
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents
Researchers introduced SEAGym, an evaluation environment for self-evolving LLM agents that measures agent harness updates across training, validation, test, replay, and cost records. Instantiating SEA…