Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

Researchers have developed Traj-Evolve, a self-evolving multi-agent system that models patient trajectories from electronic health records (EHRs), using an experience pool and multi-agent reinforcement learning to mirror how clinicians draw on similar prior cases. In a lung cancer early detection task using up to five years of multimodal EHR data, the system outperformed nine strong baselines on both the overall population and a never-smoker subgroup. The two mechanisms proved complementary on predicted risk: the experience pool improved specificity, while reinforcement learning improved sensitivity.

arXiv:2606.02812v1

Abstract: Modeling patient trajectories from longitudinal electronic health records (EHRs) requires reasoning over sparse, noisy, and long-context multimodal sequences. Existing LLM-based multi-agent systems address context length but process patients in isolation, failing to mirror how clinicians leverage accumulated experience from similar prior cases. We present Traj-Evolve, a self-evolving multi-agent system with two complementary evolving mechanisms. First, an Experience Pool (ExPool) acts as a non-parametric memory, indexing rejection-sampled reasoning traces to retrieve similar patients as few-shot contexts. Second, multi-agent reinforcement learning (MARL) via reward-ranked fine-tuning parametrically optimizes inter-agent and agent-memory collaboration. A leave-one-out cross-retrieval strategy unifies the two, aligning training- and inference-time behavior under retrieval augmentation. On a lung cancer prediction task utilizing up to five years of multimodal EHRs, Traj-Evolve outperforms 9 strong baselines on the overall population and a challenging never-smoker population. Analysis of the evolving dynamics highlights three key findings: (1) expanding the ExPool shifts optimal retrieval from diverse to specific samples; (2) under MARL, the manager agent's prediction loss converges quickly while the worker agents' temporal reasoning continues to benefit from more verified patients; and (3) the two mechanisms are complementary on the predicted risk, where ExPool improves specificity while MARL improves sensitivity.
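The abstract does not spell out how the ExPool is implemented, so the following is only a minimal Python sketch of the general idea it describes: keep reasoning traces that pass rejection sampling, index them by patient, and retrieve the most similar prior patients as few-shot context. Every name here (`Experience`, `ExperiencePool`, `keep_verified`, `few_shot_context`), the use of cosine similarity over a fixed-length patient embedding, and the reading of "leave-one-out" as excluding the query patient's own entry during training are assumptions made for illustration, not details confirmed by the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Experience:
    """One pool entry (hypothetical schema, not taken from the paper)."""
    patient_id: str
    embedding: List[float]  # assumed fixed-length patient representation
    trace: str              # a reasoning trace that survived rejection sampling
    label: int              # known outcome: 1 = lung cancer, 0 = no cancer


def keep_verified(samples: List[Tuple[str, int]], true_label: int) -> List[str]:
    """Rejection sampling as assumed here: keep only the traces whose
    predicted label agrees with the patient's known outcome."""
    return [trace for trace, predicted in samples if predicted == true_label]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ExperiencePool:
    """Non-parametric memory: stores entries and ranks them by similarity."""

    def __init__(self) -> None:
        self._items: List[Experience] = []

    def add(self, exp: Experience) -> None:
        self._items.append(exp)

    def retrieve(
        self,
        query: List[float],
        k: int = 2,
        exclude_id: Optional[str] = None,
    ) -> List[Experience]:
        """Return the k most similar entries.

        Passing exclude_id drops that patient's own entry, which is how this
        sketch interprets leave-one-out retrieval: a training patient never
        sees its own trace, matching inference, where the query patient is
        not in the pool.
        """
        candidates = [e for e in self._items if e.patient_id != exclude_id]
        ranked = sorted(
            candidates,
            key=lambda e: cosine(query, e.embedding),
            reverse=True,
        )
        return ranked[:k]


def few_shot_context(examples: List[Experience]) -> str:
    """Format retrieved entries as a few-shot block for an agent prompt."""
    return "\n\n".join(
        f"Similar patient {e.patient_id} (outcome={e.label}):\n{e.trace}"
        for e in examples
    )
```

At training time, a caller would pass the current patient's identifier as `exclude_id`; at inference time the argument is simply omitted. The reward-ranked fine-tuning step is not sketched here, since the abstract gives no detail on the reward or the ranking procedure.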