Mixture of Complementary Agents for Robust LLM Ensemble

Researchers reframe the choice of which large language models (LLMs) to include in a multi-AI ensemble as a combinatorial selection problem, akin to feature selection, in which a model's value lies in its complementarity with the other models rather than in its accuracy or diversity alone. Because standard feature-selection algorithms are too slow in the LLM setting, the authors explore a range of computationally feasible, greedy-style algorithms that assess complementarity using a small labeled set, taking into account how proposers interact with one another and with the summarizer model. Their experiments support complementarity as a guiding principle for proposer selection and identify the methods with the best performance-cost trade-offs in practice.

arXiv:2605.24048v1 Announce Type: new

Abstract: Multi-AI collaboration, such as ensembling or debating large language models (LLMs), is a promising paradigm for aggregating information and boosting performance. A foundational step in these pipelines is to feed the responses of several proposer LLMs into a summarizer LLM, which synthesizes a better answer. However, choosing which proposers to include is non-trivial. Existing approaches primarily focus either on accuracy (picking the strongest models) or diversity (ensuring variety), and often overlook the interactions among proposers and with the summarizer. We reframe proposer selection as a combinatorial selection problem akin to feature selection, where the value of an LLM lies in its complementarity with others. However, directly applying standard feature-selection algorithms is impractical in the LLM setting due to prohibitive time complexity. Motivated by this limitation, we explore an extensive range of computationally feasible, greedy-style selection algorithms that assess complementarity using a small labeled set. Our experiments validate complementarity as a guiding principle for proposer selection and identify methods that achieve the best performance-cost trade-offs in practice.
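
To make the idea of greedy, complementarity-driven selection concrete, here is a minimal Python sketch of greedy forward selection over candidate proposers. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how complementarity is scored, so the `coverage_score` proxy (the fraction of labeled examples that at least one selected proposer answers correctly), the model names, and the correctness table are all hypothetical. In a real pipeline the scoring function would presumably run the summarizer on the selected proposers' responses and compare the result with the labels.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical toy data: CORRECT[m][i] is True if proposer m answers
# labeled example i correctly. Names and values are illustrative only.
CORRECT: Dict[str, List[bool]] = {
    "model_a": [True, True, True, False, False, False],
    "model_b": [True, True, False, True, False, False],
    "model_c": [False, False, False, False, True, True],
}


def coverage_score(subset: FrozenSet[str]) -> float:
    """Assumed proxy for ensemble quality: the fraction of labeled
    examples that at least one selected proposer gets right."""
    if not subset:
        return 0.0
    n = len(next(iter(CORRECT.values())))
    hits = sum(any(CORRECT[m][i] for m in subset) for i in range(n))
    return hits / n


def greedy_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedy forward selection: repeatedly add the candidate with the
    largest marginal gain in `score`, stopping after k picks or when no
    remaining candidate improves the score."""
    selected: List[str] = []
    current = score(frozenset())
    for _ in range(k):
        best_model, best_score = None, current
        for m in candidates:
            if m in selected:
                continue
            s = score(frozenset(selected) | {m})
            if s > best_score:  # strict: ties keep the earlier candidate
                best_model, best_score = m, s
        if best_model is None:
            break
        selected.append(best_model)
        current = best_score
    return selected


chosen = greedy_select(list(CORRECT), coverage_score, k=2)
print(chosen)  # ['model_a', 'model_c']
```

In this toy data, model_a and model_b are individually the most accurate (3 of 6 correct each), but they mostly succeed on the same examples. The greedy procedure instead pairs model_a with the weaker model_c, because together they cover 5 of 6 examples. Selecting k models from n candidates this way takes at most k·n score evaluations, compared with 2^n subsets for exhaustive search.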