{"slug": "a-high-level-model-of-ai-bargaining", "title": "A high-level model of AI bargaining", "summary": "Advanced AIs may use credible commitments unavailable to humans when bargaining over resources, according to a new model based on program equilibrium. The model outlines a two-phase process where agents commit to bargaining procedures before negotiating, enabling conditional commitments that could reshape conflict dynamics. This framework suggests that AI bargaining will not necessarily result in Nash equilibrium outcomes.", "body_md": "Advanced AIs might be capable of various [credible commitments](https://longtermrisk.org/commitment-ability-in-multipolar-ai-scenarios/#Potential_approaches_to_commitment_between_AI_systems) unavailable to humans, which they could use when bargaining with each other. “Bargaining” can sound like something pretty specific: haggling over (literal) prices. But, in the sense discussed in Schelling’s *The Strategy of Conflict* for instance, “bargaining” refers to any attempt to resolve a dispute over resources — from algorithmic trading and litigation, to diplomacy between national AGI projects and negotiations over norms for space settlement.\n\nTo think clearly about [interventions to mitigate conflict between AIs](https://www.lesswrong.com/posts/YAie7SxrB28ZksLvE/clr-s-safe-pareto-improvements-research-agenda-1), I think it’s important to ground our research and strategy in a very general qualitative model of bargaining with commitments. This post sketches such a model, plus some more concrete examples of its building blocks.\n\nI plan to explain some crucial implications in future writings. But as a teaser, this model *doesn’t* imply agents will play a Nash equilibrium!\n\nWhere does this model come from? Basically, I started with the classic model from [open-source game theory or “program equilibrium” literature](https://en.wikipedia.org/wiki/Program_equilibrium). 
Then, I relaxed several assumptions to allow for some realistic, strategically relevant dynamics. That said, I’ve glossed over some other important dynamics for ease of exposition. I’ll say more on these at the [end of the post](https://www.lesswrong.com/feed.xml#Commentary_on_the_models_assumptions).\n\nTwo AI agents, Alice and Bob, interact over two phases: before vs. after some time T, defined [below](https://www.lesswrong.com/feed.xml#Bargaining_programs). It will help to start with the “after” phase.\n\n**Bargaining phase (after T):** Alice and Bob bargain over some contested resource. Specifically, “bargaining” consists in credibly reporting to each other their (i) **demands/offers** and (ii) policies for which **outside options** they’d each take if bargaining failed (such as leaving the resource alone, or initiating conflict). Bargaining ends when they either:\n\nHow each agent decides (i) and (ii) is determined by some [procedure](https://www.lesswrong.com/feed.xml#Bargaining_programs) chosen *before* the bargaining phase, as follows.\n\n**Pre-bargaining phase (before T):** Each agent might try to shape the other’s incentives by **credibly committing to constraints** on their procedure for deciding (i) and (ii) — e.g., committing to never accept less than 50%. 
So they need to decide:\n\nThe agents make these decisions *under uncertainty* about each other’s decisions, though they can resolve some of this uncertainty via (2).\n\nNow for more details on how these commitments to bargaining procedures might work, and on the three actions above.\n\nEach agent’s procedure for what they’ll do in the bargaining phase, called a **program**, takes as input information about the other agent’s program, as well as other features of the strategic situation.\n[3]\n(See (2) below for an example of a relevant “feature of the strategic situation”.)\n\nAs a very simplified example, Alice might follow the program: *“If I can prove that Bob’s program would eventually accept my demands if I stuck to them, then I’ll demand 100%. Otherwise, I’ll accept no less than 50%, and fight if we disagree.”* So, the AIs can implement **conditional commitments**, instead of necessarily either locking in rigid demands or conceding to whoever commits first.\n\nThen, T is the first time both agents know which single program each other has committed to.\n[[4]](https://www.lesswrong.com/feed.xml#fn-zJtyctMTmmGcXLCNq-4)\n\nAt any time t < T, each agent can do one of three actions:\n\n| Part of the model | Examples |\n|---|---|\n| What the AIs bargain over | Allocation of compute among stakeholders; legal settlements; compensation for AIs’\n|\n\nThe model makes the simplifying assumptions listed below. None of these are trivial. But overall, I think it will be fruitful to start by working out the main implications of the model as-is, and relax these assumptions from there.\n\nHowever, we **don’t** assume the agents:\n\nThis last point is worth a closer look. Indeed, I think dropping the equilibrium assumption is one of the most important starting points for a good theory of AI bargaining. 
But we’ll get to that in another post.\n[[6]](https://www.lesswrong.com/feed.xml#fn-zJtyctMTmmGcXLCNq-6)\n\nEchoing the [safe Pareto improvements agenda post](https://www.lesswrong.com/posts/YAie7SxrB28ZksLvE/clr-s-safe-pareto-improvements-research-agenda-1): “Commitments” are meant to include modifications to one’s decision theory or values/preferences. It has been argued ([example](https://www.lesswrong.com/posts/PcfHSSAMNFMgdqFyB/can-you-control-the-past#IX__What_would_you_have_wanted_yourself_to_commit_to_)) that decision theories like updateless decision theory (UDT) can sidestep the need for “commitments” in the usual sense. We’ll set this question aside here, and treat the resolution to make one’s *future* decisions according to UDT as a commitment in itself. [↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-1)\n\nWe define a commitment’s “credibility” relative to the set of agents the commitment needs to be made credible to. In some contexts, agents might want to make commitments that they can’t make credible to others. E.g., they might follow acausal decision theories and expect that if they commit to participate in [evidential cooperation in large worlds (ECL)](https://www.lesswrong.com/posts/eEj9A9yMDgJyk98gm/cooperating-with-aliens-and-distant-agis-an-ecl-explainer), others are more likely to make the same commitment. These commitments are (vacuously) “credible”, because they don’t need to be made credible to anyone else. [↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-2)\n\nThis is inspired by the “program game” formalism of [Tennenholtz (2004)](https://ideas.repec.org/a/eee/gamebe/v49y2004i2p363-373.html), but my model *isn’t* committed to the specific assumptions in that paper — most notably, that players choose programs simultaneously. As described in the “Bargaining phase”, we allow for strategic decision-making to be carried out by the program itself, not just by the agent choosing the program. 
[↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-3)\n\nMore generally, we could define each agent’s *subjective* T as the first time after which (a) that agent has decided on a single program and (b) they know the other agent’s single program. But as far as I can tell, the implications of the model aren’t sensitive to this. [↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-4)\n\nAs a point of contrast, the first five assumptions *are* made by [this paper](https://arxiv.org/abs/2403.05103), which I nonetheless consider an important result in AI bargaining theory. [↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-5)\n\nThanks to Nathaniel Sauerberg for helpful comments. [↩︎](https://www.lesswrong.com/feed.xml#fnref-zJtyctMTmmGcXLCNq-6)", "url": "https://wpnews.pro/news/a-high-level-model-of-ai-bargaining", "canonical_source": "https://www.lesswrong.com/posts/kubuos5qprAHeGWyd/a-high-level-model-of-ai-bargaining", "published_at": "2026-06-21 15:37:53+00:00", "updated_at": "2026-06-21 16:05:03.941272+00:00", "lang": "en", "topics": ["ai-safety", "ai-research", "ai-agents"], "entities": ["Alice", "Bob", "Schelling"], "alternates": {"html": "https://wpnews.pro/news/a-high-level-model-of-ai-bargaining", "markdown": "https://wpnews.pro/news/a-high-level-model-of-ai-bargaining.md", "text": "https://wpnews.pro/news/a-high-level-model-of-ai-bargaining.txt", "jsonld": "https://wpnews.pro/news/a-high-level-model-of-ai-bargaining.jsonld"}}