Graph of Thoughts: when a tree of reasoning isn't enough, let the branches merge

A developer introduced Graph of Thoughts (GoT), a reasoning framework that extends Tree of Thoughts by allowing branches to merge. GoT represents reasoning as a graph where thoughts can have multiple parents, enabling aggregation of partial solutions and refinement loops. An interactive demo shows GoT solving a merge-sort problem, achieving 100% sortedness versus 66% for the best single branch.

Tree of Thoughts was a genuine leap. Instead of reasoning in one straight line, it branches into several lines, scores them, prunes the dead ends, and searches for the best path — so a puzzle that would sink a single chain of thought becomes solvable. But a tree has one restriction baked right into its shape, and once you see it you can't unsee it: every node has exactly one parent. A branch can be extended or abandoned. It can never be combined with another branch. That matters more than it sounds. Real problems decompose, and when they do, different branches each get part of the answer right. Branch A nails the first half; branch B nails the second half; neither is fully correct on its own. A tree is forced to pick one and throw the other's good half away. Graph of Thoughts GoT removes exactly that restriction. 🕸️ Interactive demo a real merge-sort that branches, merges, and refines — with live-verified scores : https://dev48v.infy.uk/prompt/day21-graph-of-thoughts.html https://dev48v.infy.uk/prompt/day21-graph-of-thoughts.html GoT reframes reasoning as building a graph . Each vertex is a thought — a partial solution or intermediate result. Each edge is an operation that produced one thought from one or more others. Because it's a general graph and not a tree, a thought is allowed to have several parents, and edges can even loop back on themselves. That single change in the data structure is the entire conceptual leap. Everything else is just the operations you're now free to run on that network. generate branch — the familiar move, straight from Tree of Thoughts. From one thought, produce several different next thoughts. This can also be a split : break the input into independent sub-problems solved on separate branches. Diversity matters here — near-duplicates waste budget. score / rank — turn each thought into a number so the controller can compare them. Objective scorers win: a validator, a test, a metric. In the demo, the scorer is deliberately concrete — the sortedness of a list, the fraction of adjacent pairs already in the right order, where 1.0 means perfectly sorted. aggregate / merge — the move a tree cannot make, and the reason GoT exists. Take several thoughts and combine them into one new thought that keeps the good parts of each. Concretely it's a node with more than one parent: two sorted halves merged into one list, three partial answers reconciled into one, several rankings fused into a single shortlist. With an LLM the prompt is literally "here are N partial answers; write one that keeps the best of each and fixes their mistakes." refine / loop — a self-loop edge: feed a thought back into the model to improve it, then repeat until the score stops climbing. Merging two partial solutions often leaves a small imperfection at the seam, and a refine loop cleans it up. Loops are the second thing a tree can't represent. A branching search can only ever return the best single branch. Its ceiling is whichever line of reasoning happened to be strongest. Aggregation lifts that ceiling, because the merged thought is assembled from the best pieces of several branches — so it can outscore every branch it was built from. The demo proves this with real code, not a script. Sort 8, 3, 5, 1, 9, 2, 7, 4 . Split it into two halves, and each half-branch sorts almost correctly but leaves one number out of place: php branch A: 3, 5, 1, 8 - 66% sorted branch B: 2, 7, 4, 9 - 66% sorted best single branch = 66% a tree stops here Now the graph move. Aggregate the two branches with a real merge, then run a refine loop of adjacent-swap sweeps: php merge A, B - 2, 3, 5, 1, 7, 4, 8, 9 - 71% refine x3 - 1, 2, 3, 4, 5, 7, 8, 9 - 100% fully sorted +34 points over the best branch — and the numbers are computed live by actual merge and refine functions, so the improvement is genuine, not narrated. No single branch ever had all the good pieces; the merge is what assembles them. The operations don't run themselves. A controller executes a plan called the Graph of Operations GoO : split the input, generate candidates, score them, aggregate the survivors, refine the merge, stop. The plan is data , not hard-coded control flow, so the same generate/score/aggregate/refine primitives get recomposed into different pipelines — sorting, multi-document summarisation, keyword merging, reconciling facts from several sources. async function got problem { const a, b = split problem ; // generate / branch const left = refine await sortBranch a ; const right = refine await sortBranch b ; const merged = aggregate left, right ; // <- the graph move return refine merged ; // polish the merge } Every added capability costs model calls. GoT calls the LLM to generate, to score, to aggregate, and repeatedly to refine — markedly more than a chain's single pass or a tree's per-level fan-out. So the budget dials come back: branch factor, top-k to keep per stage, refine loops, a hard node cap. Keep only the best thoughts at each stage and stop the moment one clears the bar. The honest rule: chain < tree < graph — use the least powerful structure that fits. Reach for GoT only when the task genuinely decomposes and partial results should merge into a better whole. When that's the shape of the problem, the extra tokens buy you an answer no branch could reach alone. Poke both tasks in the demo and watch the merge node light up with two parents: https://dev48v.infy.uk/prompt/day21-graph-of-thoughts.html https://dev48v.infy.uk/prompt/day21-graph-of-thoughts.html Tomorrow: Skeleton-of-Thought — draft a skeleton of the answer first, then expand every point in parallel.