Branch Agent: Git-Style Branching for LLM Conversations

A developer built Branch Agent, a system that applies Git-style branching to LLM conversations, enabling users to fork, branch, and merge AI conversations with different models, prompts, and providers in each parallel timeline. The system uses Convex for database operations and Agno for agent management, with O(1) storage for forks via pointer references.

Fork, branch, and merge AI conversations like code, with different models, prompts, and providers in each parallel timeline.

When experimenting with LLMs, you've probably done this: tweak the system prompt, rerun the conversation, compare outputs side by side in two browser tabs, manually copy-paste the better response, and repeat. It's messy, non-reproducible, and there's no version history.

What if LLM conversations had the same branching model as Git?

- A conversation is a tree, not a list. Every message can be a fork point.
- New branches inherit the full context up to that point via pointer references (O(1), no copying).
- Each branch gets its own agent configuration: system prompt, model, provider, temperature, tools.
- Branches can be compared side by side and merged back via an AI "Judge Agent."

```
┌──────────────┐       Convex hooks       ┌──────────────────┐
│  Next.js UI  │ ◄──────────────────────► │ Convex Database  │
│  React 19    │    useQuery/mutation     │ workspaces,      │
└──────┬───────┘                          │ branches, msgs   │
       │ HTTP POST /chat (SSE stream)     └──────────────────┘
       ▼
┌──────────────────┐
│  Python FastAPI  │ ← Agno Agent SDK
│  Agno Service    │   Creates agent per request with branch config
└──────┬───────────┘
       │ OpenAI-compatible API
       ▼
┌──────────────────┐
│ Any LLM Provider │ ← OpenAI, Together, Groq, Ollama...
└──────────────────┘
```

Convex provides:

- `forkBranch` is atomic and isolated.
- The `chatWithAgent` action runs outside the transaction but can call internal queries/mutations safely.

Agno is a Python-native agent framework that supports:

The schema is deliberately relational to support tree traversal:

```ts
// convex/schema.ts
import { defineTable } from "convex/server";
import { v } from "convex/values";

// messageMetadataSchema is defined elsewhere in this file (not shown)

export const agentConfigSchema = v.object({
  systemPrompt: v.optional(v.string()),
  model: v.optional(v.string()),
  baseUrl: v.optional(v.string()), // per-branch provider URL
  apiKey: v.optional(v.string()), // per-branch API key
  tools: v.optional(v.array(v.string())),
  temperature: v.optional(v.number()),
  maxTokens: v.optional(v.number()),
});

export const branches = defineTable({
  workspaceId: v.id("workspaces"),
  name: v.string(),
  parentBranchId: v.optional(v.id("branches")), // unset for a root branch
  snapshotMessageId: v.optional(v.id("messages")), // fork point in the parent
  agentConfig: v.optional(agentConfigSchema),
  isMerged: v.optional(v.boolean()),
  mergeSummary: v.optional(v.string()),
  createdAt: v.number(), // written by forkBranch
})
  .index("by_workspace", ["workspaceId"])
  .index("by_parent", ["parentBranchId"]);

export const messages = defineTable({
  branchId: v.id("branches"),
  parentMessageId: v.optional(v.id("messages")),
  role: v.union(v.literal("user"), v.literal("assistant"), v.literal("system")),
  content: v.string(),
  metadata: v.optional(messageMetadataSchema),
  createdAt: v.number(), // needed by the by_branch_created index
}).index("by_branch_created", ["branchId", "createdAt"]);
```

The key insight: `snapshotMessageId` on a branch points to the message where it forked. History reconstruction walks the `parentBranchId` → `snapshotMessageId` pointers recursively. This makes forks O(1) in storage, with no message duplication.
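To make the pointer model concrete, here is a minimal in-memory sketch in Python. It is not the project's Convex code: `ConversationStore`, `fork`, and `history` are hypothetical names, and a plain list stands in for the `messages` table. A fork stores only a parent pointer and a snapshot pointer; reconstruction recursively takes the parent's history up to and including the snapshot message, then appends the branch's own messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: int
    branch_id: int
    role: str
    content: str


@dataclass
class Branch:
    id: int
    parent_branch_id: Optional[int] = None     # None for a root branch
    snapshot_message_id: Optional[int] = None  # message in the parent's history where we forked


class ConversationStore:
    """Hypothetical in-memory stand-in for the branches/messages tables."""

    def __init__(self) -> None:
        self.branches: dict[int, Branch] = {}
        self.messages: list[Message] = []  # append order doubles as creation order
        self._next_id = 1

    def _new_id(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def create_root(self) -> int:
        branch = Branch(id=self._new_id())
        self.branches[branch.id] = branch
        return branch.id

    def add_message(self, branch_id: int, role: str, content: str) -> int:
        message = Message(self._new_id(), branch_id, role, content)
        self.messages.append(message)
        return message.id

    def fork(self, source_branch_id: int, snapshot_message_id: int) -> int:
        # O(1) storage: record two pointers, copy no messages.
        branch = Branch(self._new_id(), source_branch_id, snapshot_message_id)
        self.branches[branch.id] = branch
        return branch.id

    def history(self, branch_id: int) -> list[Message]:
        branch = self.branches[branch_id]
        # Linear scan here; the Convex version queries an index instead.
        own = [m for m in self.messages if m.branch_id == branch_id]
        if branch.parent_branch_id is None or branch.snapshot_message_id is None:
            return own  # root branch: just its own messages
        # Recurse into the parent's full history and cut it at the fork point (inclusive).
        inherited: list[Message] = []
        for m in self.history(branch.parent_branch_id):
            inherited.append(m)
            if m.id == branch.snapshot_message_id:
                return inherited + own
        raise ValueError("snapshot message not found in parent history")


if __name__ == "__main__":
    store = ConversationStore()
    main = store.create_root()
    store.add_message(main, "user", "Hi")
    reply = store.add_message(main, "assistant", "Hello!")
    store.add_message(main, "user", "Tell me a joke")
    alt = store.fork(main, reply)  # fork right after "Hello!"
    store.add_message(alt, "user", "Write a haiku")
    print([m.content for m in store.history(alt)])  # ['Hi', 'Hello!', 'Write a haiku']
```

Because truncation runs over the parent's reconstructed history rather than only its own messages, forking from a message the source branch itself inherited works the same way. The trade-off sits on the read side: every history lookup walks the fork chain, so deep trees cost more at query time in exchange for constant-size forks.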
```ts
import { v } from "convex/values";
import { mutation } from "./_generated/server";
import { agentConfigSchema } from "./schema";

export const forkBranch = mutation({
  args: {
    sourceBranchId: v.id("branches"),
    snapshotMessageId: v.id("messages"),
    newBranchName: v.string(),
    agentConfig: v.optional(agentConfigSchema),
  },
  handler: async (ctx, args) => {
    const sourceBranch = await ctx.db.get(args.sourceBranchId);
    if (!sourceBranch) throw new Error("Source branch not found");

    // Just create a new branch with pointer references.
    // No messages are copied.
    return await ctx.db.insert("branches", {
      workspaceId: sourceBranch.workspaceId,
      name: args.newBranchName,
      parentBranchId: args.sourceBranchId,
      snapshotMessageId: args.snapshotMessageId,
      // Inherit the source branch's config unless an override is passed
      agentConfig: args.agentConfig ?? sourceBranch.agentConfig,
      createdAt: Date.now(),
    });
  },
});
```

When a user sends a message, the action fetches the full context by walking the branch tree:

```ts
import { v } from "convex/values";
import { internalQuery } from "./_generated/server";

export const internalGetBranchHistory = internalQuery({
  args: { branchId: v.id("branches") },
  handler: async (ctx, args) => {
    const branch = await ctx.db.get(args.branchId);
    if (!branch) throw new Error("Branch not found");

    const myMessages = await ctx.db
      .query("messages")
      .withIndex("by_branch_created", (q) => q.eq("branchId", args.branchId))
      .order("asc")
      .collect();

    if (!branch.parentBranchId || !branch.snapshotMessageId) {
      return myMessages; // root branch, just our messages
    }

    // Walk up the parent tree to the snapshot point
    // (traverseToSnapshot is a helper not shown here)
    const parentMessages = await traverseToSnapshot(
      ctx,
      branch.parentBranchId,
      branch.snapshotMessageId,
    );
    return [...parentMessages, ...myMessages];
  },
});
```

The `chatWithAgent` action sends the full history to the Python Agno service, which streams tokens back via SSE:

```ts
// Convex action reads branch config, sends to Agno service
const agnoPayload = {
  messages: conversationMessages,
  system_prompt: branch.agentConfig?.systemPrompt,
  model: branch.agentConfig?.model,
  base_url: branch.agentConfig?.baseUrl,
  api_key: branch.agentConfig?.apiKey,
  tools: branch.agentConfig?.tools,
  temperature: branch.agentConfig?.temperature,
  stream: true,
};

// Parse SSE events and update message content in real time
let fullContent = "";
for await (const sseEvent of sseReader) {
  const parsed = JSON.parse(sseEvent.data); // assumes each event's data field is JSON
  if (parsed.type === "content" && parsed.content) {
    fullContent += parsed.content;
    await ctx.runMutation(internalUpdateMessageStream, {
      messageId: assistantMessageId,
      content: fullContent,
    });
  }
}
```

Each token delta updates the Convex document, which triggers the reactive `useQuery` hook on the frontend, so the UI streams the response smoothly.

```python
# agno_service/agent_handler.py
from typing import Optional

from agno.agent import Agent

# resolve_model, resolve_tools, and AGNO_DEFAULT_MODEL are defined elsewhere in the service (not shown)


def create_agent(
    system_prompt: Optional[str] = None,
    model_name: Optional[str] = None,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    tool_names: Optional[list[str]] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Agent:
    # Per-branch model and provider: base_url/api_key may point anywhere OpenAI-compatible
    model = resolve_model(
        model_name or AGNO_DEFAULT_MODEL,
        temperature,
        max_tokens,
        base_url=base_url,
        api_key=api_key,
    )
    tools = resolve_tools(tool_names or [])
    return Agent(
        model=model,
        tools=tools or None,
        instructions=system_prompt if system_prompt else None,
    )
```

The service creates a fresh `Agent` per request, so no state leaks between branches. Each branch can point to a completely different provider.

The compare view lets you see two branches at the same time. This is particularly useful when testing different system prompts or models against the same conversation history:
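Whatever the UI looks like, a compare view has to line up two timelines that share a prefix. Because forks copy nothing, both branches' reconstructed histories contain the very same message IDs up to the fork point and diverge after it. Here is a minimal sketch under that assumption, with histories given as ordered lists of messages; `Msg`, `compare_branches`, and `side_by_side` are hypothetical names, not Branch Agent's API.

```python
from itertools import zip_longest
from typing import NamedTuple


class Msg(NamedTuple):
    id: str
    role: str
    content: str


class Comparison(NamedTuple):
    shared: list[Msg]      # common prefix: the inherited context both branches see
    left_only: list[Msg]   # left branch's messages after the divergence point
    right_only: list[Msg]  # right branch's messages after the divergence point


def compare_branches(left: list[Msg], right: list[Msg]) -> Comparison:
    """Split two reconstructed histories into their shared prefix and divergent tails."""
    n = 0
    # Compare by message ID: forks share messages by reference, so the inherited
    # prefix has identical IDs, while two replies with identical text do not.
    while n < len(left) and n < len(right) and left[n].id == right[n].id:
        n += 1
    return Comparison(left[:n], left[n:], right[n:])


def side_by_side(comparison: Comparison) -> list[tuple[str, str]]:
    """Pair up the divergent tails row by row, padding the shorter side with ""."""
    rows = []
    for left_msg, right_msg in zip_longest(comparison.left_only, comparison.right_only):
        rows.append((
            left_msg.content if left_msg is not None else "",
            right_msg.content if right_msg is not None else "",
        ))
    return rows
```

Matching on IDs rather than text is the design choice that matters here: it lines up exactly with the pointer-based fork model, and a reply that happens to repeat the other branch's wording still shows up as divergent.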