Feeding the LLM the Whole Thread: Prompt Chaining for Contextual Replies

HelperX developed a prompt-chaining strategy that feeds LLMs the full conversation thread, not just a single tweet, to generate contextually relevant replies. The system reconstructs up to three parent tweets plus sibling replies and compresses the thread into a readable format with a [REPLY TO THIS] marker. In a blind review of 500 replies, average quality rose from 2.4 to 4.3 out of 5 compared with replying to isolated tweets.

I wrote earlier about our persona engine (https://helperx.app/blog/persona-engine): how we make one LLM sound like many different people. But a great persona replying to a single tweet in isolation still produces mediocre replies, because a tweet is almost never the whole story. The tweet is one node in a conversation. The best replies reference what came before, anticipate what comes after, and match the register of the thread they're joining.

A human reads the thread before replying. An LLM, by default, sees only the tweet you hand it. This article is about how we feed the LLM the context it needs (the surrounding thread, the author's prior messages, the conversation's tone) and the prompt-chaining strategy that turns isolated tweets into reply-ready context.

Consider this tweet: "Finally shipped it. Took 6 months."

Reply A (no context): "Congratulations on shipping! Six months of hard work paying off must feel amazing. 🎉"

Reply B (with thread context; the prior tweet was "our team has been stuck in QA hell for 6 months on this release"): "Escaping QA hell is its own kind of ship. What finally unblocked it: the test suite or the process?"

Reply B is dramatically better. It references the prior context ("QA hell"), advances the conversation with a specific question, and drops the hollow congratulatory filler. Same LLM, same persona. The difference is entirely the context provided.

The persona engine solves "sound like a specific person." Context chaining solves "say something that fits the specific conversation." You need both.

When replying to tweet T, the ideal context includes the tweets above it in the thread, sibling replies to the same parent, the author's prior messages, our own earlier replies in the thread, and the conversation's tone. That's a lot. Naively dumping all of it into every prompt would blow the token budget and dilute the signal. The art is in selecting which context to include for which tweets, and compressing it to fit.

X's API gives us a tweet and sometimes its parent.
To reconstruct a full thread, we walk the parent chain upward and fetch replies downward, building a small graph:

```js
async function buildThreadContext(tweetId, depth = 3) {
  const context = {
    target: null,
    parents: [],   // tweets above the target (walking up)
    siblings: [],  // other replies to the same parent
    thread: [],    // the linear chain: root -> ... -> target
  };

  // Walk up the parent chain
  let current = await fetchTweet(tweetId);
  context.target = current;
  for (let i = 0; i < depth && current.in_reply_to; i++) {
    const parent = await fetchTweet(current.in_reply_to);
    context.parents.unshift(parent); // oldest first
    current = parent;
  }

  // Reconstruct the linear thread: root -> ... -> target
  context.thread = [...context.parents, context.target];

  // Fetch sibling replies to the immediate parent (the conversation around us)
  if (context.target.in_reply_to) {
    context.siblings = await fetchReplies(context.target.in_reply_to, { limit: 5 });
  }
  return context;
}
```

The depth limit (3) matters: walking an entire 50-tweet thread would be expensive and would mostly add noise. Three levels up captures the conversational arc without over-fetching.

Raw thread data is verbose. Tweets come with metadata, IDs, and timestamps, most of it irrelevant to the question "what should I say?" We compress to the essentials before prompt-building:

```js
function compressThread(context) {
  return context.thread.map(t => ({
    author: t.author.handle,
    text: t.text,
    isTarget: t.id === context.target.id,
  }));
}
```

And we render it as a readable conversation, oldest-first, so the LLM reads it as a natural sequence:

```js
function renderThreadPrompt(compressed) {
  const lines = compressed.map(t => {
    const marker = t.isTarget ? ' [REPLY TO THIS]' : '';
    return `${t.author}:${marker} ${t.text}`;
  });
  return `Here is the conversation so far (oldest first):\n\n${lines.join('\n\n')}`;
}
```

The [REPLY TO THIS] marker is crucial. Without it, the LLM doesn't know which tweet in the thread it's supposed to be responding to. The marker anchors its generation to the right target.
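To make the rendered format concrete, here is a small sketch that runs the two-tweet "QA hell" example through the compress and render steps. The handle `dev_lead` and the tweet IDs are invented for illustration, and both functions are repeated in runnable form so the snippet stands alone.

```js
// Compress and render steps, repeated so this snippet is self-contained.
function compressThread(context) {
  return context.thread.map(t => ({
    author: t.author.handle,
    text: t.text,
    isTarget: t.id === context.target.id,
  }));
}

function renderThreadPrompt(compressed) {
  const lines = compressed.map(t => {
    const marker = t.isTarget ? ' [REPLY TO THIS]' : '';
    return `${t.author}:${marker} ${t.text}`;
  });
  return `Here is the conversation so far (oldest first):\n\n${lines.join('\n\n')}`;
}

// Hypothetical thread data: the IDs and handle are assumptions, not real tweets.
const parent = {
  id: '1',
  author: { handle: 'dev_lead' },
  text: 'our team has been stuck in QA hell for 6 months on this release',
};
const target = {
  id: '2',
  author: { handle: 'dev_lead' },
  text: 'Finally shipped it. Took 6 months.',
  in_reply_to: '1',
};
const context = { target, parents: [parent], siblings: [], thread: [parent, target] };

const rendered = renderThreadPrompt(compressThread(context));
console.log(rendered);
```

With this input, the LLM sees a two-line conversation in which only the second line carries the marker, so there is no ambiguity about which tweet the reply is meant for.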
The context block joins the persona prompt from the persona engine to form the full system+user prompt:

```js
function buildContextualPrompt(persona, context) {
  const compressed = compressThread(context);
  const threadBlock = renderThreadPrompt(compressed);
  return [
    persona.systemPrompt, // from the persona engine
    '',
    threadBlock,
    '',
    'Write a reply to the tweet marked [REPLY TO THIS]. ' +
      'Reference the conversation naturally; do not repeat earlier messages verbatim, ' +
      'and make sure your reply makes sense given what came before.',
  ].join('\n');
}
```

The instruction "reference the conversation naturally" plus "do not repeat earlier messages verbatim" steers the LLM toward Reply B (contextual) and away from both Reply A (hollow) and the failure mode of parroting the parent tweet back.

Here's the tension: richer context produces better replies, but richer context costs more tokens, and tokens cost money (and latency). A 5-tweet thread with full metadata can be 1,500+ tokens; across hundreds of replies a day, that's real cost.

We manage this with adaptive context depth, including more context when it's likely to matter and less when it doesn't:

```js
function selectContextDepth(context, persona) {
  // Short standalone tweet (no parent): minimal context needed
  if (context.parents.length === 0) return 'minimal';
  // Tweet is part of a thread: full context matters
  if (context.thread.length > 2) return 'full';
  // Controversial/technical persona: needs more context to be substantive
  if (persona.assertiveness >= 4) return 'full';
  return 'standard';
}
```

This roughly halves our average token usage compared to "always include full context," with no measurable quality drop, because most tweets don't need full context, and the ones that do get it.

A subtle failure mode: an account replies to a tweet, then later replies again in the same thread, and the two replies contradict each other. To a human reader, this screams "automated: it doesn't remember what it said."
We prevent this by including our own prior replies in the context:

```js
async function getOurPriorReplies(slotId, threadTweetIds) {
  // Look up any replies we've already sent in this thread
  const priorReplies = await db.getRepliesByUs(slotId, threadTweetIds);
  if (priorReplies.length === 0) return null;
  return (
    `Note: you have already replied in this thread earlier:\n` +
    priorReplies.map(r => `- "${r.text}"`).join('\n') +
    `\nDo not contradict your earlier position. Build on it or extend it.`
  );
}
```

This block is prepended to the prompt only when we have prior replies in the thread. The LLM now "knows" what it already said and maintains a consistent stance across multiple engagements in the same conversation, which is the continuity that makes an account feel like a coherent person rather than a stateless generator.

A thread full of jokes doesn't want a serious reply; a technical thread doesn't want a casual one. We add a lightweight tone hint derived from the context:

```js
function inferToneHint(context) {
  const texts = context.thread.map(t => t.text).join(' ');
  const hasQuestions = (texts.match(/\?/g) || []).length;
  // Extended_Pictographic instead of Emoji: \p{Emoji} also matches digits, '#' and '*'
  const hasEmojis = (texts.match(/\p{Extended_Pictographic}/gu) || []).length;
  const avgLength = texts.length / context.thread.length;

  if (hasEmojis > context.thread.length * 0.5) return 'casual, emoji-friendly';
  if (avgLength > 200 && hasQuestions > 1) return 'thoughtful, discussion-oriented';
  if (avgLength < 80) return 'punchy, terse';
  return null; // no strong signal, don't override persona tone
}
```

When a tone hint is returned, it modulates the persona's tone rather than replacing it: a casual persona in a serious thread dials up its seriousness slightly rather than going fully serious. The hint nudges; the persona anchors.

We compared reply quality via manual review of 500 replies, scored blind across three configurations:

| Configuration | Avg. quality (1-5) | "Contextual" replies |
|---|---|---|
| Target tweet only (no context) | 2.4 | 11% |
| Target + immediate parent | 3.6 | 47% |
| Full thread + siblings + our prior replies | 4.3 | 78% |

The jump from "parent only" to "full context" is smaller than the jump from "no context" to "parent" (diminishing returns), but it's still significant: the "contextual" rate (replies that clearly engage the conversation rather than reacting to the tweet in isolation) climbs from 47% to 78%. For an account whose entire value is "this reply felt real," that gain is the difference between an account that grows and one that gets scrolled past.

1. Context is as important as persona. A great persona with no context produces hollow replies. A mediocre persona with great context produces relevant ones. You need both, and most AI-reply systems invest only in persona.
2. Mark the target explicitly. When you hand an LLM a thread, tell it which tweet to reply to. Without an explicit marker, it drifts.
3. Adapt context depth to the situation. Full context for every reply wastes tokens; minimal context for every reply sacrifices quality. Adaptive depth gets the quality where it matters at sustainable cost.
4. Include your own prior replies. Statelessness is the LLM's biggest tell in ongoing threads. Feeding back what you already said maintains coherence and prevents contradictions.
5. Compress before prompting. Raw tweet objects are mostly noise. Compress to author + text + target marker before the LLM ever sees it.
6. Tone-match, don't tone-replace. Let the conversation's tone modulate the persona, not override it. The persona is the constant; the context is the variable.

The prompt-chaining layer is what separates AI replies that feel like a stranger parachuting into a conversation from AI replies that feel like a regular participant who's been following along. Both are "AI-generated." Only one grows an account.
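As a closing sketch, the blocks discussed in this article (persona prompt, optional prior-replies note, thread block, optional tone hint) could be chained roughly as follows. The helper name `assemblePrompt`, its parameters, and the wording of the tone-hint line are assumptions made for illustration; the article does not show how the production system performs this final assembly.

```js
// Hypothetical assembly helper: the name, parameter shape, and tone-hint
// wording are assumptions, not the production implementation.
function assemblePrompt({ systemPrompt, priorRepliesNote, threadBlock, toneHint }) {
  const parts = [];

  // The prior-replies note is prepended only when we have replied before.
  if (priorRepliesNote) parts.push(priorRepliesNote, '');

  parts.push(systemPrompt, '', threadBlock, '');

  // The tone hint nudges the persona's tone; it does not replace it.
  if (toneHint) {
    parts.push(
      `Tone hint: the thread reads as ${toneHint}. Lean toward it without abandoning your persona.`,
      ''
    );
  }

  parts.push('Write a reply to the tweet marked [REPLY TO THIS].');
  return parts.join('\n');
}

const prompt = assemblePrompt({
  systemPrompt: 'You are a pragmatic engineering lead.', // placeholder persona
  priorRepliesNote: null, // no earlier replies in this thread
  threadBlock:
    'Here is the conversation so far (oldest first):\n\n' +
    'dev_lead: [REPLY TO THIS] Finally shipped it. Took 6 months.',
  toneHint: 'punchy, terse',
});
console.log(prompt);
```

Keeping each block optional means a standalone tweet produces a short prompt, while a long thread we have already joined gets every block.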
HelperX (https://helperx.app) generates replies with full thread context, persona anchoring, and self-continuity, so every reply fits the conversation it joins. Free 30-day trial.