I wrote earlier about our persona engine: how we make one LLM sound like many different people. But a great persona replying to a single tweet in isolation still produces mediocre replies, because a tweet is almost never the whole story. The tweet is one node in a conversation. The best replies reference what came before, anticipate what comes after, and match the register of the thread they're joining.
A human reads the thread before replying. An LLM, by default, sees only the tweet you hand it. This article is about how we feed the LLM the context it needs (the surrounding thread, the author's prior messages, the conversation's tone) and the prompt-chaining strategy that turns isolated tweets into reply-ready context.
Consider this tweet:
"Finally shipped it. Took 6 months."
Reply A (no context):
"Congratulations on shipping! Six months of hard work paying off must feel amazing. 🎉"
Reply B (with thread context: the prior tweet was "our team has been stuck in QA hell for 6 months on this release"):
"Escaping QA hell is its own kind of ship. What finally unblocked it: the test suite or the process?"
Reply B is dramatically better. It references the prior context ("QA hell"), advances the conversation with a specific question, and drops the hollow congratulatory filler. Same LLM, same persona. The difference is entirely the context provided.
The persona engine solves "sound like a specific person." Context chaining solves "say something that fits the specific conversation." You need both.
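When replying to tweet T, the ideal context includes the parent tweets above T, sibling replies to the same parent, the author's prior messages, our own earlier replies in the thread, and the conversation's tone.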
That's a lot. Naively, dumping all of it into every prompt would blow the token budget and dilute the signal. The art is in selecting which context to include for which tweets, and compressing it to fit.
X's API gives us a tweet and (sometimes) its parent. To reconstruct a full thread, we walk the parent chain upward and fetch replies downward, building a small graph:
async function buildThreadContext(tweetId, depth = 3) {
  const context = {
    target: null,
    parents: [],  // tweets above the target (walking up)
    siblings: [], // other replies to the same parent
    thread: [],   // the linear chain root -> ... -> target
  };

  // Walk up the parent chain, at most `depth` levels
  let current = await fetchTweet(tweetId);
  context.target = current;
  for (let i = 0; i < depth && current.in_reply_to; i++) {
    const parent = await fetchTweet(current.in_reply_to);
    context.parents.unshift(parent); // oldest first
    current = parent;
  }

  // Reconstruct the linear thread: root -> ... -> target
  context.thread = [...context.parents, context.target];

  // Fetch sibling replies to the immediate parent (the conversation around us)
  if (context.target.in_reply_to) {
    const replies = await fetchReplies(context.target.in_reply_to, { limit: 5 });
    // Drop the target itself so siblings holds only the *other* replies
    context.siblings = replies.filter(r => r.id !== context.target.id);
  }
  return context;
}
The depth limit (3) matters: walking an entire 50-tweet thread would be expensive and would mostly add noise. Three levels up captures the conversational arc without over-fetching.
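To see what the depth cap does in practice, here is a minimal sketch that runs the same upward walk against an in-memory store. The `walkParents` helper, the `store` object, the `fetchTweetFromStore` stub, and the tweet shapes are all assumptions made for illustration; the real `fetchTweet` would call X's API.

```javascript
// Hypothetical in-memory thread: t1 <- t2 <- t3 <- t4 <- t5 (each replies to the previous one)
const store = {
  t1: { id: 't1', in_reply_to: null, text: 'root' },
  t2: { id: 't2', in_reply_to: 't1', text: 'second' },
  t3: { id: 't3', in_reply_to: 't2', text: 'third' },
  t4: { id: 't4', in_reply_to: 't3', text: 'fourth' },
  t5: { id: 't5', in_reply_to: 't4', text: 'target' },
};

// Stand-in for the real API call (assumption: it resolves to a tweet object)
async function fetchTweetFromStore(id) {
  return store[id];
}

// Same upward walk as buildThreadContext, with the fetcher passed in
async function walkParents(fetchTweet, tweetId, depth = 3) {
  const parents = [];
  let current = await fetchTweet(tweetId);
  for (let i = 0; i < depth && current.in_reply_to; i++) {
    const parent = await fetchTweet(current.in_reply_to);
    parents.unshift(parent); // oldest first
    current = parent;
  }
  return parents.map(p => p.id);
}
```

With the five-tweet chain above, `walkParents(fetchTweetFromStore, 't5')` resolves to `['t2', 't3', 't4']`: the root `t1` is never fetched, which is exactly the trade the depth limit makes.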
Raw thread data is verbose. Tweets come with metadata, IDs, and timestamps, most of it irrelevant to "what should I say?" We compress to the essentials before prompt-building:
function compressThread(context) {
  // Keep only what the LLM needs: who spoke, what they said, and which tweet is the target
  return context.thread.map(t => ({
    author: t.author.handle,
    text: t.text,
    isTarget: t.id === context.target.id,
  }));
}
And we render it as a readable conversation, oldest-first, so the LLM reads it as a natural sequence:
function renderThreadPrompt(compressed) {
  const lines = compressed.map(t => {
    // Tag the one tweet the model should answer
    const marker = t.isTarget ? ' [REPLY TO THIS]' : '';
    return `${t.author}:${marker} ${t.text}`;
  });
  return `Here is the conversation so far (oldest first):\n\n${lines.join('\n\n')}`;
}
The [REPLY TO THIS] marker is crucial. Without it, the LLM doesn't know which tweet in the thread it's supposed to be responding to. The marker anchors its generation to the right target.
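As a quick illustration of what the model actually receives, the sketch below renders the "QA hell" exchange from the opening example. The handle `@dana` and the `sampleThread` array are made up for the example, and `renderThread` simply repeats the logic of `renderThreadPrompt` so the snippet runs on its own.

```javascript
// Sample compressed thread; the handle and structure are illustrative
const sampleThread = [
  { author: '@dana', text: 'our team has been stuck in QA hell for 6 months on this release', isTarget: false },
  { author: '@dana', text: 'Finally shipped it. Took 6 months.', isTarget: true },
];

// Same rendering logic as renderThreadPrompt, repeated here to keep the snippet self-contained
function renderThread(compressed) {
  const lines = compressed.map(t => {
    const marker = t.isTarget ? ' [REPLY TO THIS]' : '';
    return `${t.author}:${marker} ${t.text}`;
  });
  return `Here is the conversation so far (oldest first):\n\n${lines.join('\n\n')}`;
}

const rendered = renderThread(sampleThread);
console.log(rendered);
// Here is the conversation so far (oldest first):
//
// @dana: our team has been stuck in QA hell for 6 months on this release
//
// @dana: [REPLY TO THIS] Finally shipped it. Took 6 months.
```

Only one line carries the marker, so there is no ambiguity about which tweet the reply should address.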
The context block joins the persona prompt (from the persona engine) to form the full system+user prompt:
function buildContextualPrompt(persona, context) {
  const compressed = compressThread(context);
  const threadBlock = renderThreadPrompt(compressed);
  return [
    persona.systemPrompt, // from the persona engine
    '',
    threadBlock,
    '',
    'Write a reply to the tweet marked [REPLY TO THIS]. ' +
      'Reference the conversation naturally. Do not repeat earlier messages verbatim, ' +
      'and make sure your reply makes sense given what came before.',
  ].join('\n');
}
The instruction "reference the conversation naturally" plus "do not repeat earlier messages verbatim" steers the LLM toward Reply B (contextual) and away from Reply A (hollow) and from the failure mode of parroting the parent tweet back.
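Many chat-style LLM APIs take the system and user parts as separate messages rather than one joined string. As a sketch of that variant, assuming a generic `{ role, content }` message shape (a common convention, not something the pipeline described here is confirmed to use), the same pieces could be split like this. `buildContextualMessages` and the placeholder persona text are hypothetical.

```javascript
// Hypothetical variant: return chat messages instead of one joined string.
// The { role, content } shape is assumed for illustration.
function buildContextualMessages(systemPrompt, threadBlock) {
  const instruction =
    'Write a reply to the tweet marked [REPLY TO THIS]. ' +
    'Reference the conversation naturally. Do not repeat earlier messages verbatim, ' +
    'and make sure your reply makes sense given what came before.';
  return [
    { role: 'system', content: systemPrompt },                      // persona goes here
    { role: 'user', content: `${threadBlock}\n\n${instruction}` },  // thread + task
  ];
}

const messages = buildContextualMessages(
  'You are a pragmatic senior engineer.', // placeholder persona prompt
  '@dana: [REPLY TO THIS] Finally shipped it. Took 6 months.'
);
```

Keeping the persona in the system message and the thread in the user message means the thread block can change per reply while the persona text stays constant.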
Here's the tension: richer context produces better replies, but richer context costs more tokens, and tokens cost money (and latency). A 5-tweet thread with full metadata can be 1,500+ tokens; across hundreds of replies a day, that's real cost.
We manage this with adaptive context depth: include more context when it's likely to matter, less when it isn't:
function selectContextDepth(context, persona) {
  // Short standalone tweet (no parent): minimal context needed
  if (context.parents.length === 0) return 'minimal';
  // Tweet is part of a longer thread: full context matters
  if (context.thread.length > 2) return 'full';
  // Controversial/technical persona: needs more context to be substantive
  if (persona.assertiveness >= 4) return 'full';
  return 'standard';
}
This roughly halves our average token usage compared to "always include full context," with no measurable quality drop, because most tweets don't need full context, and the ones that do get it.
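The selector returns a label, but how that label trims the context is not shown. One possible mapping is sketched here under stated assumptions: the `applyContextDepth` helper and the exact slice chosen for each level are hypothetical, and `estimateTokens` uses the rough "about 4 characters per token" rule of thumb for English text rather than a real tokenizer.

```javascript
// Hypothetical mapping from depth label to the context actually sent to the LLM
function applyContextDepth(context, depth) {
  switch (depth) {
    case 'minimal':
      // Target tweet only
      return { ...context, thread: [context.target], siblings: [] };
    case 'standard':
      // Target plus its immediate parent, no siblings
      return { ...context, thread: context.thread.slice(-2), siblings: [] };
    case 'full':
    default:
      // Everything that was collected (unknown labels also fall back to full)
      return context;
  }
}

// Rough token estimate (assumption: ~4 characters per token)
function estimateTokens(thread) {
  const chars = thread.reduce((sum, t) => sum + t.text.length, 0);
  return Math.ceil(chars / 4);
}
```

Comparing `estimateTokens` on the trimmed and untrimmed thread gives a cheap way to sanity-check how much each level saves before measuring with a real tokenizer.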
A subtle failure mode: an account replies to a tweet, then later replies again in the same thread, and the two replies contradict each other. To a human reader, this screams "automated: it doesn't remember what it said."
We prevent this by including our own prior replies in the context:
async function getOurPriorReplies(slotId, threadTweetIds) {
  // Look up any replies we've already sent in this thread
  const priorReplies = await db.getRepliesByUs(slotId, threadTweetIds);
  if (priorReplies.length === 0) return null;
  return `Note: you have already replied in this thread earlier:\n` +
    priorReplies.map(r => `- "${r.text}"`).join('\n') +
    `\nDo not contradict your earlier position. Build on it or extend it.`;
}
This block is prepended to the prompt only when we have prior replies in the thread. The LLM now "knows" what it already said and maintains a consistent stance across multiple engagements in the same conversation. That continuity is what makes an account feel like a coherent person rather than a stateless generator.
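The conditional prepend can be sketched in a few lines. This is an illustration under assumptions: `withPriorReplies` is a hypothetical helper, and `formatPriorReplies` builds the note from an in-memory array instead of the database so the snippet runs on its own.

```javascript
// Build the continuity note from in-memory data (illustrative stand-in for the DB lookup)
function formatPriorReplies(priorReplies) {
  if (priorReplies.length === 0) return null;
  return 'Note: you have already replied in this thread earlier:\n' +
    priorReplies.map(r => `- "${r.text}"`).join('\n') +
    '\nDo not contradict your earlier position. Build on it or extend it.';
}

// Hypothetical helper: prepend the note only when one exists
function withPriorReplies(prompt, priorRepliesNote) {
  return priorRepliesNote ? `${priorRepliesNote}\n\n${prompt}` : prompt;
}

const basePrompt = 'Write a reply to the tweet marked [REPLY TO THIS].';
const withNote = withPriorReplies(
  basePrompt,
  formatPriorReplies([{ text: 'Flaky tests are a process problem.' }])
);
const withoutNote = withPriorReplies(basePrompt, formatPriorReplies([]));
```

When there are no prior replies the prompt passes through unchanged, so first-time replies pay no extra tokens for the continuity check.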
A thread full of jokes doesn't want a serious reply; a technical thread doesn't want a casual one. We add a lightweight tone hint derived from the context:
function inferToneHint(context) {
  const texts = context.thread.map(t => t.text).join(' ');
  const questionCount = (texts.match(/\?/g) || []).length;
  // \p{Emoji} also matches digits, '#', and '*', so count pictographic characters instead
  const emojiCount = (texts.match(/\p{Extended_Pictographic}/gu) || []).length;
  const avgLength = texts.length / context.thread.length;
  if (emojiCount > context.thread.length * 0.5) return 'casual, emoji-friendly';
  if (avgLength > 200 && questionCount > 1) return 'thoughtful, discussion-oriented';
  if (avgLength < 80) return 'punchy, terse';
  return null; // no strong signal, don't override persona tone
}
When a tone hint is returned, it modulates the persona's tone rather than replacing it: a casual persona in a serious thread becomes somewhat more serious, not fully so. The hint nudges; the persona anchors.
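One way to express "nudge, don't override" in the prompt itself is to phrase the hint as a preference that leaves the persona in charge. The `applyToneHint` helper and its wording below are assumptions for illustration, not the production prompt text.

```javascript
// Hypothetical helper: phrase the tone hint as a nudge, not an override
function applyToneHint(prompt, toneHint) {
  if (!toneHint) return prompt; // no strong signal: leave the persona's tone alone
  return `${prompt}\n\nThe thread's tone is ${toneHint}. ` +
    'Lean toward it, but keep your own voice.';
}

const nudged = applyToneHint('Write a reply to the marked tweet.', 'punchy, terse');
const untouched = applyToneHint('Write a reply to the marked tweet.', null);
```

Appending the hint after the persona and task text keeps it as the weakest instruction in the prompt, which matches its role as a modifier.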
We compared reply quality (via manual review of 500 replies, scored blind) across three configurations:
| Configuration | Avg. quality (1-5) | "Contextual" replies |
|---|---|---|
| Target tweet only (no context) | 2.4 | 11% |
| Target + immediate parent | 3.6 | 47% |
| Full thread + siblings + our prior replies | 4.3 | 78% |
The jump from "parent only" to "full context" (3.6 to 4.3) is smaller than the jump from "no context" to "parent" (2.4 to 3.6), which is diminishing returns at work, but it's still significant. The "contextual" rate (replies that clearly engage the conversation rather than reacting to the tweet in isolation) rises from 47% to 78%, roughly two-thirds higher. For an account whose entire value is "this reply felt real," that gain is the difference between an account that grows and one that gets scrolled past.
1. Context is as important as persona. A great persona with no context produces hollow replies. A mediocre persona with great context produces relevant ones. You need both, and most AI-reply systems invest only in persona.
2. Mark the target explicitly. When you hand an LLM a thread, tell it which tweet to reply to. Without an explicit marker, it drifts.
3. Adapt context depth to the situation. Full context for every reply wastes tokens; minimal context for every reply sacrifices quality. Adaptive depth gets the quality where it matters at sustainable cost.
4. Include your own prior replies. Statelessness is the LLM's biggest tell in ongoing threads. Feeding back what you already said maintains coherence and prevents contradictions.
5. Compress before prompting. Raw tweet objects are mostly noise. Compress to author + text + target marker before the LLM ever sees it.
6. Tone-match, don't tone-replace. Let the conversation's tone modulate the persona, not override it. The persona is the constant; the context is the variable.
The prompt-chaining layer is what separates AI replies that feel like a stranger parachuting into a conversation from AI replies that feel like a regular participant who's been following along. Both are "AI-generated." Only one grows an account.
HelperX generates replies with full thread context, persona anchoring, and self-continuity, so every reply fits the conversation it joins. Free 30-day trial.