Your LLM prompt doesn't fit? Pack it by priority (zero dependencies)


Source: dev.to · Published: 2026-06-15T23:09Z · Topic: developer-tools

A developer built contextcram, a zero-dependency Python library that prioritizes and packs context pieces (system prompts, chat history, documents) into a fixed token budget for LLM applications. It assigns each piece a priority and a strategy (drop, truncate, trim) to ensure important content is retained while fitting within the model's context window. The library also includes a reserve parameter to prevent the prompt from consuming all tokens needed for the model's reply.

Every RAG app and agent eventually hits the same wall: you have more stuff than fits in the model's context window — a system prompt, chat history, retrieved documents, tool output — and a fixed token budget.

The usual "fix" is to truncate the whole blob at the end, which chops off whatever happened to be last: sometimes a doc, sometimes half your system prompt. You drop the wrong things.

I got tired of rewriting that logic in every project, so I built contextcram, a tiny, zero-dependency library that treats this as a packing problem.

Give each piece of context a priority and a strategy for what should happen if it doesn't fit. Set a token budget. contextcram assembles the largest in-budget context that keeps the important parts.

pip install contextcram

# Basic usage (Python)
from contextcram import Packer

ctx = (
    Packer(budget=8000)
    .add(system_prompt, priority="required")                 # never dropped
    .add(chat_history, priority="high", strategy="trim")     # drop oldest turns
    .add(retrieved_docs, priority="medium", strategy="drop") # all-or-nothing
    .add(tool_output, priority="low", strategy="truncate")   # cut to fit
    .fit()
)

print(ctx.text)          # the assembled, in-budget context
print(ctx.used_tokens)   # e.g. 7840
print(ctx.dropped_names) # what didn't make the cut

When an optional item doesn't fully fit, its strategy decides what happens:

Strategy	Behavior
`drop`	Include it whole, or not at all
`truncate`	Cut from the end, keep the head (default)
`truncate_head`	Cut from the start, keep the tail
`trim`	For lists (e.g. messages): drop oldest first

required items are always kept; if they alone blow the budget, you get a clear BudgetExceeded error instead of a silently mangled prompt.
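To make the rules above concrete, here is a minimal sketch of how priority packing with these strategies could work. It is not contextcram's implementation: the `Item`, `fit_item`, and `pack` names, the greedy highest-priority-first order, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Union

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not the library's actual ratio
PRIORITY_ORDER = {"required": 0, "high": 1, "medium": 2, "low": 3}


class BudgetExceeded(Exception):
    """Raised when the required items alone do not fit the budget."""


def count_tokens(text: str) -> int:
    """Estimate tokens from character count (heuristic, not exact)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


@dataclass
class Item:  # hypothetical container for one piece of context
    content: Union[str, List[str]]
    priority: str
    strategy: str = "truncate"


def fit_item(item: Item, room: int) -> Optional[str]:
    """Return the part of an optional item that fits in `room` tokens, or None."""
    if room <= 0:
        return None
    if isinstance(item.content, list):
        # trim: drop the oldest entries until the rest fits
        kept = list(item.content)
        while kept and count_tokens("\n".join(kept)) > room:
            kept.pop(0)
        return "\n".join(kept) if kept else None
    text = item.content
    if count_tokens(text) <= room:
        return text
    if item.strategy == "drop":
        return None  # all-or-nothing
    max_chars = room * CHARS_PER_TOKEN
    if item.strategy == "truncate_head":
        return text[-max_chars:]  # keep the tail
    return text[:max_chars]  # truncate: keep the head


def pack(items: List[Item], budget: int) -> List[str]:
    """Fit items greedily, highest priority first; output keeps insertion order."""
    order = sorted(range(len(items)), key=lambda i: PRIORITY_ORDER[items[i].priority])
    fitted = {}
    remaining = budget
    for i in order:
        item = items[i]
        if item.priority == "required":
            text = item.content if isinstance(item.content, str) else "\n".join(item.content)
            if count_tokens(text) > remaining:
                raise BudgetExceeded("required items alone exceed the budget")
        else:
            text = fit_item(item, remaining)
            if text is None:
                continue  # nothing of this item fits
        fitted[i] = text
        remaining -= count_tokens(text)
    return [fitted[i] for i in sorted(fitted)]


if __name__ == "__main__":
    demo = [
        Item("You are a helpful assistant.", "required"),
        Item(["user: hi", "assistant: hello", "user: summarize the doc"], "high", "trim"),
        Item("A long retrieved document. " * 50, "medium", "drop"),
        Item("Verbose tool output. " * 50, "low", "truncate"),
    ]
    for piece in pack(demo, budget=40):
        print(repr(piece[:60]))
```

The sketch returns the pieces as a list rather than one joined string so that separators don't have to be counted against the budget; a real packer would need to account for them.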

Two recurring annoyances solved in one line:

from contextcram import Packer

packer = Packer(model="gpt-4o", reserve=2000)
print(packer.full_budget)  # 128000
print(packer.budget)       # 126000  <- what you actually pack into

reserve= kills the classic "the prompt fit, but there's no room left for the model to answer" bug. Tie it to your max_tokens and you can't get it wrong.
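The arithmetic behind reserve is just a subtraction, sketched below. The `packing_budget` helper is hypothetical (it is not part of contextcram); the numbers mirror the gpt-4o example above.

```python
def packing_budget(context_window: int, reserve: int) -> int:
    """Tokens left for the prompt once room for the reply is set aside."""
    if reserve < 0 or reserve >= context_window:
        raise ValueError("reserve must be >= 0 and smaller than the context window")
    return context_window - reserve


# Use the same constant for the reply cap and the reserve so they cannot drift apart.
MAX_REPLY_TOKENS = 2000

budget = packing_budget(context_window=128_000, reserve=MAX_REPLY_TOKENS)
print(budget)  # 126000
```

Sharing one constant between the reserve and the request's max_tokens is what keeps the two in sync.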

from contextcram import Packer
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o", max_tokens=1500)  # reply cap matches reserve below
docs = [d.page_content for d in retriever.invoke(question)]
history = [f"{m.type}: {m.content}" for m in memory.messages]

ctx = (
    Packer(model="gpt-4o", reserve=1500)
    .add(SYSTEM_PROMPT, priority="required")
    .add(history, priority="high", strategy="trim")
    .add("\n\n".join(docs), priority="medium", strategy="drop")
    .fit()
)

response = llm.invoke([SystemMessage(ctx.text), HumanMessage(question)])

Need exact token counts? Pass tokenizer=tiktoken_tokenizer("gpt-4o"), or wrap any tokenizer (Hugging Face, llama.cpp) with a one-line CallableTokenizer. The default is a fast characters-per-token heuristic so there are no required dependencies.
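As a rough illustration of the trade-off, the sketch below shows a characters-per-token estimate next to a pluggable counter. The names (`TokenCounter`, `heuristic_counter`, `fits`) and the 4.0 ratio are assumptions for this example and are not contextcram's API or its actual default.

```python
import math
from typing import Callable

# Any function mapping text to a token count can serve as a counter.
TokenCounter = Callable[[str], int]


def heuristic_counter(chars_per_token: float = 4.0) -> TokenCounter:
    """Build a dependency-free estimator; the ratio is an assumed rule of thumb."""
    def count(text: str) -> int:
        return math.ceil(len(text) / chars_per_token)
    return count


def whitespace_counter(text: str) -> int:
    """Stand-in for a real tokenizer, used here only to show the plug-in point."""
    return len(text.split())


def fits(text: str, budget: int, count: TokenCounter) -> bool:
    """Check a text against a budget with whichever counter is supplied."""
    return count(text) <= budget


estimate = heuristic_counter()
print(estimate("abcdefgh"))                          # 2
print(fits("one two three", 3, whitespace_counter))  # True

# With tiktoken installed, an exact counter could be built the same way:
#   enc = tiktoken.encoding_for_model("gpt-4o")
#   exact: TokenCounter = lambda s: len(enc.encode(s))
```

The heuristic can be off by a wide margin for code or non-English text, which is the case where plugging in a real tokenizer matters.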

Honest answer: the concept isn't new. Priompt (and its Python port) and Character.AI's Prompt Poet do priority-based context assembly too — and they're more powerful (component models, cache-aware truncation, templating).

contextcram deliberately trades features for simplicity and zero dependencies: Packer(...).add(...).fit().

If you want the smallest possible helper that does one thing (fit prioritized pieces into a budget), this is it.

pip install contextcram

It's MIT-licensed, fully typed (mypy --strict), and tested across Python 3.10–3.13. I'd genuinely love feedback on the API and the default strategies: open an issue or drop a comment.


Read original on dev.to → dev.to/wael_rahhal_790f328ac4301/your-llm-prompt…
