Your LLM prompt doesn't fit? Pack it by priority (zero dependencies)

A developer built contextcram, a zero-dependency Python library that prioritizes and packs context pieces (system prompts, chat history, documents) into a fixed token budget for LLM applications. It assigns each piece a priority and a strategy (such as drop, truncate, or trim) so that important content is retained while the whole fits within the model's context window. The library also includes a reserve parameter that holds back tokens for the model's reply, so the prompt can't consume the entire window.

Every RAG app and agent eventually hits the same wall: you have a fixed token budget and more stuff than fits in the model's context window (a system prompt, chat history, retrieved documents, tool output). The usual "fix" is to truncate the whole blob at the end, which means you chop off whatever happened to be last: sometimes a doc, sometimes half your system prompt. You drop the wrong things.

I got tired of rewriting that logic in every project, so I built contextcram, a tiny, zero-dependency library that treats this as a packing problem. Give each piece of context a priority and a strategy for what should happen if it doesn't fit, then set a token budget. contextcram assembles the largest in-budget context that keeps the important parts.

```
pip install contextcram
```

```python
from contextcram import Packer

ctx = (
    Packer(budget=8000)
    .add(system_prompt, priority="required")                  # never dropped
    .add(chat_history, priority="high", strategy="trim")      # drop oldest turns
    .add(retrieved_docs, priority="medium", strategy="drop")  # all-or-nothing
    .add(tool_output, priority="low", strategy="truncate")    # cut to fit
    .fit()
)

print(ctx.text)     # the assembled, in-budget context
print(ctx.used)     # tokens used, e.g. 7840
print(ctx.dropped)  # names of what didn't make the cut
```

When an optional item doesn't fully fit, its strategy decides what happens:

| Strategy | Behavior |
|---|---|
| `drop` | Include it whole, or not at all |
| `truncate` | Cut from the end, keep the head (default) |
| `truncate_head` | Cut from the start, keep the tail |
| `trim` | For lists (e.g. messages): drop oldest first |

`required` items are always kept; if they alone blow the budget, you get a clear `BudgetExceeded` error instead of a silently mangled prompt.
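To make the priority-plus-strategy idea concrete, here is a minimal from-scratch sketch of such a packing loop. It is not contextcram's implementation: `Piece`, `Packed`, `pack`, `estimate_tokens`, and the local `BudgetExceeded` class are hypothetical stand-ins, and the sketch assumes a rough 4-characters-per-token estimate, a fixed priority order (required > high > medium > low), and that the separators between pieces aren't counted against the budget.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A piece is one string or a list of strings (e.g. chat turns, oldest first).
Content = Union[str, List[str]]
PRIORITY_RANK = {"required": 0, "high": 1, "medium": 2, "low": 3}


class BudgetExceeded(Exception):
    """Raised when the required pieces alone exceed the budget."""


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    # Assumed ~4 characters per token, rounded up (not contextcram's exact ratio).
    return -(-len(text) // chars_per_token)


@dataclass
class Piece:
    name: str
    content: Content
    priority: str = "medium"
    strategy: str = "truncate"


@dataclass
class Packed:
    text: str
    used: int
    dropped: List[str]


def _as_text(content: Content) -> str:
    return "\n".join(content) if isinstance(content, list) else content


def _shrink(piece: Piece, room: int, count: Callable[[str], int]) -> str:
    """Largest version of `piece` that fits in `room` tokens ('' if none)."""
    if piece.strategy == "drop":
        return ""  # all-or-nothing
    if piece.strategy == "trim":
        items = list(piece.content) if isinstance(piece.content, list) else [piece.content]
        while items and count("\n".join(items)) > room:
            items.pop(0)  # drop the oldest item first
        return "\n".join(items)
    if piece.strategy not in ("truncate", "truncate_head"):
        raise ValueError(f"unknown strategy: {piece.strategy!r}")
    text = _as_text(piece.content)
    keep_head = piece.strategy == "truncate"

    def cut(n: int) -> str:
        return text[:n] if keep_head else text[len(text) - n:]

    # Binary-search the longest slice that fits (assumes count grows with length).
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count(cut(mid)) <= room:
            lo = mid
        else:
            hi = mid - 1
    return cut(lo)


def pack(pieces: List[Piece], budget: int,
         count: Callable[[str], int] = estimate_tokens) -> Packed:
    kept: Dict[int, str] = {}
    used = 0
    # 1. Required pieces always go in, unshrunk.
    for i, p in enumerate(pieces):
        if p.priority == "required":
            kept[i] = _as_text(p.content)
            used += count(kept[i])
    if used > budget:
        raise BudgetExceeded(f"required pieces need {used} tokens; budget is {budget}")
    # 2. Optional pieces, highest priority first (sorted() is stable for ties).
    optional = sorted((i for i, p in enumerate(pieces) if p.priority != "required"),
                      key=lambda i: PRIORITY_RANK[pieces[i].priority])
    dropped: List[str] = []
    for i in optional:
        p = pieces[i]
        full = _as_text(p.content)
        room = budget - used
        text = full if count(full) <= room else _shrink(p, room, count)
        if text:
            kept[i] = text
            used += count(text)
        else:
            dropped.append(p.name)
    # 3. Emit in the caller's original order.
    return Packed("\n\n".join(kept[i] for i in sorted(kept)), used, dropped)


pieces = [
    Piece("system", "You are terse.", priority="required"),
    Piece("history", ["user: hi", "bot: hello", "user: now?"], "high", "trim"),
    Piece("docs", "D" * 80, "medium", "drop"),
    Piece("tool", "T" * 80, "low", "truncate"),
]
result = pack(pieces, budget=30)
print(result.used, result.dropped)  # 30 ['docs']
```

Pieces are considered in priority order but emitted in the order they were added, so a surviving low-priority piece still lands where the caller put it. In the example, the docs don't fit whole and are all-or-nothing, so they are dropped, while the lower-priority tool output is truncated to fill the remaining room.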
Two recurring annoyances, solved in one line:

```python
from contextcram import Packer

# Budget pulled from the model; hold back 2k tokens for the reply
packer = Packer(model="gpt-4o", reserve=2000)
print(packer.full_budget)  # 128000
print(packer.budget)       # 126000  <- what you actually pack into
```

`reserve=` kills the classic "the prompt fit, but there's no room left for the model to answer" bug. Tie it to your `max_tokens` and you can't get it wrong.

Here it is in a LangChain RAG pipeline:

```python
from contextcram import Packer
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o")

docs = [d.page_content for d in retriever.invoke(question)]
history = [f"{m.type}: {m.content}" for m in memory.messages]

ctx = (
    Packer(model="gpt-4o", reserve=1500)
    .add(SYSTEM_PROMPT, priority="required")
    .add(history, priority="high", strategy="trim")
    .add("\n\n".join(docs), priority="medium", strategy="drop")
    .fit()
)

response = llm.invoke([SystemMessage(ctx.text), HumanMessage(question)])
```

Need exact token counts? Pass `tokenizer=tiktoken_tokenizer("gpt-4o")`, or wrap any tokenizer (Hugging Face, llama.cpp) with a one-line `CallableTokenizer`. The default is a fast characters-per-token heuristic, so there are no required dependencies.

Honest answer: the concept isn't new. [Priompt](https://github.com/anysphere/priompt) (and its Python port) and Character.AI's [Prompt Poet](https://pypi.org/project/prompt-poet/) do priority-based context assembly too, and they're more powerful (component models, cache-aware truncation, templating). contextcram deliberately trades features for simplicity and zero dependencies: `Packer(...).add(...).fit()`. If you want the smallest possible helper that does one thing (fit prioritized pieces into a budget), this is it.

```
pip install contextcram
```

It's MIT, fully typed (`mypy --strict`), and tested across Python 3.10–3.13. I'd genuinely love feedback on the API and the default strategies: open an issue or drop a comment.
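The `reserve=` arithmetic and the tokenizer hook described above are simple enough to sketch without the library. In this hypothetical sketch, `CONTEXT_WINDOWS`, `packing_budget`, `HeuristicCounter`, and `counter_from_encoder` are illustrative names, not contextcram's API; the only window size listed is the 128k gpt-4o figure quoted in the post, and a whitespace split stands in for a real tokenizer so the example runs with no dependencies.

```python
import math
from collections.abc import Callable, Sized

# Hypothetical lookup table; 128_000 for gpt-4o is the figure quoted in the post.
CONTEXT_WINDOWS = {"gpt-4o": 128_000}


def packing_budget(model: str, reserve: int) -> int:
    """Tokens left for the prompt after holding back `reserve` for the reply."""
    full = CONTEXT_WINDOWS[model]
    if not 0 <= reserve < full:
        raise ValueError(f"reserve must be in [0, {full}), got {reserve}")
    return full - reserve


class HeuristicCounter:
    """Dependency-free estimate: characters / chars_per_token, rounded up."""

    def __init__(self, chars_per_token: float = 4.0) -> None:
        self.chars_per_token = chars_per_token

    def __call__(self, text: str) -> int:
        return math.ceil(len(text) / self.chars_per_token)


def counter_from_encoder(encode: Callable[[str], Sized]) -> Callable[[str], int]:
    """Turn any text -> tokens function into an exact token counter.

    Candidates include tiktoken.encoding_for_model("gpt-4o").encode or a
    Hugging Face tokenizer's .encode method (neither is imported here).
    """
    return lambda text: len(encode(text))


print(packing_budget("gpt-4o", reserve=2000))  # 126000
estimate = HeuristicCounter()
whitespace_count = counter_from_encoder(str.split)  # toy "tokenizer"
print(estimate("the quick brown fox"), whitespace_count("the quick brown fox"))  # 5 4
```

The heuristic and the "exact" counter disagree even on this tiny string, which is why the post offers a tokenizer hook for cases where the budget is tight. Using the same number for `reserve` and for the reply's max-token limit is what keeps the prompt and the answer from competing for the same window.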