# Your LLM prompt doesn't fit? Pack it by priority (zero dependencies)

> Source: <https://dev.to/wael_rahhal_790f328ac4301/your-llm-prompt-doesnt-fit-pack-it-by-priority-zero-dependencies-nhg>
> Published: 2026-06-15 23:09:00+00:00

Every RAG app and agent eventually hits the same wall: you have **more stuff than fits** in the model's context window — a system prompt, chat history, retrieved documents, tool output — and a fixed token budget.

The usual "fix" is to truncate the whole blob at the end. Which means you randomly chop off whatever happened to be last: sometimes a doc, sometimes half your system prompt. You drop the *wrong* things.

I got tired of rewriting that logic in every project, so I built **contextcram**, a tiny, zero-dependency library that treats this as a packing problem.

Give each piece of context a **priority** and a **strategy** for what should happen if it doesn't fit. Set a token budget. `contextcram` assembles the largest in-budget context that keeps the important parts.

```
pip install contextcram
```

``` python
from contextcram import Packer

ctx = (
    Packer(budget=8000)
    .add(system_prompt, priority="required")                 # never dropped
    .add(chat_history, priority="high", strategy="trim")     # drop oldest turns
    .add(retrieved_docs, priority="medium", strategy="drop") # all-or-nothing
    .add(tool_output, priority="low", strategy="truncate")   # cut to fit
    .fit()
)

print(ctx.text)          # the assembled, in-budget context
print(ctx.used_tokens)   # e.g. 7840
print(ctx.dropped_names) # what didn't make the cut
```

When an optional item doesn't fully fit, its `strategy` decides what happens:

| Strategy | Behavior |
|---|---|
| `drop` | Include it whole, or not at all |
| `truncate` | Cut from the end, keep the head (default) |
| `truncate_head` | Cut from the start, keep the tail |
| `trim` | For lists (e.g. messages): drop oldest first |
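To make the four behaviors concrete, here is a minimal sketch of what each strategy could look like as a plain function. This is not `contextcram`'s actual code: the function names mirror the strategy names only for readability, and `count_tokens` is a hypothetical stand-in that counts whitespace-separated words as tokens.

```python
from typing import List, Optional


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def drop(text: str, room: int) -> Optional[str]:
    """All-or-nothing: keep the text only if it fits whole, else return None."""
    return text if count_tokens(text) <= room else None


def truncate(text: str, room: int) -> str:
    """Keep the head and cut from the end."""
    return " ".join(text.split()[:room])


def truncate_head(text: str, room: int) -> str:
    """Keep the tail and cut from the start."""
    words = text.split()
    return " ".join(words[max(0, len(words) - room):])


def trim(messages: List[str], room: int) -> List[str]:
    """Drop the oldest messages (front of the list) until the rest fit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > room:
        kept.pop(0)
    return kept


text = "a b c d e"
print(truncate(text, 3))       # a b c
print(truncate_head(text, 3))  # c d e
print(drop(text, 3))           # None
print(trim(["one two", "three four", "five"], 3))  # ['three four', 'five']
```

Note how `trim` works on whole list entries rather than cutting inside one, which is why it suits chat turns.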

`required` items are always kept; if they alone blow the budget, you get a clear `BudgetExceeded` error instead of a silently mangled prompt.
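The rule for required items can be sketched as a small greedy packer. This is an illustration under stated assumptions, not the library's implementation: the `Item` dataclass, the `pack` function, the `RANK` table, and the word-count tokenizer are all hypothetical, the `BudgetExceeded` class here is a local stand-in for the library's error, and every optional item is treated as all-or-nothing to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Tuple


class BudgetExceeded(Exception):
    """Local stand-in: raised when required items alone do not fit."""


@dataclass
class Item:
    name: str
    text: str
    priority: str  # "required", "high", "medium", or "low"


# Lower rank is packed first (an assumed ordering for this sketch).
RANK = {"required": 0, "high": 1, "medium": 2, "low": 3}


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack(items: List[Item], budget: int) -> Tuple[List[str], List[str]]:
    """Return (kept names, dropped names), both in the original order."""
    required_cost = sum(
        count_tokens(i.text) for i in items if i.priority == "required"
    )
    if required_cost > budget:
        raise BudgetExceeded(
            f"required items need {required_cost} tokens, budget is {budget}"
        )

    remaining = budget - required_cost
    kept = {i.name for i in items if i.priority == "required"}

    # sorted() is stable, so items with equal priority keep insertion order.
    for item in sorted(items, key=lambda i: RANK[i.priority]):
        if item.priority == "required":
            continue
        cost = count_tokens(item.text)
        if cost <= remaining:  # all-or-nothing for optional items
            kept.add(item.name)
            remaining -= cost

    kept_names = [i.name for i in items if i.name in kept]
    dropped_names = [i.name for i in items if i.name not in kept]
    return kept_names, dropped_names


items = [
    Item("system", "you are helpful", "required"),
    Item("tool", "x x x x", "low"),
    Item("docs", "d d d", "medium"),
]
print(pack(items, budget=6))  # (['system', 'docs'], ['tool'])
```

Failing loudly when the required items do not fit is the design choice that matters here: the caller learns at assembly time that the budget is too small, rather than shipping a half-cut system prompt.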

Two recurring annoyances solved in one line:

``` python
from contextcram import Packer

# Budget pulled from the model; hold back 2k tokens for the reply
packer = Packer(model="gpt-4o", reserve=2000)
print(packer.full_budget)  # 128000
print(packer.budget)       # 126000  <- what you actually pack into
```

`reserve=` kills the classic *"the prompt fit, but there's no room left for the model to answer"* bug. Tie it to your `max_tokens` and you can't get it wrong.
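The arithmetic behind that advice is simple enough to sketch. The helper below is hypothetical (it is not part of `contextcram`); it only shows why using one constant for both the reply limit and the reserve keeps the two numbers from drifting apart.

```python
def packing_budget(context_window: int, max_reply_tokens: int) -> int:
    """Tokens left for the prompt once the reply's share is held back."""
    if max_reply_tokens >= context_window:
        raise ValueError("reply reserve must be smaller than the context window")
    return context_window - max_reply_tokens


# One constant drives both the reserve and the reply limit,
# e.g. Packer(model="gpt-4o", reserve=MAX_TOKENS) and max_tokens=MAX_TOKENS
# on the model call.
MAX_TOKENS = 2000

print(packing_budget(128_000, MAX_TOKENS))  # 126000
```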

``` python
from contextcram import Packer
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o")
docs = [d.page_content for d in retriever.invoke(question)]
history = [f"{m.type}: {m.content}" for m in memory.messages]

ctx = (
    Packer(model="gpt-4o", reserve=1500)
    .add(SYSTEM_PROMPT, priority="required")
    .add(history, priority="high", strategy="trim")
    .add("\n\n".join(docs), priority="medium", strategy="drop")
    .fit()
)

response = llm.invoke([SystemMessage(ctx.text), HumanMessage(question)])
```

Need exact token counts? Pass `tokenizer=tiktoken_tokenizer("gpt-4o")`, or wrap any tokenizer (Hugging Face, llama.cpp) with a one-line `CallableTokenizer`. The default is a fast characters-per-token heuristic so there are **no required dependencies**.
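For a sense of what a characters-per-token heuristic looks like, here is a minimal sketch. The ratio of 4 characters per token is an assumption (a common rule of thumb for English text), not a value taken from `contextcram`, and both `estimate_tokens` and `make_counter` are hypothetical names used only for illustration.

```python
import math
from typing import Callable, Sequence


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from length alone; rounds up to stay conservative."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def make_counter(encode: Callable[[str], Sequence[int]]) -> Callable[[str], int]:
    """Turn any encode(text) -> token ids function into a token counter."""
    return lambda text: len(encode(text))


print(estimate_tokens("abcdefgh"))   # 2
print(estimate_tokens("abcdefghi"))  # 3

# Toy encoder for demonstration only: one "token id" per character.
count = make_counter(lambda s: [ord(c) for c in s])
print(count("hello"))  # 5
```

The heuristic is fast but can be well off for code or non-English text, which is where a real tokenizer earns its dependency.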

Honest answer: the *concept* isn't new. [Priompt](https://github.com/anysphere/priompt) (and its Python port) and Character.AI's [Prompt Poet](https://pypi.org/project/prompt-poet/) do priority-based context assembly too — and they're more powerful (component models, cache-aware truncation, templating).

`contextcram` deliberately trades features for **simplicity and zero dependencies**: `Packer(...).add(...).fit()`. If you want the smallest possible helper that does one thing (fit prioritized pieces into a budget), this is it.

```
pip install contextcram
```

It's MIT-licensed, fully typed (`mypy --strict`), and tested across Python 3.10–3.13. I'd genuinely love feedback on the API and the default strategies: open an issue or drop a comment.