{"slug": "knowa-open-source-llm-context-optimizer", "title": "Knowa – Open-Source LLM Context Optimizer", "summary": "Knowa has released an open-source hybrid retrieval library and knowledge base server that reduces LLM context costs by 90-99% by indexing documents as vector chunks, full-text pages, and a named-entity graph, then extracting only relevant chunks per query. The tool, which ingests from local directories, Notion, and Confluence, aims to solve the problem of sending millions of tokens per request at high cost — a 1,000-page knowledge base can cost tens of thousands of dollars monthly at scale. Knowa's precision retrieval approach, which includes optional LLM entity enrichment for domain-specific terms, is positioned as critical for scaling AI from prototype to company-wide infrastructure without breaking budgets.", "body_md": "The naive approach to building AI-powered apps is to load your documents into the prompt and let the LLM figure it out. It works in demos. It breaks in production.\n\nA 1,000-page knowledge base is roughly 2–4 million tokens. At current API pricing, sending that on every request costs dollars per query. At 10,000 queries a day that is tens of thousands of dollars a month — for context that is 95% irrelevant to the question being asked. As AI usage scales across teams and products, this becomes the dominant cost line.\n\n**Knowa's core job is to solve this.** It indexes your documents once — as vector chunks, full-text\npages, and a named-entity graph — then for each question extracts only the handful of chunks\nthat are actually relevant, typically 1,000–3,000 tokens out of a large corpus. At scale that is\na 90–99% reduction in context sent to the LLM, with no loss in answer quality. Every query response\nincludes a measured token savings figure so you can track this in production. 
Savings depend on corpus size; see [Understanding token savings](#understanding-token-savings) for what to expect at different scales.\n\nThis matters now and will matter much more as AI usage goes from prototype to product to company-wide infrastructure. Precision retrieval is not a nice-to-have: it is the difference between an AI feature that scales and one that breaks your budget.\n\nKnowa is a hybrid retrieval library and knowledge base server. It ingests documents from local directories, Notion, Confluence, and any custom source you connect, storing them as vector chunks and full-text pages in PostgreSQL, and as a named-entity knowledge graph in PostgreSQL (default), Neo4j, or Kuzu. For each question it retrieves only the most relevant context across all three representations and returns it ready to inject into any LLM or AI pipeline you choose.\n\nThe knowledge graph is built during indexing and gives retrieval a structural dimension that pure vector search misses. It connects people, products, organisations, and concepts across your entire document corpus, so entity-centric questions (\"what teams work on X?\", \"which pages mention both Y and Z?\") get precise, targeted answers.\n\n**How the graph is populated** depends on how much coverage you need:\n\n- **spaCy NER (default)**: runs locally at indexing time with zero API cost. Recognises standard entity types (people, organisations, locations, dates, and more). You can swap the model to improve accuracy or target a specific domain: `en_core_web_trf` (transformer-based, better on ambiguous names), scispaCy models for scientific or biomedical text, or any spaCy-compatible model.\n- **LLM entity enrichment (opt-in)**: an additive second pass using any OpenAI-compatible model (gpt-4o-mini, Qwen3, Kimi2, and others). 
The LLM extracts domain-specific entities that a general NER model misses: product features, internal codenames, technical standards, abstract concepts, and anything else with meaning in your organisation's language. It runs concurrently across pages to keep indexing fast at scale, and costs roughly $0.30 per 1,000 pages with gpt-4o-mini.\n\n- Python 3.11+\n- [Docker](https://docs.docker.com/get-docker/) (for PostgreSQL)\n- OpenAI API key\n\n```\ndocker compose up -d\n```\n\nThis starts a PostgreSQL 16 instance with pgvector pre-installed, exposed on `localhost:5432`. Data is persisted in a Docker volume across restarts.\n\n```\npython -m venv .venv\nsource .venv/bin/activate        # Windows: .venv\\Scripts\\activate\n```\n\nOr, if you have Miniconda installed:\n\n```\nconda create -n knowa python=3.12\nconda activate knowa\n```\n\nWith either environment active, install the dependencies and copy the example config:\n\n```\npip install -r requirements.txt\npython -m spacy download en_core_web_sm\npip install -e .                 # registers the `knowa` CLI command\ncp .env.example .env\n```\n\nEdit `.env` and fill in the required values:\n\n| Variable | Required | Notes |\n|---|---|---|\n| `DATABASE_URL` | Yes | Pre-filled to match `docker compose`; no change needed |\n| `OPENAI_API_KEY` | Yes | Your OpenAI API key |\n| `OPENAI_MODEL` | Yes | e.g. `gpt-5.4` |\n| `API_KEY` | Yes | Any random secret, used to protect the REST API |\n| `NOTION_API_KEY` | No | Required only for Notion sources |\n| `CONFLUENCE_*` | No | All four vars required for Confluence sources |\n| `SPACY_MODEL` | No | NER model for graph extraction (default: `en_core_web_sm`) |\n| `ENTITY_LLM_MODEL` | No | Enable LLM entity enrichment, e.g. `gpt-4o-mini` (see below) |\n\nGenerate a strong API key:\n\n```\npython3 -c \"import secrets; print(secrets.token_urlsafe(32))\"\n```\n\nThen index a directory:\n\n```\nknowa index /path/to/docs --name \"My Docs\"\n```\n\nMigrations run automatically on the first command. 
Supported file types: `.md`, `.txt`, `.pdf`, `.docx`\n\n```\nknowa chat \"What would you like to know?\"\n```\n\n```\nuvicorn knowa.api.main:app --reload --port 8000\n```\n\nOpen `http://localhost:8000/admin/ui` to search, browse sources, and trigger rebuilds from the browser.\n\n**As a Python library**: embed Knowa directly in your app. You own the LLM call; Knowa handles indexing, retrieval, and context formatting. Works with Anthropic, OpenAI, Gemini, local models, LangChain, LlamaIndex, or any other framework.\n\n**As a REST API**: run the FastAPI server and hit `/query` to get complete answers with citations from any language or tool, no Python required.\n\n**As a CLI tool**: use the `knowa` command to index directories and chat with your knowledge base directly from the terminal, no server or code required.\n\n- **Up to 90–99% token reduction at scale**: three-path hybrid retrieval (vector search, full-text search, property graph) extracts only what is relevant; every query reports measured token savings so you can track efficiency in production. See [Understanding token savings](#understanding-token-savings) for realistic expectations at different corpus sizes.\n- **Hierarchical chunking**: documents are split into small child chunks for precise vector search, then expanded to larger parent chunks for full context, maximising relevance without losing surrounding information\n- **Pluggable embedders**: `OpenAIEmbedder` (default) or `SentenceTransformerEmbedder` (fully local, no API key needed at query time); implement the `Embedder` protocol to add your own\n- **Multiple sources**: Notion, Confluence Cloud, and local directories (`.md`, `.txt`, `.pdf`, `.docx`)\n- **Incremental sync**: only re-processes files changed since the last run\n- **Zero LLM calls at index time by default**: spaCy handles entity extraction, so there are no LLM completion calls during indexing regardless of knowledge base size (embedding calls still apply with the default `OpenAIEmbedder`). 
For richer entity coverage, an optional LLM enrichment pass can be enabled (see [Graph entity extraction](#graph-entity-extraction))\n- **Admin UI**: per-source sync/rebuild controls, query interface, token savings tracking, and interactive entity graph visualization\n- **CLI**: full index and chat management without running the server\n\nThe token savings figure shown after each query measures how much of your indexed corpus Knowa avoided sending to the LLM:\n\n```\nsavings % = (1 − tokens retrieved by this query / all parent-chunk tokens in the DB) × 100\n```\n\nThis is a **corpus-size-relative metric**. With a small corpus, retrieval may cover most of it on every query, and savings will be near 0%. That is expected and correct; it is not a bug or a misconfiguration.\n\nBy default Knowa retrieves the top 5 child chunks by vector similarity (`TOP_K_CHUNKS=5`), then expands each to its parent chunk (~2,048 tokens). That means roughly **10,000 tokens** of context per query (5 × 2,048 = 10,240), regardless of corpus size.\n\nSavings only accumulate once your total indexed content significantly exceeds that window.\n\nThe estimates below assume:\n\n- **Default settings**: `TOP_K_CHUNKS=5`, parent chunk size ~2,048 tokens\n- **Typical document sizes**: wiki pages, Notion pages, Confluence pages: 500–3,000 words → 1–4 parent chunks each. Short README-style files (under 500 words) may produce a single chunk; long PDFs or DOCX files (5,000+ words) may produce 5–15 chunks each.\n- **Mixed file types** shift the breakpoints: a corpus of 10 long PDFs can behave like 50+ short markdown files in terms of total chunk count.\n\n| Corpus | Approx. 
parent chunks | Expected savings |\n|---|---|---|\n| 5 short docs | 5–10 | 0–10% (retrieval covers most of the corpus) |\n| 20–30 substantial pages | 30–50 | 40–60% |\n| 100+ pages | 100–200 | 70–85% |\n| 500+ pages | 500+ | 85–95% |\n| Large wiki (1,000+ pages) | 1,000+ | 90–99% |\n\nA 0% savings reading does **not** mean retrieval is broken or that Knowa is not helping. It means your corpus is small enough that the retrieval window covers essentially all of it, which is the correct behaviour. The answer quality benefit (finding the right pages and synthesising a coherent answer) is present at any corpus size.\n\nThe savings badge becomes a useful production monitoring signal once your knowledge base grows large enough that retrieval is genuinely selective, typically 20–30 substantial documents or more.\n\nIf you have a large corpus but still see low savings, consider reducing `TOP_K_CHUNKS` in your `.env`. The default of `5` is conservative; dropping to `3` cuts retrieved context by about 40% (3 parent chunks instead of 5) and increases savings, at the cost of slightly lower recall on broad questions.\n\nThe CLI connects directly to the database, so no server is needed.\n\n```\n# Index a directory (incremental by default, full rebuild on first run)\nknowa index /path/to/docs\nknowa index /path/to/docs --name \"Engineering Docs\"   # attach a friendly label\nknowa index /path/to/docs --full                       # force full rebuild\nknowa index /path/to/docs --workers 4                  # parallel indexing (4 threads)\n\n# See all indexed sources with page/chunk counts and labels\nknowa list\n\n# Chat with the index\nknowa chat                                    # interactive REPL\nknowa chat \"What is our refund policy?\"       # single-shot\nknowa chat --source \"Engineering Docs\"        # scoped to one source\nknowa chat --debug                            # show retrieval path before each answer\nknowa chat --no-index /path/to/docs           # bypass index, read docs 
directly\n\n# Inspect graph backend (entity/edge counts)\nknowa debug\n\n# Benchmark questions with and without index\nknowa bench questions.json\n\n# Clear index for a source\nknowa clear /path/to/docs\nknowa clear --source-id <notion-workspace-id>\nknowa clear --source-id company.atlassian.net/ENG\nknowa clear --all\n```\n\nSupported file types: `.md`, `.txt`, `.pdf`, `.docx`\n\nTo quickly populate the knowledge base for testing the graph visualization:\n\n```\npython scripts/fetch_sample_docs.py --out /tmp/knowa_sample_docs\nknowa index /tmp/knowa_sample_docs --name \"Knowa Sample Docs\"\n```\n\nThis downloads ~20 Wikipedia articles (AI companies and researchers) and indexes them.\n\n``` python\nfrom knowa import KnowledgeBase\n\n# Create once at app startup and reuse across requests\nkb = KnowledgeBase()\nkb.index(\"/path/to/docs\", label=\"Engineering Docs\")\n\n# Get formatted context ready to inject into any LLM\ncontext = kb.get_context(\"What is our deployment process?\")\n```\n\n``` python\nimport anthropic\nfrom knowa import KnowledgeBase\n\nkb = KnowledgeBase()\nclient = anthropic.Anthropic()\n\ndef answer(question: str) -> str:\n    context = kb.get_context(question)\n    message = client.messages.create(\n        model=\"claude-opus-4-7\",\n        max_tokens=1024,\n        system=f\"Answer using only this context:\\n\\n{context}\",\n        messages=[{\"role\": \"user\", \"content\": question}],\n    )\n    return message.content[0].text\n```\n\n`retrieve()` gives you structured results before synthesis, so you can filter, rerank, or log them:\n\n``` python\nfrom knowa import KnowledgeBase, RetrievedChunk\n\nkb = KnowledgeBase()\nchunks: list[RetrievedChunk] = kb.retrieve(\"What is our SLA?\")\n\nfor c in chunks:\n    print(f\"  [{c.retrieval_type}] score={c.score:.3f} | {c.page_title}\")\n\n# Filter by confidence, then format\nfiltered = [c for c in chunks if c.score >= 0.3 or c.retrieval_type == \"fts\"]\ncontext = kb.format_context(filtered)\n```\n\n```\npip install 
\"knowa[st]\"\npython\nfrom knowa import KnowledgeBase\nfrom knowa.embedders.sentence_transformers import SentenceTransformerEmbedder\n\n# Set EMBEDDING_DIMENSIONS=384 in .env before the first index run\nembedder = SentenceTransformerEmbedder(\"all-MiniLM-L6-v2\")\nkb = KnowledgeBase(embedder=embedder)\nkb.index(\"/path/to/docs\", full=True)\ncontext = kb.get_context(\"What is our refund policy?\")  # no OpenAI calls\nkb.index(\"/docs/engineering\", label=\"Engineering\")\nkb.index(\"/docs/legal\", label=\"Legal\")\n\neng_context = kb.get_context(\"deployment process\", source_id=\"Engineering\")\nlegal_context = kb.get_context(\"compliance requirements\", source_id=\"Legal\")\n```\n\nManage history in your application and pass only the retrieved context to each turn:\n\n``` python\nfrom knowa import KnowledgeBase\n\nkb = KnowledgeBase()\nhistory: list[dict] = []\n\ndef chat(user_message: str) -> str:\n    context = kb.get_context(user_message)\n    messages = history[-10:] + [{\"role\": \"user\", \"content\": user_message}]\n    response = your_llm(system=f\"Answer using this context:\\n\\n{context}\", messages=messages)\n    history.append({\"role\": \"user\", \"content\": user_message})\n    history.append({\"role\": \"assistant\", \"content\": response.text})\n    return response.text\n```\n\nRetrieval is synchronous. 
Wrap with `asyncio.to_thread` to avoid blocking the event loop:\n\n``` python\nimport asyncio\nfrom knowa import KnowledgeBase\n\nkb = KnowledgeBase()\n\nasync def handle_request(question: str, source: str | None = None) -> str:\n    context = await asyncio.to_thread(kb.get_context, question, source)\n    async with your_async_llm_client() as client:\n        return await client.complete(context, question)\n```\n\n``` python\nimport asyncio\nfrom functools import lru_cache\nfrom fastapi import Depends, FastAPI\nfrom knowa import KnowledgeBase\n\napp = FastAPI()\n\n@lru_cache(maxsize=1)\ndef get_kb() -> KnowledgeBase:\n    return KnowledgeBase()\n\n@app.post(\"/ask\")\nasync def ask(question: str, source: str | None = None, kb: KnowledgeBase = Depends(get_kb)):\n    chunks = await asyncio.to_thread(kb.retrieve, question, source)\n    context = kb.format_context(chunks)\n    answer = await your_llm(context, question)\n    return {\"answer\": answer, \"citations\": [{\"title\": c.page_title, \"url\": c.url} for c in chunks if c.page_title]}\n```\n\n`KnowledgeBase(database_url=None, embedder=None)`: `database_url` falls back to the `DATABASE_URL` env var; `embedder` defaults to `OpenAIEmbedder`. **The embedder must match what was used when the index was built.**\n\n| Method | Returns | Description |\n|---|---|---|\n| `index(path, label=None, full=False)` | `dict` | Index a local directory. Returns `{\"indexed\": N, \"deleted\": N, \"errors\": N}`. |\n| `retrieve(question, source_id=None)` | `list[RetrievedChunk]` | Hybrid retrieval; no LLM calls. |\n| `format_context(chunks)` | `str` | Format chunks into `[Title]\\n---` blocks for LLM injection. |\n| `get_context(question, source_id=None)` | `str` | `retrieve()` + `format_context()` + graph relationships in one call. 
|\n\n`RetrievedChunk` fields: `content`, `score`, `page_id`, `page_title`, `source_id`, `source_type`, `url`, `retrieval_type` (`\"vector\"` or `\"fts\"`).\n\n| Concern | Guidance |\n|---|---|\n| Embedder reuse | Create one `KnowledgeBase` per process and reuse it. Re-creating per request is wasteful. |\n| Dimension consistency | Set `EMBEDDING_DIMENSIONS` before the first `index()` call. Changing the embedder later requires a full rebuild; mixing dimensions returns wrong results silently. |\n| Thread safety | `KnowledgeBase` is safe to call from multiple threads; the `psycopg2` pool is thread-safe. |\n| Async blocking | Use `asyncio.to_thread(kb.retrieve, ...)` in async frameworks. |\n| Context size | `get_context()` returns all retrieved chunks untruncated. For small context windows, filter by `score` and pass `format_context(filtered)`. |\n| Re-indexing | Call `kb.index(path)` on a schedule for directories that change. Each call runs incrementally. |\n| Error handling | `index()` returns `{\"errors\": N}`. Check this value and alert if non-zero. |\n\n```\ncurl -X POST \"http://localhost:8000/query\" \\\n  -H \"X-API-Key: <key>\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\": \"What is our refund policy?\", \"response_mode\": \"with_citations\"}'\n```\n\nResponse modes: `answer_only` · `with_citations` · `full` (includes raw retrieved chunks)\n\nKnowa builds a property graph from your documents by extracting named entities during indexing. Two mechanisms are available and can be used together.\n\nspaCy NER (the default) runs entirely locally, with zero API cost and no latency added. 
Recognises standard entity types: people, organisations, locations, dates, and 14 others from the OntoNotes scheme.\n\n```\nSPACY_MODEL=en_core_web_sm    # default: fast, lightweight\nSPACY_MODEL=en_core_web_trf   # transformer-based, better accuracy on ambiguous names\nSPACY_MODEL=en_core_sci_lg    # scientific text (pip install scispacy + model)\nSPACY_MODEL=en_ner_bc5cdr_md  # biomedical (pip install scispacy + model)\n```\n\nAfter changing the model, install it:\n\n```\npython -m spacy download en_core_web_trf\n# or for scispaCy models follow https://allenai.github.io/scispacy/\n```\n\nLLM entity enrichment is an optional second pass that runs after spaCy and adds domain-specific entities spaCy misses: products, technologies, frameworks, abstract concepts, and any entity type relevant to your domain. It runs concurrently across pages (configurable parallelism) so it does not serialize indexing.\n\nSupports any OpenAI-compatible provider. Examples:\n\n```\n# gpt-4o-mini (~$0.30 per 1,000 pages)\nENTITY_LLM_MODEL=gpt-4o-mini\nENTITY_LLM_API_KEY=sk-...           # optional; falls back to OPENAI_API_KEY\n\n# Qwen3 via Dashscope\nENTITY_LLM_MODEL=qwen3-7b\nENTITY_LLM_API_KEY=sk-...\nENTITY_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\n\n# Kimi2 (Moonshot)\nENTITY_LLM_MODEL=moonshot-v1-8k\nENTITY_LLM_API_KEY=sk-...\nENTITY_LLM_BASE_URL=https://api.moonshot.cn/v1\n```\n\nConcurrency is controlled by `ENTITY_LLM_CONCURRENCY` (default `5`). Leave `ENTITY_LLM_MODEL` blank to keep the default zero-LLM-at-index-time behaviour.\n\n**Two-phase indexing:** when you combine `--workers` with `ENTITY_LLM_MODEL`, the two phases run sequentially: Phase 1 (OCR, chunking, embedding, spaCy) completes across all pages first, then Phase 2 (LLM enrichment) runs concurrently at `ENTITY_LLM_CONCURRENCY` parallelism. 
The two concurrency settings are independent: `--workers` controls how many files are read and embedded at once; `ENTITY_LLM_CONCURRENCY` controls how many LLM API calls fire in the enrichment pass.\n\n| Domain | Recommended model | Install | What it adds over `en_core_web_sm` |\n|---|---|---|---|\n| General wikis, internal docs, HR, support | `en_core_web_sm` | installed during setup | baseline |\n| Same domains, higher accuracy | `en_core_web_trf` | `python -m spacy download en_core_web_trf` | Transformer-based; better on ambiguous names and abbreviations |\n| Scientific papers, research reports | `en_core_sci_lg` | `pip install scispacy` + model | Chemicals, genes, proteins, species, experimental methods |\n| Biomedical / clinical | `en_ner_bc5cdr_md` | `pip install scispacy` + model | Diseases, drugs, adverse drug reactions (BC5CDR corpus) |\n| Legal (case law, UK) | `en_blackstone_proto` | `pip install blackstone` | Legal concepts, case references, court names (UK-focused, not actively maintained) |\n| Finance, engineering, security | any of the above | n/a | General models catch orgs, people, dates well; domain-specific terms need LLM enrichment (see below) |\n\nNote: scispaCy models must be downloaded separately after `pip install scispacy`. See [allenai.github.io/scispacy](https://allenai.github.io/scispacy/) for the full model list and download URLs.\n\nspaCy extracts what it was trained on. 
LLM enrichment fills the gaps: domain-specific terms, internal jargon, and relationship types that no pre-trained NER model knows about.\n\n| Domain | spaCy alone | Add LLM enrichment when… |\n|---|---|---|\n| General wikis / internal docs | Good: people, orgs, locations covered | You need product names, team names, internal codenames, or project-specific concepts indexed as graph nodes |\n| Legal | Partial: parties, orgs, dates caught; clause types and regulatory citations missed | Almost always: contract clause types (Indemnification, Force Majeure), party roles (Licensor, Licensee), regulation references (GDPR Art. 17, SOX §404) require LLM extraction |\n| Finance | Partial: companies, dates, monetary values caught; financial instruments and risk terms missed | When indexing analyst reports, earnings calls, or contracts; extracts instruments (CDO, SPAC), risk categories, regulatory filings (10-K, 8-K) |\n| Biomedical / clinical | Good with scispaCy: diseases, drugs, genes covered | When you need treatment protocols, trial phases, or mechanism-of-action relationships beyond standard NER types |\n| Engineering / code docs | Weak: standard NER misses most technical entities | Almost always: APIs, services, error codes, config flags, version numbers, and dependency names are invisible to general NER models |\n| Security / infosec | Weak: CVE IDs and threat actors are not standard NER types | Almost always: CVEs, attack techniques (MITRE ATT&CK), threat actors, vulnerability classes, affected products |\n| HR / people ops | Good: people and org names covered | When role titles, skill taxonomies, or org-structure concepts matter for graph traversal |\n| Customer support | Partial: product names and people caught; issue categories and feature areas missed | When support tickets reference internal product areas, error messages, or workflow steps that aren't in any NER vocabulary |\n\n**Rule of thumb:** if the entities that matter most in your domain are *not* 
people, organisations, locations, or dates, add LLM enrichment. The cost is low (gpt-4o-mini runs at roughly $0.30 per 1,000 pages) and the graph coverage improvement is substantial for technical and domain-specific corpora.\n\n1. Go to [notion.so/my-integrations](https://www.notion.so/my-integrations) → **New integration** → give it a name → copy the **Internal Integration Token** (starts with `secret_`)\n2. Set `NOTION_API_KEY=secret_...` in your `.env`\n3. In Notion, open each root page you want indexed → **⋯ → Connections → Add connection → select your integration**. Sub-pages inherit the connection automatically.\n4. Trigger an initial full index:\n\n```\ncurl -X POST \"http://localhost:8000/admin/rebuild?full=true\" -H \"X-API-Key: <key>\"\n```\n\nSub-pages are indexed recursively. Container pages (no body content, only child pages) produce 0 chunks; their children are indexed normally. For a 5,000-page workspace the initial index takes 30–90 minutes depending on Notion API rate limits.\n\n1. Create an Atlassian API token at [id.atlassian.com/manage-profile/security/api-tokens](https://id.atlassian.com/manage-profile/security/api-tokens)\n2. Find your space key in the Confluence URL (`ENG` in this example): `https://yourcompany.atlassian.net/wiki/spaces/ENG/pages/...`\n3. Add to `.env`:\n\n```\nCONFLUENCE_BASE_URL=https://yourcompany.atlassian.net\nCONFLUENCE_USERNAME=you@yourcompany.com\nCONFLUENCE_API_TOKEN=<token from step 1>\nCONFLUENCE_SPACE_KEY=ENG\n```\n\n4. Trigger an initial full index:\n\n```\ncurl -X POST \"http://localhost:8000/admin/rebuild?full=true\" -H \"X-API-Key: <key>\"\n```\n\nAll four vars must be set to enable the connector; leaving any blank disables it without error. Large spaces (1,000+ pages) may take 10–30 minutes.\n\nKnowa is a single Docker container + PostgreSQL.\n\n```\ndocker build -t knowa .\ndocker run -d --name knowa --env-file .env -p 8000:8000 knowa   # --name lets the logs/exec commands address the container\ndocker logs -f knowa\n```\n\npgvector must be available. 
All major providers support it:\n\n| Provider | Notes |\n|---|---|\n| AWS RDS / Aurora | Enable `pgvector` in Parameter Groups, then `CREATE EXTENSION IF NOT EXISTS vector;` |\n| Google Cloud SQL | Enable the `pgvector` database flag, then `CREATE EXTENSION IF NOT EXISTS vector;` |\n| Azure Database for PostgreSQL | Built-in extension; run `CREATE EXTENSION IF NOT EXISTS vector;` |\n| Supabase | pgvector pre-installed; run `CREATE EXTENSION IF NOT EXISTS vector;` from the SQL editor |\n| Neon | pgvector pre-installed; run `CREATE EXTENSION IF NOT EXISTS vector;` from the console |\n| Render / Railway | Create a PostgreSQL service; connect via psql and run the extension command |\n\nAfter provisioning, set `DATABASE_URL` in your environment.\n\nThe app exposes port `8000` and reads all config from environment variables:\n\n| Provider | Approach |\n|---|---|\n| AWS ECS / Fargate | Push to ECR, create a Fargate task, inject env vars via Secrets Manager |\n| Google Cloud Run | Push to Artifact Registry, deploy as a Cloud Run service (`--port 8000`) |\n| Azure Container Apps | Push to ACR, deploy as a Container App, configure env vars in the portal |\n| Render | Connect GitHub repo, set Runtime to Docker, set Port to 8000 |\n| Railway | Connect GitHub repo or push a Docker image, add env vars in the dashboard |\n| Fly.io | `fly launch` auto-detects the Dockerfile; set secrets with `fly secrets set` |\n\n```\nsudo apt update && sudo apt install -y docker.io\ngit clone <your-repo> && cd knowa\ncp .env.example .env && nano .env\ndocker build -t knowa .\ndocker run -d --env-file .env -p 8000:8000 --restart unless-stopped knowa\n```\n\nThree layers; run them in order.\n\n```\npip install pytest\npytest tests/unit/ -v\n# Expected: 99 passed, 21 skipped\n\n# Unlock full coverage:\npython -m spacy download en_core_web_sm   # +8 spaCy graph extractor tests\npip install markdownify requests          # +13 Confluence connector tests\n# Full install: 120 passed, 0 skipped\n```\n\nTests the 
full ingestion pipeline with fixture markdown files. No Notion API or OpenAI calls are made; embeddings use deterministic fake vectors.\n\n```\ncreatedb knowa_test\npsql knowa_test -c \"CREATE EXTENSION IF NOT EXISTS vector;\"\n\nTEST_DATABASE_URL=postgresql://postgres:postgres@localhost:5432/knowa_test \\\nOPENAI_API_KEY=sk-anything API_KEY=test \\\npytest tests/integration/ -v\n```\n\nTests the full system end-to-end against real Notion pages with deterministic content.\n\n**One-time setup:**\n\n- Create three pages in your Notion workspace with exact content from `tests/e2e/notion_test_pages.md`:\n  - **Knowa Test — FAQ**\n  - **Knowa Test — Pricing**\n  - **Knowa Test — Handbook**\n- Connect your integration to each page (⋯ → Connections)\n- Copy the 32-char hex ID from each page's URL (strip dashes and page title)\n- Create `.env.test`:\n\n```\nTEST_DATABASE_URL=postgresql://postgres:postgres@localhost:5432/knowa_test\nTEST_NOTION_FAQ_PAGE_ID=<32-char id>\nTEST_NOTION_PRICING_PAGE_ID=<32-char id>\nTEST_NOTION_HANDBOOK_PAGE_ID=<32-char id>\n```\n\n**Run:**\n\n```\nexport $(cat .env.test | xargs)\npytest tests/e2e/ -v -s\n```\n\nE2E tests auto-skip if any required env var is missing, so they will not break CI runs without Notion credentials. 
Each full run costs ~$0.05 in OpenAI API calls.\n\nUse after schema changes, embedding model updates, or suspected index corruption:\n\n```\ncurl -X POST \"http://localhost:8000/admin/rebuild?full=true\" -H \"X-API-Key: <key>\"\n# or via CLI (no server needed):\nknowa index /path/to/docs --full\n```\n\n```\n-- Sync state per source\nSELECT source_id, source_type, label, last_synced_at FROM sync_state;\n\n-- Table sizes, largest first\nSELECT\n  relname AS table,\n  pg_size_pretty(pg_total_relation_size(relid)) AS total_size\nFROM pg_catalog.pg_statio_user_tables\nORDER BY pg_total_relation_size(relid) DESC;\n\n-- Index freshness, stalest source first\nSELECT source_id, COALESCE(label, source_id) AS name,\n       last_synced_at, NOW() - last_synced_at AS age\nFROM sync_state\nORDER BY last_synced_at ASC;\n```\n\nAt least one cloud connector must be fully configured: either `NOTION_API_KEY` or all four `CONFLUENCE_*` vars. This error does not apply to local directory sources; use `knowa index <dir>` for those. Verify env vars are loaded:\n\n```\ndocker exec knowa env | grep -E 'NOTION|CONFLUENCE'\n```\n\nRun `knowa debug` to check raw node/edge counts. If nodes = 0:\n\n- **spaCy model not installed**: run `python -m spacy download en_core_web_sm`, then trigger a full rebuild\n- **Purely technical content**: code files, configs, and API references have few named entities. Graph works best with prose containing people, organisations, and places. Consider adding [LLM entity enrichment](#llm-entity-enrichment-opt-in) for technical domains.\n- **Indexed before graph extraction was set up**: trigger a full rebuild from the Admin tab to re-run extraction on all pages.\n\n```\nSELECT COUNT(*) FROM chunks WHERE chunk_type = 'child';\n```\n\nIf 0, run a full rebuild. If non-zero, check for OpenAI API errors in the server logs; the query embedding may be failing.\n\nEmbedding requests are batched at 100 texts per call (`knowa/indexing/embedder.py`). 
If hitting rate limits, reduce the batch size to 50 or request a rate limit increase from OpenAI.\n\nThe Notion integration must be explicitly connected to each page: **⋯ → Connections → Add connection**. Sub-pages inherit the connection automatically, but top-level pages do not.\n\n```\nCREATE EXTENSION IF NOT EXISTS vector;\n```\n\nOn AWS RDS, ensure `pgvector` is in your Parameter Group's `shared_preload_libraries` before creating the extension.\n\nThe IVFFlat index needs at least `lists × 3` rows to be useful. With `lists=100` that is ~300 child chunks. This warning appears on small initial datasets and resolves automatically as the index grows.\n\nNo built-in metrics endpoint yet. Recommended additions for production:\n\n- **Structured logging**: replace `logging.basicConfig` with `structlog` for JSON logs\n- **Rebuild alerts**: if `stats[\"errors\"] > 0` after a rebuild, send a Slack/email notification\n- **Query latency**: add a FastAPI middleware timer and log P95 latency\n- **Index freshness**: alert if `last_synced_at` is older than your sync interval (the Operations query above makes a good health check)\n\nFor full architecture documentation, schema, API spec, cost model, and extension points see [DESIGN.md](https://github.com/zzorphcreator/knowa/blob/master/DESIGN.md).\n\nPostgreSQL + pgvector · FastAPI · OpenAI (embeddings + completions) · spaCy (configurable NER model) · tiktoken · Neo4j / Kuzu (optional graph backends) · any OpenAI-compatible LLM for entity enrichment (gpt-4o-mini, Qwen3, Kimi2, etc.)", "url": "https://wpnews.pro/news/knowa-open-source-llm-context-optimizer", "canonical_source": "https://github.com/zzorphcreator/knowa", "published_at": "2026-05-29 20:07:49+00:00", "updated_at": "2026-05-29 20:16:28.650413+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "ai-products", "large-language-models", "natural-language-processing"], "entities": ["Knowa"], "alternates": {"html": 
"https://wpnews.pro/news/knowa-open-source-llm-context-optimizer", "markdown": "https://wpnews.pro/news/knowa-open-source-llm-context-optimizer.md", "text": "https://wpnews.pro/news/knowa-open-source-llm-context-optimizer.txt", "jsonld": "https://wpnews.pro/news/knowa-open-source-llm-context-optimizer.jsonld"}}