{"slug": "prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend", "title": "PrismLib – semantic LLM cache and cluster mesh that cuts token spend", "summary": "PrismLib, a new open-source Python package, offers a semantic LLM cache, distributed database driver, and cluster intelligence mesh that can reduce token spend by 60-80% and cut DB read latency by 98.6%. The tool runs entirely in-process without external dependencies like Redis or Kubernetes, providing multi-tenant isolation and automatic failover for AI applications.", "body_md": "**Tensor-native LLM cache, distributed DB driver, and cluster intelligence in one package.**\n\nPrismLib has three layers. Use any combination:\n\n| Layer | What it solves | Key number | Install |\n|---|---|---|---|\n| PrismCache | LLM API cost: a semantic cache catches repeated and paraphrased queries in-process | 91–96% hit rate | `pip install \"prismlib[cache]\"` |\n| PrismDriver | DB read latency: a WAL-streamed local index replaces network round-trips | 98.6% latency reduction (143 ms → 2 ms) | `pip install \"prismlib[fabric]\"` |\n| PrismLib Micro | Cluster token cost and HA: shares answers across containers, auto-failover, health mesh | 76% fewer tokens cluster-wide | Included in `prismlib[fabric]` |\n\nAll three run entirely in-process. No Redis. No Pinecone. No Prometheus. No Kubernetes operator.\n\n**PrismCache** wraps any LLM call. 
Paraphrased queries return the cached answer without touching the API.\n\nMulti-tenant math: a JL projection seeded by `SHA-256(tenant_id)` gives each tenant a mathematically isolated address space. Isolation comes from a per-tenant projection matrix, not from a query filter.\n\n**PrismDriver** uses two components on two machines:\n\n- **Server Wrapper** (DB node): intercepts WAL/binlog, vectorizes rows, and streams encrypted float32 frames via CHORUS Fabric.\n- **DLL Driver** (app node): subscribes to the stream and keeps a local PrismResonance index warm; reads never leave the process.\n\n**PrismLib Micro** is built into `prismlib[fabric]`, zero extra install:\n\n- **ClusterCache**: once any node answers a query, every peer caches it via CHORUS TOKEN_SYNC frames. BLUE and ORANGE nodes are billed 0 tokens on warm queries.\n- **AlertManager**: 12 default health rules; fires a SIGNAL frame and an admin email in <1 s when CPU/RAM/disk thresholds are crossed. No scrape interval. No Datadog agent.\n- **Blue/Green/Orange failover**: GREEN is the active master, BLUE is a warm standby (auto-promotes in ~3 s if GREEN goes silent), ORANGE is a syncing reserve.\n- **ContextCompressor**: cosine-similarity top-K chunk selection before every LLM call. 
58–64% context token reduction, zero extra cost.\n\nBuilt on two open-source InsightIts libraries:\n\n- [PrismResonance](https://github.com/insightitsGit/prismresonance): wave-memory similarity engine powering every cache lookup and local vector index\n- [CHORUS Fabric](https://github.com/insightitsGit/chorus_fabric): encrypted gRPC binary streaming protocol carrying float32 tensor frames between nodes\n\n```\n# Semantic LLM cache only\npip install \"prismlib[cache]\"\n\n# With OpenAI embeddings\npip install \"prismlib[cache,cache-openai]\"\n\n# With Anthropic/Voyage embeddings\npip install \"prismlib[cache,cache-anthropic]\"\n\n# With Ollama (local models)\npip install \"prismlib[cache,cache-ollama]\"\n\n# DB driver (app node)\npip install \"prismlib[fabric]\"\n\n# Server Wrapper daemon (DB node, Linux/macOS)\npip install \"prismlib[wrapper]\"\nprism-wrapper --config /etc/prism/wrapper.toml\n\n# Everything\npip install \"prismlib[all]\"\n```\n\nSave 60-80% of LLM API calls by serving semantically equivalent queries from the cache. Paraphrases hit the cache: \"How do I reset my password?\" and \"I forgot my password, help\" return the same answer without a second LLM call.\n\n``` python\nfrom openai import OpenAI\nfrom prism.cache import PrismCache\n\nopenai_client = OpenAI()  # reads OPENAI_API_KEY from the environment\ncache = PrismCache.build(tenant_id=\"my-app\", llm_model=\"gpt-4o\")\n\ndef ask(question: str) -> str:\n    # call_fn is only invoked on a cache miss\n    return cache.get_or_call(\n        query=question,\n        call_fn=lambda: openai_client.chat.completions.create(\n            model=\"gpt-4o\",\n            messages=[{\"role\": \"user\", \"content\": question}],\n        ).choices[0].message.content,\n    )\n```\n\nEach tenant gets a mathematically isolated cache space (JL projection seeded by tenant ID). 
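
The tenant seeding described above can be sketched in plain Python. This is a minimal illustration, not PrismResonance's implementation: it assumes a Gaussian Johnson-Lindenstrauss matrix scaled by 1/sqrt(out_dim), and the helper names `tenant_projection` and `project` are hypothetical. Hashing the tenant ID with SHA-256 gives a deterministic seed, so any process that builds the matrix for the same tenant gets the same 64-dimensional space, while a different tenant ID yields an unrelated matrix.

```python
import hashlib
import math
import random


def tenant_projection(tenant_id: str, in_dim: int, out_dim: int = 64) -> list[list[float]]:
    # Hypothetical helper: derive a deterministic seed from SHA-256(tenant_id)
    digest = hashlib.sha256(tenant_id.encode('utf-8')).digest()
    rng = random.Random(int.from_bytes(digest, 'big'))
    # Gaussian JL matrix; the 1/sqrt(out_dim) scale preserves squared norms in expectation
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix: list[list[float]], vector: list[float]) -> list[float]:
    # Matrix-vector product: one output coordinate per matrix row
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Same tenant gives the same matrix; a different tenant gives a different one
acme = tenant_projection('acme-corp', in_dim=8)
print(acme == tenant_projection('acme-corp', in_dim=8))   # True
print(acme == tenant_projection('globex-inc', in_dim=8))  # False
print(len(project(acme, [1.0] * 8)))                      # 64
```

In this sketch no matrix has to be stored or shipped between nodes, since each process can rebuild it from the tenant ID on demand.
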
One customer's cached answers never bleed into another's.\n\n``` python\nfrom prism.cache import PrismCache\n\ndef get_cache(tenant_id: str) -> PrismCache:\n    return PrismCache.build(tenant_id=tenant_id, llm_model=\"gpt-4o-mini\")\n\n# Tenant A and tenant B share no cache state\ncache_a = get_cache(\"acme-corp\")\ncache_b = get_cache(\"globex-inc\")\n\n# llm_call(question) is your own function that sends the question to the LLM\nquestion = \"What is my plan limit?\"\nanswer = cache_a.get_or_call(query=question, call_fn=lambda: llm_call(question))\n```\n\nWrap your existing LLM endpoint without changing any business logic.\n\n``` python\n# FastAPI\nfrom fastapi import FastAPI, Request\nfrom prism.cache import PrismCache\n\napp = FastAPI()\ncache = PrismCache.build(tenant_id=\"api\", llm_model=\"gpt-4o\")\n\n@app.post(\"/chat\")\nasync def chat(request: Request):\n    body = await request.json()\n    question = body[\"message\"]\n    # llm_client is your existing LLM client; it is only called on a cache miss\n    answer = await cache.aget_or_call(\n        query=question,\n        call_fn=lambda: llm_client.ask(question),\n    )\n    return {\"answer\": answer}\n```\n\n``` python\n# Django: add to MIDDLEWARE in settings.py\n# prism/middleware.py\nfrom prism.cache import PrismCache\n\n_cache = PrismCache.build(tenant_id=\"django-app\", llm_model=\"gpt-4o\")\n\nclass PrismCacheMiddleware:\n    def __init__(self, get_response):\n        self.get_response = get_response\n\n    def __call__(self, request):\n        # Requests pass through unchanged; caching happens in process_llm_query\n        return self.get_response(request)\n\n    def process_llm_query(self, question: str, call_fn) -> str:\n        return _cache.get_or_call(query=question, call_fn=call_fn)\n```\n\n``` python\nimport asyncio\nfrom prism.cache import PrismCache\n\ncache = PrismCache.build(tenant_id=\"batch\", llm_model=\"gpt-4o-mini\")\n\nasync def process_batch(questions: list[str]) -> list[str]:\n    # q=q binds each question when the lambda is created (avoids late-binding closures)\n    tasks = [\n        cache.aget_or_call(query=q, call_fn=lambda q=q: llm_call(q))\n        for q in questions\n    ]\n    return await asyncio.gather(*tasks)\n```\n\n``` python\nfrom prism.cache import PrismCache\n\ncache = PrismCache.build(tenant_id=\"finance\", llm_model=\"gpt-4o\")\n\n# After processing 
queries...\nmetrics = cache.metrics()\nprint(f\"Hit rate:          {metrics.hit_rate:.0%}\")\nprint(f\"Tokens saved:      {metrics.tokens_saved:,}\")\nprint(f\"Cost saved today:  ${metrics.cost_saved_usd:.2f}\")\n# The projection assumes the metrics above cover one day of traffic\nprint(f\"Projected monthly: ${metrics.cost_saved_usd * 30:.0f}\")\n```\n\nPrismDriver has two components that work together. Install each on the right machine.\n\n**On the DB node: Server Wrapper**\n\nThe Server Wrapper is an OS daemon that sits next to your database. It reads WAL/binlog changes, vectorizes rows using `RowVectorizer`, encrypts them with `TensorCipher` (via CHORUS Fabric), and streams float32 frames to every connected DLL Driver.\n\n```\n# Install on the DB node (Linux or macOS)\npip install \"prismlib[wrapper]\"\n\n# Configure and start\nprism-wrapper --config /etc/prism/wrapper.toml\n```\n\n``` toml\n# /etc/prism/wrapper.toml\n[database]\nflavor = \"postgresql\"\ndsn = \"postgresql://user:pass@localhost/mydb\"\n\n[chorus]\nlisten_port = 50051\ntenant_id = \"products-service\"\n```\n\nSupported databases: PostgreSQL (WAL via wal2json), MySQL (binlog), CockroachDB (EXPERIMENTAL CHANGEFEED), TiDB (push model).\n\n**On the app node: DLL Driver**\n\nThe DLL Driver is an in-process library that replaces your DB connection string. On startup it connects to the Server Wrapper, subscribes to the CHORUS Fabric stream, and keeps a local PrismResonance index warm. 
All reads hit the in-process index: no network round-trip, sub-millisecond latency.\n\n```\n# Install on the app node\npip install \"prismlib[fabric]\"\n```\n\n``` python\n# Before\nimport psycopg2\nconn = psycopg2.connect(\"postgresql://user:secret@db-host:5432/mydb\")\n\n# After: no DB password and no DB hostname in app config\nfrom prism.ffi import PrismDriver, DriverConfig\n\nasync def find_similar(my_embedding_vector):\n    async with PrismDriver(DriverConfig(wrapper_host=\"db-proxy-1\")) as driver:\n        return await driver.query(\n            embedding=my_embedding_vector,\n            top_k=5,\n            threshold=0.85,\n        )\n```\n\nThe driver keeps a local PrismResonance cache warm via a background WAL subscription. Reads never touch the DB; they hit the in-process float32 index.\n\n``` python\nimport asyncio\n\nimport numpy as np\nfrom prism.ffi import PrismDriver, DriverConfig\n\nconfig = DriverConfig(\n    wrapper_host=\"10.0.1.50\",\n    wrapper_port=50051,\n    tenant_id=\"products-service\",\n)\n\nasync def main():\n    async with PrismDriver(config) as driver:\n        # Typical hit: < 1 ms, no network round-trip\n        query_vec = np.array([...], dtype=np.float32)  # fill in your embedding values\n        matches = await driver.query(embedding=query_vec, top_k=10)\n        for m in matches:\n            print(f\"{m.row_id}  score={m.score:.3f}  {m.text_repr}\")\n\n    async with PrismDriver(config) as driver:\n        ack = await driver.write(\n            row_id=\"product-42\",\n            data={\"name\": \"Widget Pro\", \"price\": 29.99, \"stock\": 150},\n        )\n        print(f\"Written: event_id={ack.event_id}\")\n\nasyncio.run(main())\n```\n\n``` go\n// Go\nimport prism \"github.com/insightitsGit/prismlib/go\"\n\ndriver, _ := prism.Connect(\"db-proxy-1:50051\", \"my-tenant\")\ndefer driver.Close()\nresults, _ := driver.Query(embedding, prism.QueryOpts{TopK: 5, Threshold: 0.85})\n```\n\n``` csharp\n// C#\nusing InsightIts.Prism;\n\nawait using var driver = new PrismDriver(\"db-proxy-1:50051\", tenantId: \"my-tenant\");\nawait driver.ConnectAsync();\nvar results = await driver.QueryAsync(embedding, topK: 5, threshold: 0.85f);\n```\n\n``` php\n// PHP 8.0+\n$driver = new 
PrismDriver('db-proxy-1', 50051, 'my-tenant');\n$driver->connect();\n$results = $driver->query($embedding, topK: 5, threshold: 0.85);\n```\n\n```\n┌─ DB Node ──────────────────────────────────────────────────────┐\n│  PostgreSQL / MySQL / CockroachDB / TiDB                       │\n│       │ WAL / binlog / changefeed                              │\n│  ┌────▼───────────────────────────────────────────────────┐    │\n│  │  prism-wrapper  (pip install \"prismlib[wrapper]\")      │    │\n│  │  RowVectorizer → TensorCipher (V_enc = V @ K)         │    │\n│  │  → HMAC-SHA256 watermark → CHORUSPublisher            │    │\n│  └────────────────────────┬───────────────────────────────┘    │\n└───────────────────────────┼────────────────────────────────────┘\n                            │  CHORUS Fabric (gRPC, encrypted float32)\n┌─ App Node — GREEN ────────┼────────────────────────────────────┐\n│  ┌────────────────────────▼──────────────────────────────┐     │\n│  │  PrismDriver DLL  (pip install \"prismlib[fabric]\")    │     │\n│  │  Subscribe loop → decrypt → PrismResonance index      │     │\n│  └──────────────────────────┬────────────────────────────┘     │\n│                             │ sub-ms query                     │\n│  ┌──────────────────────────▼────────────────────────────┐     │\n│  │  Your Application                                      │     │\n│  │  ┌─────────────────┐   ┌──────────────────────────┐   │     │\n│  │  │  PrismCache     │   │  PrismDriver             │   │     │\n│  │  │  LLM cache      │   │  local PrismResonance    │   │     │\n│  │  │  [cache]        │   │  (no DB round-trip)      │   │     │\n│  │  └─────────────────┘   └──────────────────────────┘   │     │\n│  │  ┌──────────────────────────────────────────────────┐  │     │\n│  │  │  ClusterCache  ← TOKEN_SYNC frames               │  │     │\n│  │  │  AlertManager  ← HEALTH / SIGNAL frames          │  │     │\n│  │  └──────────────────────────────────────────────────┘  │     │\n│  
└────────────────────────────────────────────────────────┘     │\n└──────────────────────────────┬─────────────────────────────────┘\n                               │  CHORUS mesh\n          ┌────────────────────┴────────────────────┐\n          │  TOKEN_SYNC · HEALTH · SIGNAL · CONFIG   │\n          ▼                                          ▼\n┌─ App Node — BLUE ──────┐           ┌─ App Node — ORANGE ─────┐\n│  ClusterCache          │           │  ClusterCache            │\n│  (warm standby)        │           │  (syncing reserve)       │\n│  auto-promotes if      │           │  separate network        │\n│  GREEN silent >3s      │           │                          │\n└────────────────────────┘           └──────────────────────────┘\n```\n\nPrismCache live results from Azure Container App (`westus2`, 1 vCPU / 2 GiB, mock LLM baseline):\n\n| Scenario | Users | Duration | Hit rate | Queries | Tokens saved | Monthly est. |\n|---|---|---|---|---|---|---|\n| Light | 20 | 60s | 91.0% | 5,936 | 1,374,464 | $594 |\n| Mixed | 50 | 300s | 95.9% | 6,973 | 1,673,216 | $723 |\n\nNumbers use a mock LLM (80 ms sleep). With real GPT-4o calls (1–3 s), the latency speedup is 4–13×; token savings are unchanged.\n\nPrismDriver live two-node benchmark (Azure Container Apps `westus2`, 30 users × 60s per phase):\n\n| Phase | Path | Avg latency | Queries |\n|---|---|---|---|\n| Baseline (no driver) | App → DB node, network | 142.8 ms | 3,864 |\n| Driver (local index) | App → in-process PrismResonance | 2.0 ms | 1,479 |\n\n**70.7× faster · 98.6% latency reduction**\n\nThe 98.6% reduction is a direct result of CHORUS Fabric doing its job. The subscription loop streamed 11,000 rows at **26,000 rows/s** from the DB node into the local PrismResonance index before the load test began. By the time the first `/driver/query` hit arrived, there were zero network hops: the answer was already in-process. 
This is what CHORUS Fabric was designed for: getting tensor data to where the query is, before the query arrives.\n\n```\n# Two-node benchmark (requires both container apps running)\npython benchmark/load/run_driver_benchmark.py \\\n  --app-url https://prism-benchmark.nicestone-720c6a9b.westus2.azurecontainerapps.io \\\n  --db-url  https://prism-wrapper-sim.nicestone-720c6a9b.westus2.azurecontainerapps.io \\\n  --users 30 --duration 60\n\n# PrismCache load test\npython benchmark/load/run_benchmark.py \\\n  --host https://prism-benchmark.nicestone-720c6a9b.westus2.azurecontainerapps.io \\\n  --scenario mixed\n```\n\nSee [benchmark/](https://github.com/insightitsGit/prismlib/blob/master/benchmark) for the full results JSON, Locust CSV files, and the Azure deploy script.\n\nPrismLib is built on two InsightIts open-source libraries. You can use them directly if you need lower-level access.\n\n**PrismResonance** · [github.com/insightitsGit/prismresonance](https://github.com/insightitsGit/prismresonance) · `pip install prismresonance`\n\nThe wave-memory similarity engine. Every cache lookup and local vector index in PrismLib goes through PrismResonance.\n\nHow it works:\n\n- Receives a float32 embedding vector\n- Reduces it to 64 dimensions with a Johnson-Lindenstrauss projection whose matrix is seeded by `SHA-256(tenant_id)`; this is what gives each tenant a mathematically isolated address space\n- Computes similarity as wave interference (cosine in projected space) in three lock-free phases: snapshot → ONNX MatMul → rank\n- Returns ranked candidates in sub-millisecond time, entirely in-process\n\nPrismCache wraps this for LLM response caching. 
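
The three phases in the list above (snapshot, similarity, rank) can be sketched with plain Python lists. This is an illustrative stand-in, not PrismResonance's code: a Python loop replaces the ONNX MatMul step, there is no lock-free concurrency, and `cosine` and `rank` are hypothetical helpers whose `top_k` and `threshold` parameters mirror the ones passed to `index.query` in this section.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 for a zero-length vector instead of dividing by zero
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[float], entries: list[tuple[list[float], str]],
         top_k: int = 5, threshold: float = 0.85) -> list[tuple[float, str]]:
    # Phase 1, snapshot: copy the entry list so the scan works on a fixed view
    snapshot = list(entries)
    # Phase 2, similarity: score every stored vector against the query
    scored = [(cosine(query, vec), payload) for vec, payload in snapshot]
    # Phase 3, rank: drop weak matches, sort best first, keep top_k
    hits = [item for item in scored if item[0] >= threshold]
    hits.sort(key=lambda item: item[0], reverse=True)
    return hits[:top_k]


entries = [([1.0, 0.0], 'reset password'), ([0.0, 1.0], 'billing'), ([0.9, 0.1], 'forgot password')]
for score, payload in rank([1.0, 0.0], entries, top_k=2, threshold=0.5):
    print(f'{score:.3f} {payload}')
# 1.000 reset password
# 0.994 forgot password
```

Filtering by threshold before sorting keeps the sort small when most stored entries are unrelated to the query.
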
PrismDriver's local replica is a PrismResonance index kept warm by WAL streaming.\n\n``` python\nfrom prismresonance import PrismProjector, WaveIndex\n\n# my_embedding and query_embedding are float32 vectors from your embedding model\nprojector = PrismProjector(dim=64, tenant_id=\"my-tenant\")\nindex = WaveIndex(projector)\n\nindex.add(vector=my_embedding, payload={\"row_id\": \"product-1\", \"text\": \"Widget\"})\nresults = index.query(vector=query_embedding, top_k=5, threshold=0.85)\n```\n\n**CHORUS Fabric** · [github.com/insightitsGit/chorus_fabric](https://github.com/insightitsGit/chorus_fabric) · `pip install chorus-fabric`\n\nThe secure gRPC binary streaming protocol for machine-to-machine tensor communication. PrismDriver uses CHORUS Fabric as its transport layer between the Server Wrapper on the DB node and the DLL Driver on the app node.\n\nHow it works:\n\n- `prism-wrapper` (DB node) vectorizes WAL row events via `RowVectorizer`, encrypts them with `TensorCipher` (`V_enc = V @ K`), appends an HMAC-SHA256 watermark, and publishes batches of raw float32 frames\n- `PrismDriver` (app node) opens a persistent `WrapperService.Subscribe()` gRPC stream, receives encrypted frames, decrypts them, and feeds them into the local PrismResonance index\n- Transport is pure binary float32 over gRPC server-streaming: no JSON serialization, no REST overhead\n- The `WrapperService` proto also exposes `Query`, `Write`, `Health`, and `Hello` RPCs for direct interaction\n\n``` python\nfrom chorus_fabric import CHORUSPublisher, DriverEndpoint\n\nasync def publish(config, event_queue):\n    # config and event_queue come from your wrapper setup\n    publisher = CHORUSPublisher(config)\n    publisher.add_driver(DriverEndpoint(host=\"10.0.1.50\", port=50051, tenant_id=\"prod\"))\n    await publisher.run(event_queue)  # streams WAL events to all connected drivers\n```\n\nCHORUS Fabric is the same protocol used in the CHORUS M2M system, InsightIts' 4-container gRPC topology for tensor communication between AI agents. 
The 98.6% latency reduction in the PrismDriver benchmark shows the protocol working over real cloud networking: 11,000 rows streamed at 26,000 rows/s across Azure inter-container networking, then served locally at 2 ms.\n\nPrismLib Micro is the cluster layer built into `prismlib[fabric]`. It adds four capabilities on top of the single-node stack, with no extra install and no extra infra.\n\n| Component | What it does |\n|---|---|\n| ClusterCache | Shares LLM answers across all nodes via CHORUS TOKEN_SYNC frames. Once any node answers a query, every other node serves it for 0 tokens. |\n| AlertManager | Broadcasts health alerts as SIGNAL frames plus an admin email the moment CPU/RAM/disk/latency thresholds are crossed. No Prometheus. No Datadog. |\n| Blue/Green/Orange failover | Three-tier hot standby: GREEN (active), BLUE (warm standby, auto-promotes in ~3 s), ORANGE (syncing reserve). No Raft dependency. No K8s operator. |\n| ContextCompressor | Ranks RAG context chunks by cosine similarity and keeps the top K. Saves 58–64% of context tokens before every LLM call. In-process, no extra model. 
|\n\n| Metric | Result |\n|---|---|\n| Token savings (cluster avg) | 76.1% |\n| BLUE node (cluster cache hit) | 100%, 0 LLM calls |\n| ORANGE node (cross-network cache hit) | 100%, 0 LLM calls |\n| Context compression | 58–64% per query |\n| Health alert propagation | <1 s (709–711 ms measured) |\n| Failover (BLUE promoted to GREEN) | ~3–4 s, no human step |\n\nSee [benchmark/cluster/](https://github.com/insightitsGit/prismlib/blob/master/benchmark/cluster) for the full benchmark code and [`benchmark/cluster/cluster_benchmark_results.json`](https://github.com/insightitsGit/prismlib/blob/master/benchmark/cluster/cluster_benchmark_results.json) for raw results.\n\n``` python\nfrom prism.cluster.cache import ClusterCache\n\n# chorus_fabric, embed() and llm come from your own application setup\ncache = ClusterCache(node_id=\"node-1\", fabric=chorus_fabric)\n\nasync def answer(user_question, retrieved_docs, doc_embeddings):\n    return await cache.get_or_call(\n        query=user_question,\n        query_vector=embed(user_question),\n        call_fn=lambda: llm.complete(user_question),\n        context_chunks=retrieved_docs,  # your RAG chunks\n        chunk_vectors=doc_embeddings,  # their vectors\n    )\n```\n\nDrop this in front of your existing `retrieve → generate` step. 
No changes to retrieval logic, no changes to your LLM client.\n\n``` python\nimport os\n\nfrom prism.cluster.alerts import AlertManager, SMTPConfig\n\nalerts = AlertManager(\n    fabric=chorus_fabric,  # your CHORUS Fabric handle\n    mail_config=SMTPConfig(\n        host=\"smtp.gmail.com\", port=587,\n        username=\"you@gmail.com\",\n        password=os.getenv(\"GMAIL_APP_PASS\"),\n        recipients=[\"admin@yourcompany.com\"],\n    ),\n)\n\nasync def check(health_snapshot):\n    # Fires an email and a SIGNAL frame to all nodes if any of the 12 default rules trigger\n    await alerts.evaluate_health(health_snapshot)\n```\n\n| Capability | PrismLib Micro | Prometheus + Alertmanager | Redis cluster | Raft / etcd |\n|---|---|---|---|---|\n| Cross-node token cache | Yes, built-in | No | Manual (exact match) | No |\n| Alert propagation | <1 s, no infra | 30–60 s, stack needed | No | No |\n| Auto failover | ~3–4 s, built-in | No | Sentinel, 2–30 s | 150–500 ms |\n| Context compression | 58–64%, free | No | No | No |\n| Extra infrastructure | None | Prometheus stack | Redis cluster | etcd cluster |\n\n| Tier | Nodes | Price | Includes |\n|---|---|---|---|\n| Open source | Unlimited | Free forever | All cluster code, Apache 2.0 |\n| ChorusMesh Developer (coming soon) | Up to 3 | $29/mo after 30-day trial | ClusterCache + failover + AlertManager |\n| ChorusMesh Team | Up to 10 | $149/mo | + Raft consensus, message broker adapters |\n| ChorusMesh Business | Up to 50 | $499/mo | + multi-region routing, SLA 99.9% |\n| Enterprise | Unlimited | Contact us | + air-gap, compliance, dedicated Slack |\n\nFor enterprise agreements: [insightits.info@gmail.com](mailto:insightits.info@gmail.com)\n\nPrismLib is open source (Apache 2.0) and free to use. 
If your team needs any of the following, contact us for enterprise pricing:\n\n- **On-premises deployment support**: air-gapped installs, hardened Docker images, SOC 2 documentation\n- **SLA-backed support**: guaranteed response times, incident escalation, dedicated Slack channel\n- **Custom embedding model integration**: fine-tuned domain-specific embedders for higher hit rates in specialized domains (legal, medical, finance, code)\n- **Multi-region CHORUS Fabric topology**: active-active DB node clusters, cross-region WAL fan-out, geo-aware driver routing\n- **Audit logging and compliance exports**: per-query access logs, tenant isolation attestation reports, GDPR data lineage\n- **Professional services**: architecture review, migration from Redis/GPTCache, custom RowVectorizer schemas\n\n**Contact: insightits.info@gmail.com**\n\n**GitHub:** [github.com/insightitsGit/prismlib](https://github.com/insightitsGit/prismlib)\n\nPrismLib is free and will stay free. If it saved your team money on OpenAI bills or database infrastructure, consider sponsoring: it covers benchmark compute and maintenance time, and keeps development moving.\n\n*Your name or logo here: become a sponsor*\n\n**It is one package**: `prismlib`, published once. The wrapper, driver, and cache are all extras of the same package. Users install what they need:\n\n```\npip install \"prismlib[cache]\"           # PrismCache only\npip install \"prismlib[wrapper]\"         # Server Wrapper (DB node)\npip install \"prismlib[fabric]\"          # DLL Driver (App node)\npip install \"prismlib[all]\"             # Everything\n```\n\n**To publish a new version:**\n\n```\n# 1. Bump version in pyproject.toml (currently 0.4.0)\n# 2. Build the distribution\npip install build twine\npython -m build\n\n# 3. Upload to PyPI (use your token from pypi.org/manage/account/token/;\n#    setting TWINE_PASSWORD instead of --password keeps it out of shell history)\npython -m twine upload dist/* --username __token__ --password pypi-YOUR_TOKEN\n```\n\nThat's it. 
One upload covers every install variant: pip resolves the extras from the package metadata at install time.\n\nApache 2.0 — InsightIts © 2026", "url": "https://wpnews.pro/news/prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend", "canonical_source": "https://github.com/insightitsGit/prismlib", "published_at": "2026-06-27 00:34:04+00:00", "updated_at": "2026-06-27 01:05:21.096407+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "ai-tools", "developer-tools", "ai-products"], "entities": ["PrismLib", "PrismCache", "PrismDriver", "PrismResonance", "CHORUS Fabric", "OpenAI", "Anthropic", "Ollama"], "alternates": {"html": "https://wpnews.pro/news/prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend", "markdown": "https://wpnews.pro/news/prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend.md", "text": "https://wpnews.pro/news/prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend.txt", "jsonld": "https://wpnews.pro/news/prismlib-semantic-llm-cache-and-cluster-mesh-that-cuts-token-spend.jsonld"}}