# Show HN: Shivvr – Ephemeral semantic embedding and cognitive agent service

> Source: <https://shivvr.nuts.services/>
> Published: 2026-06-16 16:03:20+00:00

Ephemeral semantic embedding & cognitive agent service.

Chunk text. Embed with GTR-T5-base. Search via hybrid FST-BM25 vector rank fusion. Stream turn-based cognitive agent reasoning via native Model Context Protocol (MCP).

Sentence-boundary chunking + GTR-T5-base embeddings (768d). Stores in RwLock<HashMap> — pure ephemeral compute.

RRF blending dense vectors and sparse lexical indices. FST dictionary scanning provides microsecond safe query guardrails and intent-entity score boosting.

Turn-based agent loop with integrated memory search, document ingestion, session indexing, and sandboxed command execution. Powered by OpenAI/Anthropic.

Complete Model Context Protocol HTTP/SSE server endpoint (`/mcp/sse`

) allowing immediate, zero-config integration with Claude Code, Antigravity, and Codex.

Per-agent orthogonal matrix rotation on embeddings. Cosine similarity preserved under encryption. Keys are in-memory only.

`organize`

role uses GTR-T5-base (768d). `retrieve`

role uses OpenAI text-embedding-ada-002 (1536d) — pass your own key or set server-side.

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Status, model info, live counts |
| GET | `/mcp/sse` | MCP Server SSE handshake |
| POST | `/mcp/message` | MCP Server JSON-RPC message router |
| POST | `/sessions/:id/agent/chat` | Stream non-blocking GhostAgent cognitive turns (SSE) |
| POST | `/sessions/:id/ingest` | Chunk + embed text into session |
| GET | `/sessions/:id/search?q=...` | Semantic search (supports RRF `hybrid` & `lexical_only` ) |
| GET | `/sessions/:id` | Session metadata |
| DELETE | `/sessions/:id` | Delete session |
| GET | `/temp` | List temp stores with TTL |
| POST | `/temp/:name/ingest` | Ingest into temp store (2 hr TTL) |
| GET | `/temp/:name/search?q=...` | Search temp store |
| DELETE | `/temp/:name` | Delete temp store |
| POST | `/agent/:id/register` | Register per-agent orthogonal key |
| POST | `/agent/:id/encrypt` | Encrypt embeddings |
| POST | `/agent/:id/decrypt` | Decrypt embeddings |
| POST | `/invert` | Reconstruct text from embedding vector |

```
# Ingest into session
curl -X POST https://shivvr.nuts.services/sessions/my-session/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Supreme Raven is protected by Known Opossum.", "source": "vault_specs"}'

# Autonomous Agent Conversational Chat (Streams Thoughts, ToolCalls, & Answer via SSE)
curl -i -X POST http://localhost:8085/sessions/my-session/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Who protects the Supreme Raven?"}'

# Vector-Lexical Hybrid RRF Search
curl "http://localhost:8085/sessions/my-session/search?q=Known+Opossum&hybrid=true"

# High-speed Lexical-Only BM25 Search (Bypasses ONNX embedder)
curl "http://localhost:8085/sessions/my-session/search?q=Opossum&lexical_only=true"

# Synchronize Claude Code or Antigravity with shivvr's Native MCP Server
nemesis8 mcp add http://localhost:8085/mcp/sse
```

| Param | Default | Description |
|---|---|---|
`q` | required | Query text |
`n` | 5 | Number of results |
`hybrid` | false | Blend semantic vectors + BM25 scores (Reciprocal Rank Fusion) |
`lexical_only` | false | Bypass vector embedder, execute pure BM25 search |
`guardrail` | true | Enable FST toxic term scanning and automatic query blocking |
`role` | organize | `organize` (768d local) or `retrieve` (1536d OpenAI) |
`time_weight` | 0.0 | Blend semantic + recency score (0–1) |
`decay_halflife_hours` | 168 | Recency decay half-life in hours |
`include_nearby` | false | Return temporally adjacent chunks |
`agent_id` | — | Agent ID for encrypted search |
`openai_api_key` | — | Per-request OpenAI key for `retrieve` role (overrides server key) |

| Variable | Default | Description |
|---|---|---|
`PORT` | 8080 | Listen port |
`MODEL_PATH` | models/gtr-t5-base.onnx | GTR-T5-base ONNX embedder |
`TOKENIZER_PATH` | models/tokenizer.json | Tokenizer |
`OPENAI_API_KEY` | — | Enables OpenAI completions and retrieve embeddings |
`ANTHROPIC_API_KEY` | — | Enables Anthropic completions and GhostAgent loops |
`NUTS_AUTH_JWKS_URL` | — | Enable auth (open dev mode if unset) |
`NUTS_AUTH_VALIDATE_URL` | https://auth.nuts.services/api/validate | API token validation endpoint |

| Layer | Choice |
|---|---|
| Runtime | Rust + Tokio + axum |
| Cognition | GhostAgent cognitive RAG turn loop (OpenAI / Anthropic compat) |
| MCP Server | HTTP/SSE JSON-RPC 2.0 Model Context Protocol transport layer |
| Hybrid Index | Tantivy FST deterministic phrase engine + BM25F field indexer |
| Embedding | GTR-T5-base (768d) via ONNX Runtime 2.0 — local, required |
| Storage | Ephemeral RwLock<HashMap> — no disk, no volume mounts |
| GPU | CUDA 12.6 via ort EP on Cloud Run L4 — CPU fallback automatic |
| Inversion | vec2text gtr-base (projection + T5 enc/dec) — optional |
