Building ReefWatch, a Coral-Powered Production Triage Agent

A developer built ReefWatch, a Coral-powered production triage agent that discovers connected tools at runtime, queries them with read-only SQL, and generates incident reports only when the evidence supports a conclusion. The agent uses Coral as a data plane to abstract away individual tool SDKs, allowing it to investigate across systems like GitHub, Sentry, and Slack through a unified SQL interface. ReefWatch streams the evidence trail and produces operator-grade answers grounded in real system data rather than generic suggestions.

Production incidents almost never break in one place. The alert fires in one tool. The broken deploy is in Netlify. The suspicious change is in GitHub. The stack trace is in Sentry. The human context is in Slack. The runbook is in Notion. The "is this actually paging someone?" answer is in PagerDuty.

A normal chatbot can sound helpful in that situation. It can say things like "you should check your recent deployments" and "look for related errors in Sentry." But that is not triage. That is a polished to-do list.

I wanted something more useful: an agent that could go get the evidence, connect the dots across sources, show its work, and give an operator-grade answer grounded in real system data. The design constraint from the start was simple: no evidence, no answer.

That became ReefWatch (https://github.com/siiddhantt/reefwatch), a Coral-powered production triage agent. It discovers the tools connected to a workspace at runtime, queries them for evidence, correlates records across systems, and produces a compact answer only when the facts support one. Coral became the backbone because it turns the messiest part of agent tooling into something the model can actually reason about: SQL.

By the end of this route, you will have a blueprint for an agent that can do the same. In one sentence: ReefWatch is a Coral-powered investigation workspace that lets an agent discover connected tools at runtime, query them with read-only SQL, stream the evidence trail, and generate an incident report only when the facts actually support one.

MCP is excellent as an integration layer. It gives models a way to call tools with schemas instead of relying on scraped UIs and human glue. But if every source becomes a separate collection of bespoke tools, a new problem appears: the number of tools the model has to understand grows with every source you connect.

Coral changes the abstraction. It still uses MCP, but the agent mostly sees a small set of stable capabilities: discover the catalog, inspect the schema, read the guide, and run SQL.
That means a new source is not "teach ReefWatch another SDK." It becomes:

    install a Coral source -> discover the tables -> query evidence with SQL

The practical win is boring in the best way: ReefWatch can stay small. The app does not own GitHub pagination, Sentry auth, Slack table shapes, or Netlify deploy schemas. Coral owns that. ReefWatch owns the investigation behavior.

That split also maps well to how I think reliable agents should be built: ground the model in real environment feedback, keep tools composable, trace the work, and wrap the loop with small guardrails instead of hoping one perfect prompt behaves forever. MCP gives the agent hands. Coral gives it a map and a query language.

If I were rebuilding ReefWatch from scratch, I would not start with the UI. I would start with the investigation pipeline and make each layer earn its place. It is tempting to start with the surface, but the surface should reflect a system that is already worth trusting.

The project came together in eight slices:

| Slice | What I Built | Why It Mattered |
|---|---|---|
| 1 | Coral MCP client | Proved Coral could be the data plane |
| 2 | Warm Coral session | Removed repeated MCP startup cost |
| 3 | Schema context | Kept prompts aligned with live Coral metadata |
| 4 | Minimal agent loop | Exposed the real model failure modes |
| 5 | Policy modules | Made the agent reliable without hardcoding a demo |
| 6 | Persistence | Made runs debuggable and conversations durable |
| 7 | Streaming UI | Made the investigation inspectable |
| 8 | Source profiles | Made setup reproducible without requiring every token |

The final project shape looks roughly like this: a FastAPI backend with a SQLite store, and a React frontend.

    reefwatch/
    |-- src/
    |   |-- api/
    |   |   |-- routes/
    |   |   |   |-- coral.py             Coral health and source setup
    |   |   |   |-- conversations.py     persisted investigation threads
    |   |   |   |-- investigations.py    REST + SSE investigation runs
    |   |   |   `-- schema.py            schema visibility for the UI
    |   |   |-- dependencies.py          shared Settings, store, agent, session
    |   |   |-- mappers.py               domain models to API responses
    |   |   `-- schemas.py               API contracts
    |   `-- app/
    |       |-- adapters/
    |       |   |-- coral_session.py     long-lived Coral process + warm cache
    |       |   |-- mcp_client.py        JSON-RPC over coral mcp-stdio
    |       |   `-- store.py             SQLite run/conversation persistence
    |       |-- agent/
    |       |   |-- context.py           conversation compression
    |       |   |-- coverage.py          evidence lane policy
    |       |   |-- events.py            streamable trace event contracts
    |       |   |-- guardrails.py        evidence-first retries
    |       |   |-- intent.py            structured artifact routing
    |       |   |-- execution_policy.py  duplicate and SQL-shape hygiene
    |       |   |-- loop.py              LLM/tool loop
    |       |   |-- policy.py            budgets and finalization
    |       |   |-- prompts.py           schema-aware operating contract
    |       |   |-- schema.py            Coral table/column context builder
    |       |   |-- source_guidance.py   compact source idiom hints
    |       |   |-- taxonomy.py          source lanes and shared intent vocabulary
    |       |   |-- workflow.py          coverage, correlation, and stop checkpoints
    |       |   `-- synthesis.py         optional incident report synthesis
    |       |-- config.py                centralized runtime knobs
    |       |-- coral_setup.py           install/test Coral source profiles
    |       `-- source_profiles.py       triage, demo, enterprise profiles
    `-- frontend/
        `-- src/
            |-- components/chat/         chat surface, markdown, evidence trail
            |-- store/                   conversation/run state
            `-- api/                     backend client

That structure came from the order of problems I solved.

The first backend slice was deliberately small. I wanted to answer one question: can ReefWatch treat Coral as the source of operational truth? The first proof covered three pieces: coral mcp-stdio, the coral://tables resource, and the sql tool.

At that point, ReefWatch was not an agent yet. It was a thin Coral client. That was useful because it proved the most important bet: the app could treat Coral as the data plane instead of building direct SDK integrations for each source.

The first reusable module was mcp_client.py.
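Under the hood, that module speaks JSON-RPC over the stdio of a coral mcp-stdio process. As a minimal sketch of that transport, assuming MCP's newline-delimited stdio framing (one JSON-RPC 2.0 message per line), a single request/response round trip could look like this. The class and exception names are hypothetical, not ReefWatch's actual code, and plain text streams stand in for the subprocess pipes.

```python
from __future__ import annotations

import io
import itertools
import json
from typing import Any, TextIO


class JsonRpcError(RuntimeError):
    """Raised when the server replies with a JSON-RPC error object."""


class StdioJsonRpcClient:
    """Hypothetical minimal JSON-RPC 2.0 client over newline-delimited stdio.

    In practice `reader`/`writer` would be the stdout/stdin pipes of a
    `coral mcp-stdio` subprocess opened in text mode with UTF-8 encoding.
    """

    def __init__(self, reader: TextIO, writer: TextIO) -> None:
        self._reader = reader
        self._writer = writer
        self._ids = itertools.count(1)

    def request(self, method: str, params: dict[str, Any] | None = None) -> Any:
        request_id = next(self._ids)
        message: dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
        if params is not None:
            message["params"] = params
        # json.dumps escapes newlines inside strings, so each message stays on one line.
        self._writer.write(json.dumps(message) + "\n")
        self._writer.flush()

        while True:
            line = self._reader.readline()
            if line == "":
                # readline() returns "" only at EOF: the server went away.
                raise ConnectionError(f"server closed its output while waiting for {method!r}")
            if not line.strip():
                continue  # tolerate blank lines
            reply = json.loads(line)
            if reply.get("id") != request_id:
                continue  # skip notifications and replies to other requests
            if "error" in reply:
                raise JsonRpcError(reply["error"].get("message", "unknown JSON-RPC error"))
            return reply.get("result")


if __name__ == "__main__":
    # Fake server output: one notification, then the reply to request 1.
    fake_stdout = io.StringIO(
        '{"jsonrpc": "2.0", "method": "notifications/message", "params": {}}\n'
        '{"jsonrpc": "2.0", "id": 1, "result": {"resources": []}}\n'
    )
    fake_stdin = io.StringIO()
    client = StdioJsonRpcClient(fake_stdout, fake_stdin)
    print(client.request("resources/list"))  # {'resources': []}
    print(fake_stdin.getvalue().strip())
```

A real client would also run the MCP initialize handshake before other requests, drain stderr on a background thread, add timeouts, and route replies by ID if requests can overlap; this sketch handles one request at a time and simply skips unrelated messages. In ReefWatch, that transport responsibility sits in mcp_client.py.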
It owns the boring but essential transport details, including access to the coral://guide and coral://tables resources.

Design decision: keep transport boring. Once mcp_client.py worked, the rest of the app could stop thinking about processes and start thinking about investigations.

The naive approach would be:

    user asks question -> spawn Coral -> discover schema -> ask model -> run SQL

That is fine for a script. It feels rough in a product. So the second slice was coral_session.py. It keeps one Coral process alive, warms the schema/guide/tool cache, and recreates the process if it dies. That gave ReefWatch a cleaner runtime shape:

    app starts -> warm Coral once -> investigations reuse the session

The session cache stores three things: the schema, the guide, and the tool list. That one decision made the product feel different. Instead of every user prompt waiting on MCP bootstrapping and catalog discovery, ReefWatch starts from a warm map of the available sources.

There is still a fallback path. If the process dies, CoralSession can recreate the client and warm the cache again. The MCP client reads and writes JSON-RPC over stdio with UTF-8 decoding, drains stderr on a background thread, and reports useful transport errors rather than hanging silently. A production triage agent that randomly waits on its own plumbing is not a production triage agent.

Hardcoding source schemas would defeat the point of using Coral. The agent must discover what is installed right now. The temptation was to write a hand-authored prompt like:

    GitHub has these tables. Sentry has these tables. Slack uses channel IDs.

That would have made ReefWatch brittle and less Coral-native. This was one of those small choices where the architecture either respects the tool it is built on or quietly works around it.

Instead, ReefWatch builds its prompt context from Coral itself. It reads coral://tables, enriches the result with coral.columns, groups tables by source, and includes only a compact slice of each source in the prompt.
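To make the "compact slice" idea concrete, here is a minimal sketch of grouping column metadata by source and capping what reaches the prompt. The function name, the caps, and the output format are assumptions for illustration, not ReefWatch's schema.py. The rows are assumed to be (schema_name, table_name, column_name, data_type) tuples ordered by table and ordinal position, and the sample rows are invented, not real Coral table definitions.

```python
from __future__ import annotations

from collections import defaultdict


def build_schema_context(
    rows: list[tuple[str, str, str, str]],
    max_tables_per_source: int = 3,
    max_columns_per_table: int = 4,
) -> str:
    """Render a compact, per-source view of coral.columns-style rows."""
    # source -> table -> ["column TYPE", ...], preserving the incoming row order
    sources: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for schema_name, table_name, column_name, data_type in rows:
        sources[schema_name][table_name].append(f"{column_name} {data_type}")

    lines: list[str] = []
    for schema_name in sorted(sources):
        table_names = sorted(sources[schema_name])
        lines.append(f"{schema_name}: {len(table_names)} table(s)")
        for table_name in table_names[:max_tables_per_source]:
            columns = sources[schema_name][table_name]
            shown = ", ".join(columns[:max_columns_per_table])
            extra = len(columns) - max_columns_per_table
            suffix = f", +{extra} more column(s)" if extra > 0 else ""
            lines.append(f"  {schema_name}.{table_name}: {shown}{suffix}")
        hidden = len(table_names) - max_tables_per_source
        if hidden > 0:
            # Point the model at discovery instead of dumping the whole catalog.
            lines.append(f"  ... {hidden} more table(s); use Coral discovery tools for the rest")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative rows only; real names come from the live Coral catalog.
    sample = [
        ("github", "issues", "number", "BIGINT"),
        ("github", "issues", "title", "VARCHAR"),
        ("github", "pulls", "number", "BIGINT"),
        ("sentry", "issues", "id", "VARCHAR"),
        ("sentry", "issues", "first_seen", "TIMESTAMP"),
        ("sentry", "issues", "title", "VARCHAR"),
    ]
    print(build_schema_context(sample, max_tables_per_source=1, max_columns_per_table=2))
```

Capping tables and columns is what keeps a large catalog from flooding the prompt, while the trailing hint sends the model to Coral's discovery tools for anything it did not see. The rows it consumes would come from a metadata query against coral.columns like this one: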
    SELECT schema_name, table_name, column_name, data_type
    FROM coral.columns
    ORDER BY schema_name, table_name, ordinal_position

The key idea: the model gets a map, not a maze. If the source catalog is small, the model sees most of it. If the catalog is large, the model gets enough to start and can use Coral discovery tools for the rest. That keeps the prompt useful without pretending the app has permanent knowledge of every source.

Only after Coral transport and schema context worked did I build the agent loop. The first version of agent/loop.py had one job:

    messages -> LLM tool call -> Coral SQL -> tool result -> final answer

That version was intentionally plain. It let me see the raw failure modes, and those failures were useful. They showed which parts belonged in the prompt and which parts deserved code-level policy. A bad first agent run is not wasted time if it tells you where the system needs structure.

This was the real turning point. I stopped trying to make one heroic system prompt do everything. Instead, I split agent behavior into focused modules:

- policy.py decides query budgets and finalization behavior.
- guardrails.py handles evidence-first retries and missing-source retries.
- coverage.py decides which evidence lanes matter for a request.
- workflow.py turns coverage and correlation into small checkpoint prompts.
- execution_policy.py skips duplicate or noisy query shapes and catches table/function syntax mistakes before they hit Coral.
- context.py compresses conversation history.
- synthesis.py decides whether a structured report is appropriate.

This was better than one giant prompt because each module has a clear reason to exist and can be tested on its own:

| Failure Mode | Layer That Handles It |
|---|---|
| Agent answers without querying | guardrails.py |
| Agent stops after one source | coverage.py |
| Agent ignores missing evidence lanes | workflow.py |
| Agent skips cross-source correlation | workflow.py |
| Agent repeats query shapes | execution_policy.py |
| Agent loops too long | policy.py |
| Conversation gets too large | context.py |
| Report appears for ordinary questions | synthesis.py |

The model still has agency. The code does not prescribe exact SQL for a demo scenario. The policy layer just keeps the model inside the kind of investigation a human operator would expect.

The next slice was persistence. I started with SQLite because this is a proof of concept and a local operator tool, not a multi-tenant SaaS backend. The database engine was not the important part; recording each run and its SQL executions was. That made debugging dramatically easier. When a run looked bad, I could inspect the exact queries and decide whether the failure came from the prompt, policy, schema, model, or source setup. It is also why the frontend can hydrate conversations and show evidence instead of keeping everything only in Redux memory.

Only then did the chat UI become valuable. The UI was not designed as "talk to an AI." It was designed as an investigation workspace. That decision matters because Coral's work is easy to visualize: the user can see source counts, SQL queries, row counts, and the final synthesis. ReefWatch shows the route instead of hiding it behind one polished paragraph.

The last piece was source profiles. I did not want the default setup to require every possible token, because that creates a bad demo path.
Instead, ReefWatch has profiles:

| Profile | Sources | Use Case |
|---|---|---|
| triage | GitHub, Sentry, Slack, Netlify | lightweight production triage |
| demo | triage + PagerDuty | richer incident response demo |
| security | GitHub, Slack, Notion, OSV | compliance/security route |
| enterprise | demo + Notion + OSV | default hackathon showcase |
| observability | demo + Datadog + StatusGator | deeper ops setup |

This keeps the build reproducible. A reader can start with triage, get a real agent working, then add Notion, OSV, or PagerDuty when they want a stronger story.

The main loop is intentionally simple. What matters is not the loop itself but the small pieces of judgment that surround it.

If the user asks an operational question like "what issues are on my GitHub?" and the model tries to answer without querying Coral, ReefWatch injects a retry message: "You have not queried Coral yet. Do not answer with table recommendations or ask for repo/org names until you first run metadata/source SQL queries to infer them."

This fixed the first embarrassing failure mode: the agent giving me instructions instead of doing the investigation.

For production triage, one source is almost never enough. ReefWatch treats sources as evidence lanes:

| Category | Sources |
|---|---|
| Ops | GitHub, Sentry, Netlify, Slack, PagerDuty, StatusGator |
| Knowledge | Notion |
| Security | GitHub, OSV, Notion, Slack |
| Observability | Datadog |

The policy does not say "always query everything." It checks what is actually installed and what the user asked. If the user asks specifically about GitHub, coverage stays GitHub-scoped. If the user asks for production triage, the agent should cover the available ops lanes before finalizing. A typical nudge reads: "You have only checked GitHub, but Sentry and Netlify are available, so prefer those lanes next."

That is the kind of judgment I wanted outside the model. The important refinement: coverage is a guide, not a cage.
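The coverage nudge described above can be sketched as a small pure function: compare the ops lanes that are installed (or the sources the user explicitly named) with the sources already queried, and return a soft message rather than a hard block. The lane list mirrors the Ops row of the evidence-lane table; the function names and lowercase source keys are hypothetical, not ReefWatch's coverage.py.

```python
from __future__ import annotations

# Ops lane from the evidence-lane table; the lowercase keys are an assumption.
OPS_LANES = {
    "github": "GitHub",
    "sentry": "Sentry",
    "netlify": "Netlify",
    "slack": "Slack",
    "pagerduty": "PagerDuty",
    "statusgator": "StatusGator",
}


def _label(source: str) -> str:
    return OPS_LANES.get(source, source)


def _join(names: list[str]) -> str:
    """Render 'A', 'A and B', or 'A, B and C'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def coverage_nudge(
    installed: set[str],
    queried: set[str],
    requested: list[str] | None = None,
) -> str | None:
    """Return a soft hint toward unchecked lanes, or None when coverage looks complete.

    `requested` holds sources the user named explicitly ("my GitHub"); when it
    is set, coverage stays scoped to those sources instead of every ops lane.
    The result is advice for the next model turn, never a hard block.
    """
    lanes = requested if requested else list(OPS_LANES)
    missing = [s for s in lanes if s in installed and s not in queried]
    if not missing:
        return None
    missing_names = _join([_label(s) for s in missing])
    if not queried:
        return f"You have not queried Coral yet. Start with {missing_names}."
    checked = _join([_label(s) for s in sorted(queried)])
    verb = "is" if len(missing) == 1 else "are"
    return (
        f"You have only checked {checked}, but {missing_names} {verb} available, "
        "so prefer those lanes next."
    )
```

Because the function returns a hint instead of refusing the next query, the model remains free to take a short detour, such as fixing a column error, before following it. The zero-query branch overlaps with the evidence-first retry, which ReefWatch keeps in guardrails.py.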
If the model has just discovered the right Sentry project ID or hit a column error, it is allowed to inspect Coral metadata and correct that source query before moving on. That matters because real triage has tiny detours, and hard-blocking those detours made the agent worse. ReefWatch now nudges the investigation path without preventing useful schema correction.

The source lane definitions and shared intent vocabulary live in taxonomy.py. That small file exists for a boring but important reason: coverage, budgets, and intent classification should not each carry their own slightly different definition of what "incident" means.

The agent is still dynamic. taxonomy.py does not contain TraceChat queries, table names for a demo, or source-specific SQL recipes. It only describes the categories ReefWatch can reason about. Coral still discovers the actual tables, functions, filters, and columns at runtime.

Cross-source correlation was the final thing I tightened before the demo. Once multiple evidence lanes return concrete anchors, ReefWatch asks for a Coral-side correlation query instead of letting the model stitch everything together in prose. The preferred shape is:

    WITH deploy AS (...), errors AS (...), notes AS (...)
    SELECT ...
    FROM deploy
    JOIN errors ON ...
    LEFT JOIN notes ON ...

or, when the relationship is time-based instead of key-based:

    WITH deploy AS (...), errors AS (...), notes AS (...)
    SELECT ...
    FROM deploy
    CROSS JOIN errors
    LEFT JOIN notes ON notes.ts <= errors.first_seen
    WHERE errors.first_seen >= deploy.created_at

That checkpoint is still source-agnostic. It does not say "for TraceChat, run this SQL." It says: if the evidence exposes IDs, URLs, releases, commits, service names, channel IDs, or timestamps, prove the relationship inside Coral. If a correlation query fails because of SQL shape, the next instruction is not "give up"; it is a correction step to fix the query.

This made ReefWatch feel much less like a chatbot and much more like an investigation workbench.
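To see the time-based shape run end to end, here is a sketch that executes it against an in-memory SQLite database. SQLite is only a stand-in for Coral's query engine (Coral's dialect and the real table names will differ), and the netlify_deploys, sentry_errors, and slack_notes tables and their rows are invented toy data. The notes join is also bounded to the window between the deploy and the error, which is one reasonable reading of the elided ON clause, not necessarily ReefWatch's.

```python
import sqlite3

# Toy tables standing in for Coral sources (names, columns, and rows are hypothetical).
SETUP = """
CREATE TABLE netlify_deploys (id TEXT, commit_sha TEXT, created_at TEXT);
CREATE TABLE sentry_errors (id TEXT, title TEXT, first_seen TEXT);
CREATE TABLE slack_notes (channel TEXT, ts TEXT, body TEXT);
INSERT INTO netlify_deploys VALUES
    ('d1', 'abc123', '2024-05-01T10:00:00Z'),
    ('d2', 'def456', '2024-05-01T12:00:00Z');
INSERT INTO sentry_errors VALUES
    ('e0', 'Old error', '2024-05-01T09:00:00Z'),
    ('e1', 'TimeoutError in /ledger', '2024-05-01T12:05:00Z');
INSERT INTO slack_notes VALUES
    ('#ops', '2024-05-01T08:00:00Z', 'morning standup'),
    ('#ops', '2024-05-01T12:03:00Z', 'deploy def456 looks slow');
"""

# Errors first seen after the latest deploy, plus notes posted between that
# deploy and the error. ISO-8601 strings in one format compare correctly as text.
CORRELATION_SQL = """
WITH deploy AS (
    SELECT commit_sha, created_at FROM netlify_deploys
    ORDER BY created_at DESC LIMIT 1
),
errors AS (
    SELECT title, first_seen FROM sentry_errors
),
notes AS (
    SELECT ts, body FROM slack_notes
)
SELECT deploy.commit_sha, errors.title, errors.first_seen, notes.body
FROM deploy
CROSS JOIN errors
LEFT JOIN notes
    ON notes.ts <= errors.first_seen
   AND notes.ts >= deploy.created_at
WHERE errors.first_seen >= deploy.created_at
ORDER BY errors.first_seen
"""


def correlate() -> list[tuple]:
    """Run the toy correlation query and return its rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(CORRELATION_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in correlate():
        print(row)
```

Only the error first seen after deploy def456 survives the WHERE clause, and only the note posted inside that window attaches to it. The relationship is proven by the query result rather than asserted in prose, which is the point of the checkpoint.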
One subtle failure: a model can run a query with a hallucinated timestamp column, get zero rows, and conclude "Slack had no evidence." That is bad triage. ReefWatch treats a filtered zero-row evidence query as not fully decisive until the model relaxes the filter or inspects the schema. A broad zero-row data query can satisfy a lane; a narrow zero-row query with extra WHERE filters cannot automatically close the book. That small distinction protects against false negatives without hardcoding Slack or any other source.

Another failure mode showed up with quiet repositories. The model would discover the correct repo, then drift into global GitHub searches anyway. The fix was not "hardcode this repository." The fix was a general scope policy: if ReefWatch has discovered a concrete owner/repo and the agent keeps running broad GitHub searches without repo:owner/repo, it nudges the agent back to scoped checks. This now lives as workflow guidance rather than a hard execution block. The point is the same: once a concrete anchor exists, prefer scoped evidence over another broad search, but still allow a corrective metadata query when the model needs to fix the route.

The budget is not about limiting Coral. Coral SQL queries are cheap compared to LLM loops. The budget is about preventing agent drift and making the product predictable. ReefWatch uses different budgets for different request types. When the budget is reached, the model must stop querying and produce the best evidence-backed answer it can, explicitly naming unknowns.

The UI is conversational, but the product is not trying to become a general chat companion. The conversation flow exists for follow-up investigations. ReefWatch persists conversations and runs in SQLite. For the agent prompt, it builds a compact context from recent runs and SQL executions. If the message history gets too large, ContextWindow compresses older tool chatter into an execution summary and keeps the latest turns.
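A minimal sketch of that compression step, assuming a simple message list with user, assistant, and tool roles. The Message type, compress_history, and the thresholds are hypothetical stand-ins for ReefWatch's ContextWindow, and keeping only the first line of each older tool result is one simple summarization choice among many.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user", "assistant", "tool", or "system"
    content: str


def compress_history(
    messages: list[Message], keep_last: int = 4, max_messages: int = 8
) -> list[Message]:
    """Fold older tool results into one summary message; keep recent turns verbatim."""
    if len(messages) <= max_messages:
        return list(messages)
    split = max(len(messages) - keep_last, 0)
    older, recent = messages[:split], messages[split:]
    # Older user/assistant turns are short, so keep them as-is.
    dialogue = [m for m in older if m.role != "tool"]
    # Each older tool result shrinks to its first line, capped at 80 characters.
    tool_lines = [
        m.content.splitlines()[0][:80] for m in older if m.role == "tool" and m.content
    ]
    if not tool_lines:
        return dialogue + recent
    summary = Message(
        role="system",
        content="Execution summary of earlier tool calls:\n"
        + "\n".join(f"- {line}" for line in tool_lines),
    )
    return dialogue + [summary] + recent
```

Older user and assistant turns survive, each older tool result shrinks to a single summary line, and the latest turns pass through untouched.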
That gives the model continuity without stuffing every old row into the prompt.

The first version of ReefWatch used a small keyword policy to decide whether a run should produce an incident report. That was useful as a fallback, but it was too blunt for a real conversation. Take the follow-up "What did it find on Slack?" The answer might mention "incident chatter" or "deploy errors," but the user did not ask for a new incident report. They asked for a source-specific explanation.

The fix was a structured intent classifier. After the evidence answer is drafted, ReefWatch asks a lightweight structured LLM step to classify the artifact as one of answer_only, incident_report, audit_report, or follow_up.

The prompt is intentionally narrow. It classifies the current user request, not random words that appear in the answer draft or previous conversation context. There are still deterministic policy boundaries: report_policy=never always disables reports, and report_policy=always always enables an incident report.

This is the pattern I ended up liking most: let the model handle semantic intent, but keep product policy outside the model.

Not every question deserves a report. If I ask "are there any open issues on my GitHub?", an incident report would be the wrong artifact. If I ask "investigate the production regression," a report is useful. The intent classifier decides the artifact, and report synthesis only runs when the mode is incident_report. The structured synthesizer gets only the findings, the SQL summary, and the sources used. It has to stay grounded in the evidence already collected: if the evidence is weak, it must lower its confidence rather than invent a root cause.

The UI is the best place to watch the investigation unfold. The CLI is the best place to prove the plumbing works. That split matters for a production agent.
Before I ask the model to connect GitHub, Sentry, Netlify, Slack, PagerDuty, Notion, and OSV into one answer, I want a boring setup path that can validate each lane by itself. ReefWatch exposes that through reefwatch coral:

    uv run reefwatch coral doctor
    uv run reefwatch coral build
    uv run reefwatch coral install-profile
    uv run reefwatch coral test-source github
    uv run reefwatch coral test-source sentry
    uv run reefwatch coral test-source netlify
    uv run reefwatch coral test-source slack
    uv run reefwatch coral test-source pagerduty
    uv run reefwatch coral test-source notion
    uv run reefwatch coral sql "SELECT * FROM pagerduty.abilities LIMIT 5"

The important detail is that the CLI does not invent another integration layer. It uses the same Coral configuration and the same MCP transport that the agent uses. The difference is intent: the CLI is for setup, validation, and scripted investigations; the web workspace is for watching evidence appear and reading the final answer. For example, a teammate can run:

    uv run reefwatch investigate "Investigate the current production issue for tracechat-ledger and tell me what needs attention now." --trace

That gives the project a second interface without splitting the product in two.

Here is the practical route another developer can follow. Build Coral locally and point ReefWatch at the binary:

    git clone https://github.com/withcoral/coral.git
    cd coral
    cargo build

Then configure ReefWatch (the .exe suffix applies to Windows builds; elsewhere the binary has no extension):

    RW_CORAL_EXECUTABLE=../coral/target/debug/coral.exe
    RW_CORAL_REPO_PATH=../coral
    RW_CORAL_CONFIG_DIR=state/coral
    RW_SOURCE_PROFILE=enterprise

The LLM I used while building and testing ReefWatch was DeepSeek v4 Pro (https://openrouter.ai/deepseek/deepseek-v4-pro). It is a strong model for agentic workflows and very cost-efficient for the amount of work it does.
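The RW_* variables above are the whole Coral-facing configuration in this walkthrough. ReefWatch centralizes runtime knobs in config.py; as a minimal sketch of loading these four values in one place (the CoralSettings dataclass, load_settings, and the validation rules are hypothetical, not ReefWatch's code), the loader can fail fast on missing values and unknown profile names:

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path

# Profile names from the source-profile table; the validation itself is assumed.
KNOWN_PROFILES = ("triage", "demo", "security", "enterprise", "observability")
REQUIRED_KEYS = (
    "RW_CORAL_EXECUTABLE",
    "RW_CORAL_REPO_PATH",
    "RW_CORAL_CONFIG_DIR",
    "RW_SOURCE_PROFILE",
)


@dataclass(frozen=True)
class CoralSettings:
    executable: Path
    repo_path: Path
    config_dir: Path
    source_profile: str


def load_settings(env: Mapping[str, str] | None = None) -> CoralSettings:
    """Read the RW_* settings from `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    profile = env["RW_SOURCE_PROFILE"]
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown RW_SOURCE_PROFILE {profile!r}")
    return CoralSettings(
        executable=Path(env["RW_CORAL_EXECUTABLE"]),
        repo_path=Path(env["RW_CORAL_REPO_PATH"]),
        config_dir=Path(env["RW_CORAL_CONFIG_DIR"]),
        source_profile=profile,
    )
```

Failing at startup is cheaper than discovering a bad path when the first investigation tries to spawn Coral.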
ReefWatch supports using different models for different stages (inference, the main agent loop, and synthesis), so you can customize it to your budget and use case.

Start with the sources that give the best incident story without too much setup, and add more sources for the security/compliance variant. The important UX decision is profiles. ReefWatch does not force every source into every demo. It has triage, demo, security, enterprise, and observability profiles, so the setup can match the story.

Use a prompt that gives the agent enough intent but not a scripted path:

    Investigate the current production issue for tracechat-ledger and tell me what needs attention now.

A good run should show its route: the sources it queried, the SQL it ran, the row counts, and the final synthesis.

Quiet repos are harder than they look. A lazy agent says "no issues" after one empty query. A paranoid agent runs 30 searches and still sounds unsure. The ReefWatch answer I want is calmer:

    I did not find an active issue for <repo name>. GitHub is checked-empty for open issues and PRs on that repository. I did not find linked deployment/runtime evidence in the installed sources. No incident report was generated because this looks like a quiet repository check, not an active production incident.

That is the product philosophy in miniature. ReefWatch depends on Coral's core strengths, and the agent does not just "call Coral once." Coral is the investigation substrate.
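The profile table describes profiles as compositions: demo is triage plus PagerDuty, enterprise is demo plus Notion and OSV, and so on. A minimal sketch of resolving that composition into a flat, ordered source list follows; the PROFILES dictionary shape and resolve_profile are hypothetical, not ReefWatch's source_profiles.py.

```python
from __future__ import annotations

# Profile composition from the profile table (the dict shape is hypothetical).
PROFILES: dict[str, dict[str, list[str]]] = {
    "triage": {"extends": [], "sources": ["github", "sentry", "slack", "netlify"]},
    "demo": {"extends": ["triage"], "sources": ["pagerduty"]},
    "security": {"extends": [], "sources": ["github", "slack", "notion", "osv"]},
    "enterprise": {"extends": ["demo"], "sources": ["notion", "osv"]},
    "observability": {"extends": ["demo"], "sources": ["datadog", "statusgator"]},
}


def resolve_profile(
    name: str,
    profiles: dict[str, dict[str, list[str]]] = PROFILES,
    _path: frozenset[str] = frozenset(),
) -> list[str]:
    """Flatten a profile into an ordered, de-duplicated list of Coral sources."""
    if name not in profiles:
        raise ValueError(f"unknown profile {name!r}")
    if name in _path:
        # _path holds the profiles on the current inheritance chain only,
        # so diamonds are fine but true cycles are rejected.
        raise ValueError(f"profile cycle through {name!r}")
    path = _path | {name}
    resolved: list[str] = []
    spec = profiles[name]
    # Parents first, so inherited sources keep their original order.
    for parent in spec["extends"]:
        for source in resolve_profile(parent, profiles, path):
            if source not in resolved:
                resolved.append(source)
    for source in spec["sources"]:
        if source not in resolved:
            resolved.append(source)
    return resolved
```

Keeping profiles declarative like this is what makes "start with triage, add sources later" cheap: a richer profile only lists what it adds on top of its parent.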
The code is intentionally split:

| Layer | Responsibility |
|---|---|
| MCP adapter | JSON-RPC over Coral stdio, UTF-8 safety, guide/resources/tools |
| Coral session | Long-lived process and warm cache |
| Schema model | Compact source/table/column context |
| Prompt builder | Operating contract and live schema context |
| Agent loop | LLM/tool loop and execution recording |
| Policy | Budgets and finalization |
| Coverage | Evidence lane requirements and source-level completeness |
| Workflow | Coverage, correlation, correction, and stop checkpoints |
| Taxonomy | Shared source lanes and investigation vocabulary |
| Guardrails | Evidence-first and missing-source retries |
| Context | Conversation compression |
| Synthesis | Optional structured report |
| API | Persistence and SSE streaming |

ReefWatch does not hardcode "for tracechat, query these exact tables." It gives the model a source-agnostic investigation workflow, then lets Coral's live catalog expose the actual tables, functions, filters, and source idioms.

Thanks for reading if you've made it this far. My teammate and I built ReefWatch for the Coral Hackathon. The experience taught me a lot about building autonomous agents from scratch and shaping ReefWatch into a helpful tool.

The most useful thing Coral gave ReefWatch was not just another integration. It gave the agent a consistent mental model for moving through operational data:

    discover -> inspect -> query -> correlate -> report

That is the difference between a chatbot that knows what tools exist and an agent that can actually investigate.

ReefWatch is still a proof of concept, but the shape feels right: Coral handles the source layer, ReefWatch handles the investigation behavior, and the UI shows the route clearly enough that an operator can trust or challenge the answer. That is the kind of agent I wanted to build. Not a narrator. An investigator.