# Building ReefWatch, a Coral-Powered Production Triage Agent

> Source: <https://dev.to/siiddhantt/building-reefwatch-a-coral-powered-production-triage-agent-23hf>
> Published: 2026-05-30 06:43:00+00:00

Production incidents almost never break in one place.

The alert fires in one tool. The broken deploy is in Netlify. The suspicious

change is in GitHub. The stack trace is in Sentry. The human context is in

Slack. The runbook is in Notion. The "is this actually paging someone?" answer

is in PagerDuty.

A normal chatbot can sound helpful in that situation. It can say things like

"you should check your recent deployments" and "look for related errors in

Sentry."

**But that is not triage. That is a polished to-do list.**

I wanted something more useful: an agent that could go get the evidence, connect

the dots across sources, show its work, and give an operator-grade answer

grounded in real system data.

*The design constraint from the start was simple: no evidence, no answer.*

That became [ ReefWatch](https://github.com/siiddhantt/reefwatch), a Coral-powered production triage agent built to

It discovers the tools connected to a workspace at runtime, queries them as

evidence, correlates records across systems, and produces a compact answer only

when the facts support one.

Coral became the backbone because it turns the messiest part of agent tooling

into something the model can actually reason about: **SQL**.

By the end of this route, you will have a blueprint for an agent that can:

In one sentence:

ReefWatch is a Coral-powered investigation workspace that lets an agent discover connected tools at runtime, query them with read-only SQL, stream the evidence trail, and generate an incident report only when the facts actually support one.

MCP is excellent as an integration layer. It gives models a way to call tools

with schemas instead of scraping humans through UI glue.

But if every source becomes a separate collection of bespoke tools, a new

problem appears:

Coral changes the abstraction.

It still uses MCP, but the agent mostly sees a small set of stable capabilities:

discover catalog, inspect schema, read the guide, and run SQL.

That means a new source is not:

```
teach ReefWatch another SDK
```

It becomes:

``` php
install a Coral source -> discover the tables -> query evidence with SQL
```

The practical win is boring in the best way: **ReefWatch can stay small.**

The app does not own GitHub pagination, Sentry auth, Slack table shapes, or

Netlify deploy schemas.

**Coral owns that. ReefWatch owns the investigation behavior.**

That split also maps well to how I think reliable agents should be built:

ground the model in real environment feedback, keep tools composable, trace the

work, and wrap the loop with small guardrails instead of hoping one perfect

prompt behaves forever.

MCP gives the agent hands. Coral gives it a map and a query language.

If I were rebuilding ReefWatch from scratch, I would not start with the UI.

I would start with the investigation pipeline and make each layer earn its

place.

Remember, it is tempting to start with the surface, but you should make the surface reflect a system that was already worth trusting.

The project came together in eight slices:

| Slice | What I Built | Why It Mattered |
|---|---|---|
| 1 | Coral MCP client | Proved Coral could be the data plane |
| 2 | Warm Coral session | Removed repeated MCP startup cost |
| 3 | Schema context | Kept prompts aligned with live Coral metadata |
| 4 | Minimal agent loop | Exposed the real model failure modes |
| 5 | Policy modules | Made the agent reliable without hardcoding a demo |
| 6 | Persistence | Made runs debuggable and conversations durable |
| 7 | Streaming UI | Made the investigation inspectable |
| 8 | Source profiles | Made setup reproducible without requiring every token |

The final project shape looks roughly like this:

(a FastAPI backend, with SQLite store and React frontend)

```
reefwatch/
|-- src/
|   |-- api/
|   |   |-- routes/
|   |   |   |-- coral.py              # Coral health and source setup
|   |   |   |-- conversations.py      # persisted investigation threads
|   |   |   |-- investigations.py     # REST + SSE investigation runs
|   |   |   `-- schema.py             # schema visibility for the UI
|   |   |-- dependencies.py           # shared Settings, store, agent, session
|   |   |-- mappers.py                # domain models to API responses
|   |   `-- schemas.py                # API contracts
|   `-- app/
|       |-- adapters/
|       |   |-- coral_session.py      # long-lived Coral process + warm cache
|       |   |-- mcp_client.py         # JSON-RPC over coral mcp-stdio
|       |   `-- store.py              # SQLite run/conversation persistence
|       |-- agent/
|       |   |-- context.py            # conversation compression
|       |   |-- coverage.py           # evidence lane policy
|       |   |-- events.py             # streamable trace event contracts
|       |   |-- guardrails.py         # evidence-first retries
|       |   |-- intent.py             # structured artifact routing
|       |   |-- execution_policy.py   # duplicate and SQL-shape hygiene
|       |   |-- loop.py               # LLM/tool loop
|       |   |-- policy.py             # budgets and finalization
|       |   |-- prompts.py            # schema-aware operating contract
|       |   |-- schema.py             # Coral table/column context builder
|       |   |-- source_guidance.py    # compact source idiom hints
|       |   |-- taxonomy.py           # source lanes and shared intent vocabulary
|       |   |-- workflow.py           # coverage, correlation, and stop checkpoints
|       |   `-- synthesis.py          # optional incident report synthesis
|       |-- config.py                 # centralized runtime knobs
|       |-- coral_setup.py            # install/test Coral source profiles
|       `-- source_profiles.py        # triage, demo, enterprise profiles
`-- frontend/
    `-- src/
        |-- components/chat/          # chat surface, markdown, evidence trail
        |-- store/                    # conversation/run state
        `-- api/                      # backend client
```

That structure came from the order of problems I solved.

The first backend slice was deliberately small.

I wanted to answer one question:

**Can ReefWatch treat Coral as the source of operational truth?**

The first proof was:

`coral mcp-stdio`

.`coral://tables`

.`sql`

tool.At that point, ReefWatch was not an agent yet. It was a thin Coral client.

That was useful because it proved the most important bet: **the app could treat
Coral as the data plane** instead of building direct SDK integrations for

The first reusable module was `mcp_client.py`

.

It owns the boring but essential transport details:

`coral://guide`

and `coral://tables`

Design decision:keep transport boring. Once`mcp_client.py`

worked, the

rest of the app could stop thinking about processes and start thinking about

investigations.

The naive approach would be:

``` php
user asks question -> spawn Coral -> discover schema -> ask model -> run SQL
```

That is fine for a script.

It feels rough in a **product**.

So the second slice was `coral_session.py`

. It keeps one Coral process alive,

warms the schema/guide/tool cache, and recreates the process if it dies.

That gave ReefWatch a cleaner runtime shape:

``` php
app starts -> warm Coral once -> investigations reuse the session
```

The session cache stores three things:

That one decision made the product feel different.

Instead of every user prompt waiting on MCP bootstrapping and catalog discovery,

**ReefWatch starts from a warm map of the available sources.**

There is still a fallback path. If the process dies, `CoralSession`

can recreate

the client and warm the cache again.

The MCP client reads and writes JSON-RPC over stdio with UTF-8 decoding, drains

stderr on a background thread, and reports useful transport errors rather than

hanging silently.

A production triage agent that randomly waits on its own plumbing is not a production triage agent.

Hardcoding source schemas would defeat the point of using Coral.

**The agent must discover what is installed right now.**

The temptation was to write a hand-authored prompt like:

```
GitHub has these tables. Sentry has these tables. Slack uses channel ids.
```

That would have made ReefWatch brittle and less Coral-native.

*This was one of those small choices where the architecture either respects the
tool it is built on, or quietly works around it.*

Instead, ReefWatch builds its prompt context from Coral itself.

It reads `coral://tables`

, enriches the result with `coral.columns`

, groups

tables by source, and includes only a compact slice of each source in the

prompt.

```
SELECT schema_name, table_name, column_name, data_type
FROM coral.columns
ORDER BY schema_name, table_name, ordinal_position
```

The key idea:the model gets a map, not a maze.

If the source catalog is small, the model sees most of it.

If the catalog is large, the model gets enough to start and can use Coral

discovery tools for the rest.

That keeps the prompt useful without pretending the app has permanent knowledge of every source.

Only after Coral transport and schema context worked did I build the agent loop.

The first version of `agent/loop.py`

had one job:

``` php
messages -> LLM tool call -> Coral SQL -> tool result -> final answer
```

That version was intentionally plain.

It let me see the raw failure modes:

Those failures were useful.

They showed which parts belonged in the prompt and which parts deserved

code-level policy.

A bad first agent run is not wasted time if it tells you where the system needs structure.

This was the real turning point.

I **stopped trying to make one heroic system prompt do everything**.

Instead, I split agent behavior into focused modules:

`policy.py`

decides query budgets and finalization behavior.`guardrails.py`

handles evidence-first retries and missing-source retries.`coverage.py`

decides which evidence lanes matter for a request.`workflow.py`

turns coverage and correlation into small checkpoint prompts.`execution_policy.py`

skips duplicate/noisy query shapes and catches table/function syntax mistakes before they hit Coral.`context.py`

compresses conversation history.`synthesis.py`

decides whether a structured report is appropriate.This was better than making one giant prompt because each module has a clear

reason to exist and can be tested:

| Failure Mode | Layer That Handles It |
|---|---|
| Agent answers without querying | `guardrails.py` |
| Agent stops after one source | `coverage.py` |
| Agent ignores missing evidence lanes | `workflow.py` |
| Agent skips cross-source correlation | `workflow.py` |
| Agent repeats query shapes | `execution_policy.py` |
| Agent loops too long | `policy.py` |
| Conversation gets too large | `context.py` |
| Report appears for ordinary questions | `synthesis.py` |

The model still has agency. The code does not prescribe exact SQL for a demo

scenario.

The policy layer just **keeps the model inside the kind of investigation a
human operator would expect.**

The next slice was persistence.

I started with SQLite because this is a proof-of-concept and local operator

tool, not a multi-tenant SaaS backend.

The important part was not Postgres. The important part was recording:

That made debugging dramatically easier.

When a run looked bad, I could inspect the exact queries and decide whether the

failure was **prompt, policy, schema, model, or source setup**.

This is also why the frontend can hydrate conversations and show evidence

instead of keeping everything only in Redux memory.

Only then did the chat UI become valuable.

The UI was not designed as **"talk to an AI."**

It was designed as an **investigation workspace**:

That UI decision matters because Coral is visualizable.

The user can see source counts, SQL queries, row counts, and the final

synthesis.

ReefWatch shows the route instead of hiding it behind one polished

paragraph.

The last piece was source profiles.

I did not want the default setup to require every possible token. That creates a

bad demo path.

Instead, ReefWatch has profiles:

| Profile | Sources | Use Case |
|---|---|---|
`triage` |
GitHub, Sentry, Slack, Netlify | lightweight production triage |
`demo` |
triage + PagerDuty | richer incident response demo |
`security` |
GitHub, Slack, Notion, OSV | compliance/security route |
`enterprise` |
demo + Notion + OSV | default hackathon showcase |
`observability` |
demo + Datadog + StatusGator | deeper ops setup |

This keeps the build reproducible.

A reader can start with `triage`

, get a real agent working, then add

Notion/OSV/PagerDuty when they want a stronger story.

The main loop is intentionally simple:

The important part is not that the loop is complicated.

It is that the loop is surrounded by small pieces of judgment.

If the user asks an operational question like "what issues are on my GitHub?"

and the model tries to answer without querying Coral, ReefWatch injects a retry

message:

*"You have not queried Coral yet. Do not answer with table recommendations or ask
for repo/org names until you first run metadata/source SQL queries to infer them.""*

This fixed the first embarrassing failure mode: the agent giving me instructions

instead of doing the investigation.

For production triage, one source is almost never enough.

ReefWatch treats sources as evidence lanes:

| Category | Sources |
|---|---|
| Ops | GitHub, Sentry, Netlify, Slack, PagerDuty, StatusGator |
| Knowledge | Notion |
| Security | GitHub, OSV, Notion, Slack |
| Observability | Datadog |

The policy does not say "always query everything."

It checks what is actually installed and what the user asked. If the user asks

specifically about GitHub, the coverage stays GitHub-scoped. If the user asks

for production triage, the agent should cover the available ops lanes before

finalizing.

*You have only checked GitHub, but Sentry and Netlify are available, so prefer those lanes next.*

That is the kind of judgment I wanted outside the model.

The important refinement: coverage is a **guide, not a cage**.

If the model just discovered the right Sentry project ID or hit a column error,

it is allowed to inspect Coral metadata and correct that source query before

moving on. That matters because real triage has tiny detours:

Hard-blocking those detours made the agent worse. ReefWatch now nudges the

investigation path without preventing useful schema correction.

The source lane definitions and shared intent vocabulary live in `taxonomy.py`

.

That small file exists for a boring but important reason: **coverage, budgets,
and intent classification should not each carry their own slightly different
definition of what "incident" means.**

The agent is still dynamic. `taxonomy.py`

does not contain TraceChat queries,

table names for a demo, or source-specific SQL recipes. It only describes the

categories ReefWatch can reason about:

Coral still discovers the actual tables, functions, filters, and columns at

runtime.

This was the final thing I tightened before the demo.

Once multiple evidence lanes return concrete anchors, ReefWatch asks for a

Coral-side correlation query instead of letting the model stitch everything

together in prose.

The preferred shape is:

```
WITH deploy AS (...),
     errors AS (...),
     notes AS (...)
SELECT ...
FROM deploy
JOIN errors ON ...
LEFT JOIN notes ON ...
```

or, when the relationship is time-based instead of key-based:

```
WITH deploy AS (...),
     errors AS (...),
     notes AS (...)
SELECT ...
FROM deploy
CROSS JOIN errors
LEFT JOIN notes ON notes.ts <= errors.first_seen
WHERE errors.first_seen >= deploy.created_at
```

That checkpoint is still source-agnostic. It does not say "for TraceChat, run

this SQL." It says: **if the evidence exposes IDs, URLs, releases, commits,
service names, channel IDs, or timestamps, prove the relationship inside Coral.**

If a correlation query fails because of SQL shape, the next instruction is not

"give up." It is:

This made ReefWatch feel much less like a chatbot and much more like an

investigation workbench.

One subtle failure: a model can run a query with a hallucinated timestamp column,

get zero rows, and conclude "Slack had no evidence."

**That is bad triage.**

ReefWatch treats a filtered zero-row evidence query as not fully decisive until

the model relaxes the filter or inspects the schema.

A broad zero-row data query can satisfy a lane. A narrow zero-row query with

extra `WHERE`

filters cannot automatically close the book.

That small distinction protects against false negatives without hardcoding

Slack or any other source.

Another failure mode showed up with quiet repositories.

The model would discover the correct repo, then drift into global GitHub

searches anyway.

The fix was not "hardcode this repository"

The fix was a **general scope policy.**

If ReefWatch has discovered a concrete `owner/repo`

, and the agent keeps

running broad GitHub searches without `repo:owner/repo`

, it nudges the agent

back to scoped checks.

This now lives as workflow guidance rather than a hard execution block. The

point is the same: once a concrete anchor exists, prefer scoped evidence over

another broad search, but still allow a corrective metadata query when the model

needs to fix the route.

The budget is not about limiting Coral.

Coral SQL queries are cheap compared to LLM loops.

The budget is about preventing agent drift and making the product predictable.

ReefWatch uses different budgets by request type:

When the budget is reached, the model must stop querying and produce the best

evidence-backed answer it can, explicitly naming unknowns.

The UI is conversational, but the product is not trying to become a general chat

companion.

The conversation flow exists for follow-up investigations:

ReefWatch persists conversations and runs in SQLite.

For the agent prompt, it builds a compact context from recent runs and SQL

executions. If the message history gets too large, `ContextWindow`

compresses

older tool chatter into an execution summary and keeps the latest turns.

That gives the model continuity without stuffing every old row into the prompt.

The first version of ReefWatch used a small keyword policy to decide whether a

run should produce an incident report.

That was useful as a fallback, but it was too blunt for a real conversation.

For example:

```
What did it find on Slack?
```

That follow-up might mention "incident chatter" or "deploy errors" in the

answer, but the user did not ask for a new incident report. They asked for a

source-specific explanation.

The fix was a **structured intent classifier**.

After the evidence answer is drafted, ReefWatch asks a lightweight structured

LLM step to classify the artifact:

`answer_only`

`incident_report`

`audit_report`

`follow_up`

The prompt is intentionally narrow. It classifies the **current user request**,

not random words that appear in the answer draft or previous conversation

context.

There are still deterministic policy boundaries:

`report_policy=never`

always disables reports`report_policy=always`

always enables an incident reportThis is the pattern I ended up liking most: let the model handle semantic

intent, but keep product policy outside the model.

Not every question deserves a report.

If I ask "are there any open issues on my GitHub?", an incident report would be

the wrong artifact.

If I ask "investigate the production regression," a report is useful.

The intent classifier decides the artifact. Report synthesis only runs when the

mode is `incident_report`

.

The structured synthesizer gets only the findings, SQL summary, and sources

used.

It has to stay grounded in the evidence already collected. If evidence is weak,

it must lower confidence rather than invent a root cause.

The UI is the best place to watch the investigation unfold.

The CLI is the best place to prove the plumbing works.

That split matters for a production agent. Before I ask the model to connect

GitHub, Sentry, Netlify, Slack, PagerDuty, Notion, and OSV into one answer, I

want a boring setup path that can validate each lane by itself.

ReefWatch exposes that through `reefwatch coral`

:

```
uv run reefwatch coral doctor
uv run reefwatch coral build
uv run reefwatch coral install-profile
uv run reefwatch coral test-source github
uv run reefwatch coral test-source sentry
uv run reefwatch coral test-source netlify
uv run reefwatch coral test-source slack
uv run reefwatch coral test-source pagerduty
uv run reefwatch coral test-source notion
uv run reefwatch coral sql "SELECT * FROM pagerduty.abilities LIMIT 5"
```

The important detail is that the CLI does not invent another integration layer.

It uses the same Coral configuration and the same MCP transport that the agent

uses. The difference is intent: the CLI is for setup, validation, and scripted

investigations; the web workspace is for watching evidence appear and reading

the final answer.

For example, a teammate can run:

```
uv run reefwatch investigate "Investigate the current production issue for tracechat-ledger and tell me what needs attention now." --trace
```

That gives the project a second interface without splitting the product in two.

Here is the practical route another developer can follow.

Build Coral locally and point ReefWatch at the binary:

```
git clone https://github.com/withcoral/coral.git
cd coral
cargo build
```

Then configure ReefWatch:

```
RW_CORAL_EXECUTABLE=../coral/target/debug/coral.exe
RW_CORAL_REPO_PATH=../coral
RW_CORAL_CONFIG_DIR=state/coral
RW_SOURCE_PROFILE=enterprise
```

The LLM I went for at the time of making and testing ReefWatch was [DeepSeek v4 Pro](https://openrouter.ai/deepseek/deepseek-v4-pro) as it is quite a powerful model for agentic workflows and is very cost efficient for the amount of work it does.

ReefWatch supports multi-modal LLM requests for the different stages, i.e inference, the main agent loop and the synthesis, so depending on your budget and use-case you can customise it!

Start with the sources that give the best incident story without too much setup:

For the security/compliance variant, add:

The important UX decision is profiles.

ReefWatch does not force every source into every demo. It has `triage`

, `demo`

,

`security`

, `enterprise`

, and `observability`

profiles so the setup can match

the story.

Use a prompt that gives the agent enough intent but not a scripted path:

```
Investigate the current production issue for tracechat-ledger and tell me what
needs attention now.
```

A good run should show:

Quiet repos are harder than they look.

A lazy agent says "no issues" after one empty query. A paranoid agent runs 30

searches and still sounds unsure.

The ReefWatch answer I want is calmer:

```
I did not find an active issue for <repo name>.

GitHub is checked-empty for open issues and PRs on that repository. I did not
find linked deployment/runtime evidence in the installed sources. No incident
report was generated because this looks like a quiet repository check, not an
active production incident.
```

That is the product philosophy in miniature:

ReefWatch depends on Coral's core strengths:

The agent does not just "call Coral once."

**Coral is the investigation substrate.**

The code is intentionally split:

| Layer | Responsibility |
|---|---|
| MCP adapter | JSON-RPC over Coral stdio, UTF-8 safety, guide/resources/tools |
| Coral session | Long-lived process and warm cache |
| Schema model | Compact source/table/column context |
| Prompt builder | Operating contract and live schema context |
| Agent loop | LLM/tool loop and execution recording |
| Policy | Budgets and finalization |
| Coverage | Evidence lane requirements and source-level completeness |
| Workflow | Coverage, correlation, correction, and stop checkpoints |
| Taxonomy | Shared source lanes and investigation vocabulary |
| Guardrails | Evidence-first and missing-source retries |
| Context | Conversation compression |
| Synthesis | Optional structured report |
| API | Persistence and SSE streaming |

ReefWatch does not hardcode "for tracechat, query these exact tables."

It gives the model a source-agnostic investigation workflow, then lets Coral's

live catalog expose the actual tables, functions, filters, and source idioms.

Thanks for reading, if you've reached this part!

My teammate and I built ReefWatch for the Coral Hackathon. The experience taught me so much about building autonomous agents from scratch and shaping ReefWatch into a helpful tool.

The most useful thing Coral gave ReefWatch was not just another integration.

It gave the agent a way to move through operational data with a consistent

mental model:

``` php
discover -> inspect -> query -> correlate -> report
```

That is the difference between a chatbot that knows what tools exist and an

agent that can actually investigate.

ReefWatch is still a proof-of-concept, but the shape feels right: Coral handles

the source layer, ReefWatch handles the investigation behavior, and the UI shows

the route clearly enough that an operator can trust or challenge the answer.

That is the kind of agent I wanted to build.

Not a narrator.

An investigator.
