My agent was 'succeeding' on Slack while silently doing nothing — here's the monitoring stack that caught it

Source: dev.to · Published: 2026-06-17T01:12Z · Topic: developer-tools

A developer discovered that their Slack bot was reporting success while failing to perform actual work, caught by a monitoring stack that treats Slack, D1 logging, and cost tracking as a connected pipeline. The fix involved a lean schema with a tokens_used column to track agent-specific costs, revealing that an analysis tool's monthly Claude spend was $38 instead of the estimated $20. The developer also identified MCP stdio transport timeouts and Worker Slack notification termination as hidden failure modes.

My Slack bot was firing success messages while the D1 row count sat completely frozen. The agent wasn't crashing; it was completing its Slack notification step and skipping all the actual work. Alerts alone would never have caught this.

The fix was treating Slack, D1 logging, and cost tracking as one connected pipeline instead of three separate tools. The schema I landed on is deliberately lean: 7 columns, with tokens_used being the one that matters most:

CREATE TABLE agent_runs (
  id          TEXT PRIMARY KEY,
  agent_name  TEXT NOT NULL,
  run_at      INTEGER NOT NULL,
  status      TEXT NOT NULL,  -- 'ok' | 'error' | 'timeout'
  duration_ms INTEGER,
  tokens_used INTEGER,
  error_msg   TEXT
);

Without tokens_used in D1, your cost dashboard only tells you what you spent today, not which agent burned it. I was mentally estimating my analysis tool's monthly Claude spend at around $20. The actual number was $38. I had no idea until I ran the aggregation query against the logs.
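The aggregation itself is not shown in the post, so here is a minimal sketch of what a per-agent rollup over agent_runs could look like. The function names, the since cutoff, and the flat usdPerMillionTokens rate are all assumptions for illustration; real pricing usually differs for input and output tokens, so a single tokens_used column only supports a rough estimate.

```typescript
// Row shape mirroring the agent_runs table above.
interface AgentRun {
  id: string;
  agent_name: string;
  run_at: number; // assumed to be a numeric timestamp; the unit is not stated in the post
  status: "ok" | "error" | "timeout";
  duration_ms: number | null;
  tokens_used: number | null;
  error_msg: string | null;
}

// Roughly equivalent SQL, if you prefer to aggregate inside D1:
//   SELECT agent_name, SUM(tokens_used) AS tokens
//   FROM agent_runs
//   WHERE run_at >= ?1
//   GROUP BY agent_name;

// Sum tokens per agent for runs at or after `since` (hypothetical helper).
function tokensByAgent(runs: AgentRun[], since: number): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const run of runs) {
    if (run.run_at < since) continue;
    // Rows without a token count (for example timeouts) contribute zero.
    totals[run.agent_name] = (totals[run.agent_name] ?? 0) + (run.tokens_used ?? 0);
  }
  return totals;
}

// Convert a token total to dollars using a rate you supply.
// The rate is a placeholder, not a real price.
function estimateCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1000000) * usdPerMillionTokens;
}
```

Comparing the per-agent totals against your mental estimate is what exposes a gap like the $20 versus $38 one described above.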

The other thing that bit me: MCP stdio transport timeouts. I spent days convinced it was an OAuth misconfiguration. It wasn't. One tool was calling an external API that occasionally took 30+ seconds. The stdio transport's read timeout was shorter than that, and Workers' 30-second wall clock limit made it worse: the subprocess got force-killed with a misleading exit code 1 error. The 'timeout' status rows piling up in D1 are what actually surfaced the pattern. Without that log, it would have stayed a vague "sometimes slow" complaint.
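One way to get explicit 'timeout' rows instead of a misleading generic failure is to race each tool call against a deadline and record which side won. The sketch below is an illustration under assumptions, not the setup from the post: runWithDeadline, RunOutcome, and the deadline value are hypothetical names, and the wrapper only stops waiting for the tool; it does not cancel the underlying work.

```typescript
type RunStatus = "ok" | "error" | "timeout";

// Fields chosen to map onto the status, duration_ms and error_msg columns.
interface RunOutcome {
  status: RunStatus;
  duration_ms: number;
  error_msg: string | null;
}

// Sentinel used to tell "deadline fired" apart from a normal tool result.
const TIMED_OUT = { timedOut: true } as const;

async function runWithDeadline<T>(
  tool: () => Promise<T>,
  deadlineMs: number,
): Promise<RunOutcome> {
  const started = Date.now();
  let cancelTimer = (): void => {};
  const deadline = new Promise<typeof TIMED_OUT>((resolve) => {
    const timer = setTimeout(() => resolve(TIMED_OUT), deadlineMs);
    cancelTimer = () => clearTimeout(timer);
  });
  try {
    // Whichever promise settles first decides the outcome.
    const result = await Promise.race([tool(), deadline]);
    if (result === TIMED_OUT) {
      return {
        status: "timeout",
        duration_ms: Date.now() - started,
        error_msg: `no result within ${deadlineMs} ms`,
      };
    }
    return { status: "ok", duration_ms: Date.now() - started, error_msg: null };
  } catch (err) {
    // The tool itself failed before the deadline.
    return {
      status: "error",
      duration_ms: Date.now() - started,
      error_msg: err instanceof Error ? err.message : String(err),
    };
  } finally {
    cancelTimer(); // avoid leaving a pending timer behind
  }
}
```

Writing the returned outcome into agent_runs keeps slow calls distinguishable from real errors, which is the distinction that made the pattern visible here.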

On the Workers side: if you fire Slack notifications as an un-awaited fetch instead of wrapping the promise in ctx.waitUntil, the request can get killed the moment your Worker returns a response. That's why webhooks seem to fail randomly: they're not failing, they're being terminated. Also worth knowing: Slack returns a 400 if your payload exceeds 4,000 characters, and raw error stack traces will blow past that limit silently.
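Both points can be handled in one small helper. The sketch below takes the 4,000-character figure above at face value and applies it to the whole serialized body, which is an assumption about where Slack measures the limit. WaitUntilContext is a minimal stand-in for the Workers ExecutionContext, and buildSlackPayload, notifySlack, and the injected send function are hypothetical names.

```typescript
// Minimal stand-in for the Workers ExecutionContext; only waitUntil is needed.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

const PAYLOAD_LIMIT = 4000; // figure quoted in the article
const MARKER = " [truncated]";

// Serialize { text } and shorten the text until the JSON body fits the limit.
function buildSlackPayload(text: string, limit: number = PAYLOAD_LIMIT): string {
  let body = JSON.stringify({ text });
  let keep = text.length;
  while (body.length > limit && keep > 0) {
    // Drop at least as many characters as the body overshoots by.
    keep = Math.max(0, keep - (body.length - limit));
    body = JSON.stringify({ text: text.slice(0, keep) + MARKER });
  }
  return body;
}

// Hand the in-flight request to waitUntil so it can finish after the
// Worker has already returned its response.
function notifySlack(
  ctx: WaitUntilContext,
  send: (body: string) => Promise<unknown>,
  text: string,
): void {
  ctx.waitUntil(send(buildSlackPayload(text)));
}
```

In a real Worker, send would typically wrap a fetch POST to your webhook URL with a JSON content type; it is injected here so the helper can be exercised without network access.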

I wrote up the full breakdown — including the KV write explosion that hit 12K writes in 2 minutes and how I traced it back to a specific agent using D1, plus the Durable Object refactor for slow tools — over on riversealab.com.


Read original on dev.to → dev.to/riversea/my-agent-was-succeeding-on-slack…
