Giving an AI Agent Write Access to Your App: Guardrails We Built for RobinReach's MCP Tools


A few months ago I wrote about building a production MCP server in Rails: the plumbing of exposing RobinReach's API as a set of MCP tools that Claude and other agents can call.

That post was about connecting an AI agent to your app. This one is about the harder problem: what happens once it's connected and can actually do things, like publish to a client's Instagram, reply to a comment on their behalf, or schedule a week of content. The moment an agent has write access, "it works in the demo" stops being good enough.

The single question every user (and every one of our customers' customers) eventually asks is some version of: "can this thing accidentally touch something it shouldn't?" Specifically, on a platform that manages multiple brands for multiple clients, can the AI agent working on Brand A ever see or post to Brand B?

The answer is no, and the reason why is the part I want to focus on, because it's a different kind of guardrail than the usual "we told the AI not to" approach.

The easy way to build this would be: give the agent one set of credentials for the whole account, list every brand the user has access to, and then add an instruction like "always check which brand you're working on and never act on the wrong one."

That approach technically works, right up until it doesn't. LLMs make mistakes. They mix up context across a long conversation, they reuse an ID from three messages ago, they occasionally just hallucinate. If the only thing standing between "agent posts to Brand A" and "agent posts to Brand B" is a sentence in a prompt telling it to be careful, that's not a guardrail. That's a hope.

So we built it differently. The connector that the agent talks to is scoped at the API/auth layer, not the prompt layer. When the integration is set up, the credentials issued for that connection are tied to a specific account and a specific set of brands the user actually has access to. Every tool call the agent makes gets validated against that scope on the server, before it ever touches a database row.

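What this means in practice: the agent has no technical ability to even ask for a brand outside that scope. A tool call that names the wrong brand is rejected by the server, whatever the prompt or the conversation history says.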

The reason this matters so much is that it moves the guarantee from "the AI is well behaved" to "the infrastructure makes the bad outcome impossible." Those are very different sentences to say to a customer. One is a promise about behavior. The other is a statement about architecture. If you're building anything where an AI agent has access to multiple tenants, customers, or brands, this is the line I'd draw first, before writing a single line of prompt instructions.
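The server-side scope check described above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not RobinReach's actual code: `ConnectionScope`, `ScopeError`, `dispatch_tool`, and the `brand_id` argument are all hypothetical names, and the real handler is stubbed out.

```ruby
require "set"

# Hypothetical names throughout; a sketch of the idea, not the real implementation.
class ScopeError < StandardError; end

# The scope attached to one connector credential: one account,
# plus the set of brands that credential may act on.
ConnectionScope = Struct.new(:account_id, :brand_ids) do
  def authorize!(brand_id)
    return if brand_ids.include?(brand_id)

    raise ScopeError, "brand #{brand_id} is outside this connection's scope"
  end
end

# Every tool call passes through here before any data access happens.
def dispatch_tool(scope, tool_name, args)
  scope.authorize!(args.fetch(:brand_id))
  # Only now would the real tool handler run; stubbed for the sketch.
  { tool: tool_name, brand_id: args[:brand_id], status: "ok" }
end

scope = ConnectionScope.new(42, Set[1, 2])
dispatch_tool(scope, "create_post", { brand_id: 1, text: "hi" }) # allowed

begin
  dispatch_tool(scope, "create_post", { brand_id: 9, text: "hi" }) # rejected
rescue ScopeError => e
  puts e.message
end
```

The point of the shape is that the check lives in the dispatcher, so no individual tool can forget it and no prompt can talk its way around it.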

Scoping solves "is this the right brand." It doesn't solve "is this the right content." That's where validation comes in, and we treat it as a hard, separate step.

validate_post is a required call that happens before create_post, and the agent is instructed that it must never skip it. We deliberately did not fold validation into the create step itself, even though that would be simpler. Splitting it forces a "draft, check, fix" loop instead of "fire and see what happens."
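A rough sketch of what a separate validation step with a structured response might look like. The response shape, the error codes, and the `CHAR_LIMITS` table are assumptions made for illustration; the character limits in particular are placeholder values, not a statement of what the real tool checks.

```ruby
# Illustrative limits only; real limits vary by platform and change over time.
CHAR_LIMITS = { "twitter" => 280, "linkedin" => 3000 }.freeze

# Returns a structured result instead of raising, so the agent can read
# the errors, fix the draft, and call validate_post again.
def validate_post(platform:, text:)
  errors = []
  limit = CHAR_LIMITS[platform]

  if limit.nil?
    errors << { field: "platform", code: "unsupported",
                message: "unknown platform #{platform}" }
  elsif text.length > limit
    errors << { field: "text", code: "too_long",
                message: "#{text.length} characters, limit is #{limit}" }
  end
  errors << { field: "text", code: "empty", message: "text is blank" } if text.strip.empty?

  { valid: errors.empty?, errors: errors }
end

# The "draft, check, fix" loop: a LinkedIn-length draft reused as a tweet.
draft = "Long-form announcement. " * 20
result = validate_post(platform: "twitter", text: draft)

unless result[:valid]
  # Stand-in for the agent rewriting the post; a real agent would rephrase.
  draft = draft[0, CHAR_LIMITS["twitter"]]
  result = validate_post(platform: "twitter", text: draft)
end
```

Returning data rather than raising keeps the failure readable by the agent, which is what makes the correction loop possible.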

What gets checked:

If something fails, the agent gets a structured response back describing exactly what's wrong, and it can correct the content before anything goes near a real social account. In practice this catches the most common failure mode by a wide margin: an agent writing one great LinkedIn post and then naively reusing the same text as a tweet, which is both too long and the wrong tone.

A few more things worth a sentence each, because together they form the full picture:

Draft by default. Anything the agent generates proactively lands as a draft, not scheduled or published. Scheduling or publishing only happens when the user actually asks for that outcome. This gives the agent a safe "here's what I made, take a look" state instead of a binary publish or don't.
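As a small sketch of the draft-by-default rule, assuming a hypothetical `resolve_status` helper and a hypothetical set of three statuses: the status is only ever something other than draft when the user explicitly asked for it.

```ruby
# Hypothetical status names for the sketch.
ALLOWED_STATUSES = %i[draft scheduled published].freeze

# requested_status is whatever the user explicitly asked for, or nil when
# the agent generated the content on its own initiative.
def resolve_status(requested_status: nil)
  return :draft if requested_status.nil?

  status = requested_status.to_sym
  unless ALLOWED_STATUSES.include?(status)
    raise ArgumentError, "unknown status: #{requested_status}"
  end

  status
end

resolve_status                                # => :draft
resolve_status(requested_status: "scheduled") # => :scheduled
```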

Audience-aware scheduling. Before scheduling anything, the agent pulls the audience's actual best-performing times for that brand and platform, rather than picking a "reasonable-sounding" time itself. Left alone, an LLM tends to pick suspiciously round numbers like 9am or noon, because those are common in training data, not because that's when this brand's followers are online.
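To make the scheduling idea concrete, here is a minimal sketch that ranks posting hours from engagement data instead of guessing. The `engagement_by_hour` hash (hour of day mapped to an engagement score) and the sample numbers are invented for illustration; how the real scores are computed is not covered here.

```ruby
# engagement_by_hour: hypothetical per-brand, per-platform data mapping
# hour of day (0-23) to an engagement score.
def best_posting_hours(engagement_by_hour, count: 3)
  engagement_by_hour
    .sort_by { |hour, score| [-score, hour] } # highest score first, ties by earlier hour
    .first(count)
    .map { |hour, _score| hour }
end

sample = { 9 => 0.12, 12 => 0.10, 19 => 0.31, 21 => 0.27, 7 => 0.05 }
best_posting_hours(sample, count: 2) # => [19, 21]
```

In this made-up sample the round-number slots (9 and 12) lose to the evening hours, which is exactly the case a guessed time would miss.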

Voice learned from feedback. Whenever a user edits or rejects generated content, that correction is saved and applied automatically next time. The agent is told to apply it silently, so the output just sounds like the brand without the user re-explaining preferences every session.

Comments are surfaced, not auto-handled. The agent can read and reply to comments, but it always shows the user who said what, on which post, and flags anything that looks like a complaint before drafting a response. Replying as the brand to a real customer is high-stakes enough that a human stays in the loop.

No raw API leaks into the conversation. Tool names, JSON, internal IDs: none of that reaches the user. Everything is translated into plain language, like "your Facebook page Acme Co has 3 new comments" instead of a payload with internal identifiers. This sounds cosmetic, but it's actually a trust guardrail. The moment a non-technical user sees a raw error or an ID, the illusion that they're talking to "their social media manager" breaks, and they become more cautious about giving the tool any access at all.
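The plain-language translation in the last point can be sketched as a small formatting step between the tool result and the conversation. The payload shape and field names (`page_id`, `platform`, `page_name`, `new_comment_count`) are assumptions for the example.

```ruby
# Hypothetical payload shape; the field names are assumptions.
# Only human-readable fields are used; internal IDs never reach the output.
def summarize_comments(payload)
  count = payload.fetch(:new_comment_count)
  noun = count == 1 ? "new comment" : "new comments"
  "Your #{payload.fetch(:platform)} page #{payload.fetch(:page_name)} has #{count} #{noun}"
end

payload = { page_id: "pg_8f3a", platform: "Facebook",
            page_name: "Acme Co", new_comment_count: 3 }
puts summarize_comments(payload)
# prints: Your Facebook page Acme Co has 3 new comments
```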

Across all of this, the theme is the same. Don't try to make the agent smarter. Make the wrong action structurally harder to take than the right one, and put the hardest boundaries where the cost of a mistake is highest.

For us, that meant brand and tenant isolation enforced at the auth layer, where the agent has no technical ability to even ask for the wrong thing, and content validation enforced as a separate required step, where mistakes get caught before they go live. Everything else, voice, scheduling, comment handling, is built on top of those two foundations. MCP makes it trivially easy to hand an LLM the keys to your app. The interesting engineering work is making sure some of those keys don't open every door.

Source: dev.to (original article)
