Evals Are Alignment Enforcement: Why Your Safety Strategy Needs Runtime Checks

A developer argues that evaluation infrastructure, not alignment research or product engineering, serves as the actual enforcement layer for AI safety in production. The developer proposes that safety be treated as a runtime guarantee, enforced through non-bypassable boundaries that check inputs, outputs, and trajectories for violations such as credential leakage or system prompt exposure. The approach implements three concentric enforcement layers with hard invariants that block harmful outputs before they reach users, rather than relying on model fine-tuning or system prompts alone.

The AI safety conversation is dominated by two camps: the alignment researchers thinking about existential risk, and the product engineers shipping features. Neither group talks enough about the middle layer: the actual enforcement mechanism that determines whether an agent behaves as intended in production.

Here's my thesis: evaluation infrastructure is alignment enforcement. Not alignment research. Not safety theater. The actual enforcement layer that determines whether your agent does what you intended, at runtime, under adversarial conditions.

If your agent can be jailbroken, produce harmful outputs, or violate its constraints, and you only find out from user reports, your evals aren't a testing tool. They're a missing safety system.

Most teams treat safety as a property of the model. "We fine-tuned it to be safe." "We added a system prompt that says don't do bad things." This is wishful thinking dressed as engineering. Safety is a runtime guarantee, and runtime guarantees require runtime enforcement:

    interface SafetyBoundary {
      name: string;
      scope: 'input' | 'output' | 'trajectory';
      check: (data: BoundaryInput) => BoundaryResult;
      action: 'block' | 'flag' | 'modify';
      bypassable: false; // This is the point.
    }

    const boundaries: SafetyBoundary[] = [
      {
        name: 'no-credential-leakage',
        scope: 'output',
        check: (data) => {
          const patterns = [
            // Generic "api_key = ...", "token: ...", "secret=..." assignments
            /(?:api[_-]?key|token|secret)[\s:=]+['"]?[\w-]{20,}/gi,
            // PEM private key headers
            /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/g,
            // GitHub token prefixes
            /(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{36,}/g
          ];
          const matches = patterns.flatMap(p => [...data.output.matchAll(p)]);
          return {
            safe: matches.length === 0,
            // Log only a truncated prefix so the violation record does not re-leak the secret
            violations: matches.map(m => ({
              pattern: m[0].substring(0, 20) + '...',
              position: m.index
            }))
          };
        },
        action: 'block',
        bypassable: false
      },
      {
        name: 'no-system-prompt-leakage',
        scope: 'output',
        check: (data) => {
          const systemPromptFragments = extractSignificantPhrases(
            data.context.systemPrompt,
            { minLength: 15, topN: 20 }
          );
          const leaked = systemPromptFragments.filter(fragment =>
            data.output.toLowerCase().includes(fragment.toLowerCase())
          );
          return {
            // Tolerate up to two incidental phrase overlaps before treating it as a leak
            safe: leaked.length < 3,
            violations: leaked.map(l => ({ fragment: l }))
          };
        },
        action: 'block',
        bypassable: false
      }
    ];

The bypassable: false property is the entire philosophy. These aren't suggestions. They're invariants.

I think about safety enforcement as three concentric layers, each catching different failure modes:

Things that must never happen, regardless of input. These are your hardest constraints:

    class InvariantEnforcer {
      private invariants: SafetyBoundary[];
      private violationLog: ViolationRecord[] = [];

      async enforce(agentOutput: AgentOutput): Promise