Source: dev.to · topic: ai-agents

Your AI Agent Drifted Last Night and You Didn't Notice

An engineer has identified three distinct patterns of "agent drift"—gradual degradation in AI agent output quality that occurs without hard failures or schema violations, often going unnoticed for days until a customer complains. The developer argues that the hardest production failures are not crashes but quality degradation between eval checkpoints, requiring continuous runtime detection rather than just pre-deployment testing. After running production agents 24/7, the engineer documented drift patterns including stale context from outdated knowledge bases and API caches, as well as behavioral shifts from silent model updates or prompt injection attempts.

5 min read · published Jun 6, 2026

Your agent passed every test in CI. It ran fine in staging. Then it quietly started returning subtly wrong answers in production at 2 AM, and nobody noticed until a customer complained three days later.

This is agent drift — the gradual degradation of agent output quality without any hard failures. No exceptions thrown. No schema violations. Just slowly worsening responses that slip past your monitoring.

Here's my thesis: the hardest production failures aren't crashes — they're quality degradation that happens between your eval checkpoints. You need continuous runtime detection, not just pre-deployment testing.

After running production agents 24/7, I've identified three distinct drift patterns:

The first pattern is context staleness. Your agent retrieves documents, knowledge bases, or API responses as context. That context decays:

interface StalenessDetector {
  source: string;
  maxAgeMs: number;
  check: (context: RetrievedContext) => StalenessResult;
}

const stalenessChecks: StalenessDetector[] = [
  {
    source: 'knowledge-base',
    maxAgeMs: 24 * 60 * 60 * 1000, // 24 hours
    check: (ctx) => {
      const now = Date.now();
      const maxChunkAgeMs = 7 * 24 * 60 * 60 * 1000; // 7 days
      // Guard the empty case: 0 / 0 is NaN and Math.max() of nothing is -Infinity.
      if (ctx.chunks.length === 0) {
        return { stale: false, staleFraction: 0, oldestChunkAge: 0, recommendation: 'no chunks retrieved' };
      }
      const chunkAges = ctx.chunks.map(c => now - c.sourceLastModified);
      const staleCount = chunkAges.filter(age => age > maxChunkAgeMs).length;
      const staleFraction = staleCount / ctx.chunks.length;
      return {
        stale: staleFraction > 0.3,
        staleFraction,
        oldestChunkAge: Math.max(...chunkAges),
        recommendation: staleCount > 0
          ? `${staleCount} chunks older than 7 days`
          : 'fresh'
      };
    }
  },
  {
    source: 'api-response-cache',
    maxAgeMs: 60 * 60 * 1000, // 1 hour
    check: (ctx) => {
      const cached = ctx.apiResponses.filter(r => r.fromCache);
      const expired = cached.filter(r => Date.now() - r.cachedAt > 60 * 60 * 1000);
      return {
        stale: expired.length > 0,
        staleFraction: expired.length / Math.max(cached.length, 1),
        recommendation: expired.length > 0
          ? `${expired.length} cached API responses expired`
          : 'fresh'
      };
    }
  }
];

The insidious part: stale context doesn't cause errors. Your agent happily generates confident answers based on outdated information. The output looks fine — it's just wrong.
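A minimal sketch of how the chunk-age check above might be made deterministic for testing, assuming a hypothetical `Chunk` shape with an epoch-millisecond `sourceLastModified` and a clock passed in as a parameter. The `staleFractionOf` helper is an illustrative name, and the 7-day and 30% defaults simply mirror the detector above.

```typescript
// Hypothetical chunk shape: only the timestamp matters for this sketch.
interface Chunk {
  id: string;
  sourceLastModified: number; // epoch milliseconds
}

interface ChunkStaleness {
  stale: boolean;
  staleFraction: number;
  staleIds: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Pure function: the clock is a parameter, so results are reproducible in tests.
function staleFractionOf(
  chunks: Chunk[],
  nowMs: number,
  maxChunkAgeMs: number = 7 * DAY_MS,
  staleThreshold: number = 0.3
): ChunkStaleness {
  if (chunks.length === 0) {
    // Nothing retrieved means nothing to judge; report as not stale.
    return { stale: false, staleFraction: 0, staleIds: [] };
  }
  const staleIds = chunks
    .filter(c => nowMs - c.sourceLastModified > maxChunkAgeMs)
    .map(c => c.id);
  const staleFraction = staleIds.length / chunks.length;
  return { stale: staleFraction > staleThreshold, staleFraction, staleIds };
}

// Usage with made-up data: two of four chunks are older than 7 days.
const fixedNowMs = Date.UTC(2026, 5, 6);
const sampleChunks: Chunk[] = [
  { id: 'a', sourceLastModified: fixedNowMs - 1 * DAY_MS },
  { id: 'b', sourceLastModified: fixedNowMs - 2 * DAY_MS },
  { id: 'c', sourceLastModified: fixedNowMs - 10 * DAY_MS },
  { id: 'd', sourceLastModified: fixedNowMs - 30 * DAY_MS },
];
const stalenessReport = staleFractionOf(sampleChunks, fixedNowMs);
```

Returning the stale chunk IDs, rather than only a count, gives a refresh step something concrete to re-fetch.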

The second pattern is behavioral drift: the agent's response patterns shift over time. Maybe the underlying model got a silent update. Maybe prompt injection attempts are subtly reshaping behavior. Maybe token distributions are shifting due to accumulated conversation context.

interface DriftBaseline {
  dimension: string;
  expectedDistribution: { mean: number; stddev: number };
  windowSize: number;
}

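class BehavioralDriftDetector {
  private baselines: Map<string, DriftBaseline> = new Map();
  private observations: Map<string, number[]> = new Map();

  // Register or replace a baseline. Without one, observe() always returns null.
  // Replacing a baseline also clears the window collected under the old one.
  setBaseline(baseline: DriftBaseline): void {
    this.baselines.set(baseline.dimension, baseline);
    this.observations.delete(baseline.dimension);
  }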

  observe(dimension: string, value: number): DriftAlert | null {
    const baseline = this.baselines.get(dimension);
    if (!baseline) return null;

    const window = this.observations.get(dimension) || [];
    window.push(value);
    if (window.length > baseline.windowSize) window.shift();
    this.observations.set(dimension, window);

    if (window.length < baseline.windowSize * 0.5) return null;

    const currentMean = window.reduce((a, b) => a + b, 0) / window.length;
    const zScore = Math.abs(currentMean - baseline.expectedDistribution.mean) 
      / baseline.expectedDistribution.stddev;

    if (zScore > 2.5) {
      return {
        dimension,
        severity: zScore > 4 ? 'critical' : 'warning',
        currentMean,
        expectedMean: baseline.expectedDistribution.mean,
        zScore,
        message: `${dimension} drifted ${zScore.toFixed(1)} sigma from baseline`
      };
    }
    return null;
  }
}

The key insight: you're not evaluating individual outputs. You're evaluating the distribution of outputs over time. A single long response means nothing. A gradual increase in average response length across 100 runs? That's signal.
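To make the distribution idea concrete, here is a small sketch of deriving a baseline from historical observations and scoring a window mean against it. The names `computeBaseline` and `windowZScore` and the sample numbers are hypothetical. By default the score divides by the per-observation standard deviation, as the detector above does. Dividing by the standard error (`stddev / Math.sqrt(n)`) is offered as an optional, more sensitive alternative, not as the author's method.

```typescript
interface Distribution {
  mean: number;
  stddev: number;
}

// Sample mean and sample standard deviation (n - 1 denominator).
function computeBaseline(values: number[]): Distribution {
  if (values.length < 2) {
    throw new Error('need at least two observations for a baseline');
  }
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const sumSquares = values.reduce((acc, v) => acc + (v - mean) * (v - mean), 0);
  return { mean, stddev: Math.sqrt(sumSquares / (values.length - 1)) };
}

// Z-score of the window mean against the baseline. With useStandardError set,
// the denominator shrinks as the window grows, so small sustained shifts in
// the mean produce larger scores.
function windowZScore(
  recent: number[],
  baseline: Distribution,
  useStandardError: boolean = false
): number {
  if (recent.length === 0) return 0;
  const recentMean = recent.reduce((a, b) => a + b, 0) / recent.length;
  const denom = useStandardError
    ? baseline.stddev / Math.sqrt(recent.length)
    : baseline.stddev;
  if (denom === 0) return recentMean === baseline.mean ? 0 : Infinity;
  return Math.abs(recentMean - baseline.mean) / denom;
}

// Illustrative numbers only: response lengths in tokens.
const historicalLengths = [90, 110, 90, 110, 100]; // mean 100, stddev 10
const lengthBaseline = computeBaseline(historicalLengths);
const recentLengths = [104, 106, 105, 105]; // mean 105
const plainScore = windowZScore(recentLengths, lengthBaseline); // 0.5
const standardErrorScore = windowZScore(recentLengths, lengthBaseline, true); // 1.0
```

The same 5-token shift scores twice as high under the standard-error variant with a window of four, and the gap widens as the window grows. Any alert thresholds would need retuning if you switch between the two.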

The third pattern is hallucination rate creep. Hallucination rates aren't constant; they vary with input complexity, context quality, and model state. The dangerous case is a rate that slowly climbs from 2% to 8% over a week, crossing your acceptable threshold without ever triggering a single hard failure.

interface HallucinationCanary {
  name: string;
  detect: (output: AgentOutput, groundTruth: GroundTruth) => HallucinationSignal;
}

const canaries: HallucinationCanary[] = [
  {
    name: 'entity-grounding',
    detect: (output, truth) => {
      const claimedEntities = extractEntities(output.raw);
      const groundedEntities = extractEntities(truth.sourceDocuments);
      const ungrounded = claimedEntities.filter(e => 
        !groundedEntities.some(g => semanticMatch(e, g, 0.85))
      );
      return {
        hallucinated: ungrounded.length > 0,
        ungroundedEntities: ungrounded,
        groundingRate: 1 - (ungrounded.length / Math.max(claimedEntities.length, 1))
      };
    }
  },
  {
    name: 'numeric-consistency',
    detect: (output, truth) => {
      const claimedNumbers = extractNumericClaims(output.raw);
      const sourceNumbers = extractNumericClaims(truth.sourceDocuments);
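      const inconsistent = claimedNumbers.filter(claim =>
        !sourceNumbers.some(src => {
          if (src.entity !== claim.entity) return false;
          // A zero source value has no relative tolerance; require an exact match.
          if (src.value === 0) return claim.value === 0;
          // 5% relative tolerance; divide by the magnitude so negative values compare correctly.
          return Math.abs(src.value - claim.value) / Math.abs(src.value) < 0.05;
        })
      );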
      return {
        hallucinated: inconsistent.length > 0,
        inconsistentClaims: inconsistent,
        consistencyRate: 1 - (inconsistent.length / Math.max(claimedNumbers.length, 1))
      };
    }
  }
];
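The per-run canary results can be folded into a rolling rate so that a slow climb, such as the 2% to 8% creep described earlier, becomes visible as a threshold breach. This is a sketch under assumptions: the `RollingRate` class, the 50-sample window, and the 5% threshold are hypothetical choices, not part of the canaries above.

```typescript
interface RateReading {
  rate: number;      // fraction of flagged runs in the current window
  breached: boolean; // true only when the window is full and rate > threshold
  samples: number;   // how many runs the window currently holds
}

// Tracks the fraction of flagged runs over the most recent windowSize samples.
class RollingRate {
  private flags: boolean[] = [];
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(windowSize: number, threshold: number) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Record one sampled run and return the updated reading.
  record(hallucinated: boolean): RateReading {
    this.flags.push(hallucinated);
    if (this.flags.length > this.windowSize) this.flags.shift();
    const flagged = this.flags.filter(f => f).length;
    const rate = flagged / this.flags.length;
    // Only judge a full window, to avoid noisy alerts from the first few runs.
    const full = this.flags.length === this.windowSize;
    return { rate, breached: full && rate > this.threshold, samples: this.flags.length };
  }
}

// Simulated data: 1 flagged run in the first 50 (2%), then 4 in the next 50 (8%).
const rateTracker = new RollingRate(50, 0.05);
let earlyReading: RateReading = rateTracker.record(true);
for (let i = 1; i < 50; i++) earlyReading = rateTracker.record(false);
let lateReading: RateReading = earlyReading;
for (let i = 0; i < 50; i++) lateReading = rateTracker.record(i % 13 === 0);
```

Because canaries run on a sample, the window is measured in sampled runs, so a small window reacts faster but gives a noisier rate.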

Detection is one half. The other half is what you do about it. Here's the runtime loop I've converged on:

async function monitorAgentRun(run: AgentRun): Promise<MonitorResult> {
  // 1. Pre-execution: Check context freshness
  const stalenessResults = await checkStaleness(run.context);
  if (stalenessResults.some(r => r.stale)) {
    await refreshStaleContext(run, stalenessResults);
  }

  // 2. Post-execution: Lightweight drift check on every run
  const driftAlerts = trackDimensions(run.output, {
    responseLength: estimateTokens(run.output.raw),
    toolCalls: run.output.toolCallCount,
    latency: run.durationMs,
    confidenceProxy: run.output.metadata?.confidence ?? null
  });

  // 3. Sampled: Hallucination canary (expensive, run on 10% sample)
  let hallucinationResult = null;
  if (Math.random() < 0.1 && run.groundTruthAvailable) {
    hallucinationResult = await runCanaries(run.output, run.groundTruth);
  }

  // 4. Alert on threshold breach
  if (driftAlerts.some(a => a.severity === 'critical')) {
    await alertOncall('agent-drift-critical', driftAlerts);
  }

  return { stalenessResults, driftAlerts, hallucinationResult };
}

Notice the layering: staleness checks are pre-execution (you can fix stale context before the agent runs). Drift detection is post-execution and cheap (runs on every invocation). Hallucination canaries are expensive and sampled.

Three mistakes I made building this:

1. Alerting on individual outliers. An agent producing one long response isn't drift. I burned weeks chasing false positives before switching to windowed statistical detection.

2. Not versioning baselines. When you intentionally change agent behavior (new prompt, new model), your baselines need to reset. Otherwise every intentional improvement triggers drift alerts.

3. Treating hallucination as binary. "The agent hallucinated" is useless. What did it hallucinate? Entities? Numbers? URLs? The category determines the fix.
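
One way to avoid the second mistake is to key baselines by the agent configuration they were measured under, so a deliberate prompt or model change starts from a clean slate instead of firing alerts. The sketch below is hypothetical: `AgentVersion`, `VersionedBaselines`, the version strings, and the numbers are illustrative, not taken from the system described here.

```typescript
interface AgentVersion {
  promptVersion: string;
  modelId: string;
}

interface Baseline {
  mean: number;
  stddev: number;
}

class VersionedBaselines {
  private store: Map<string, Baseline> = new Map();

  // One entry per (prompt version, model, dimension) combination.
  private key(version: AgentVersion, dimension: string): string {
    return `${version.promptVersion}|${version.modelId}|${dimension}`;
  }

  set(version: AgentVersion, dimension: string, baseline: Baseline): void {
    this.store.set(this.key(version, dimension), baseline);
  }

  // Returns undefined for a version with no baseline yet, which callers can
  // treat as "still collecting data, do not alert".
  get(version: AgentVersion, dimension: string): Baseline | undefined {
    return this.store.get(this.key(version, dimension));
  }
}

// Usage: a prompt change yields a new version with no inherited baseline.
const agentV1: AgentVersion = { promptVersion: 'prompt-v1', modelId: 'model-a' };
const agentV2: AgentVersion = { promptVersion: 'prompt-v2', modelId: 'model-a' };
const versionedBaselines = new VersionedBaselines();
versionedBaselines.set(agentV1, 'responseLength', { mean: 420, stddev: 60 });
const baselineForV1 = versionedBaselines.get(agentV1, 'responseLength');
const baselineForV2 = versionedBaselines.get(agentV2, 'responseLength');
```

Keeping the old version's baseline around, rather than overwriting it, also makes a rollback cheap: the previous configuration resumes with the statistics it was measured under.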

If you're running agents in production without drift monitoring, start here:

Most teams discover drift through customer complaints. By the time a user says "your AI gave me wrong information," you've likely been serving degraded responses for days or weeks.

The gap between "my agent works" and "my agent works reliably" is entirely about what happens between your evaluation checkpoints. Continuous monitoring isn't optional — it's the difference between running a demo and running a product.

How are you detecting drift in your production agents? Or are you still finding out from users? I'm curious what signals have been most useful for early detection.
