[AI] Context Engineering for AI Coding: AGENTS.md, Cursor Rules & RAG METR, an AI safety and capability research organization, conducted a randomized controlled trial in 2025 with 16 experienced open-source developers working on 246 real-world tasks. The study found that developers using AI tools were 19% slower on complex tasks, despite predicting a 24% speedup, due to context quality issues rather than model quality. The bottleneck is context quality, leading to the discipline of context engineering, which includes rule files like AGENTS.md to provide architectural constraints and behavioral standards for AI agents. In 2025, METR — an AI safety and capability research organization — ran a rigorous randomized controlled trial. Sixteen experienced open-source developers worked on 246 real-world tasks, each randomly assigned to either use AI coding tools freely or not at all. The result was counterintuitive: developers using AI tools were 19% slower on complex tasks. Before the study, those same developers predicted AI would make them 24% faster . After completing the experiment — still believing they had gone faster — their subjective confidence remained completely unshaken. The finding did not make headlines for the reason people assumed. The headline was not "AI is useless." The headline was this: the bottleneck is not model quality. It is context quality. The developers who slowed down were spending significant time on what researchers call "verification overhead" and "workflow friction" — the effort required to correct AI output that did not understand the architectural constraints, naming conventions, existing utility functions, and established patterns of the codebase they were working in. The AI was generating code. It was generating code for an imaginary system. This part of the series is about solving that problem. Series Orientation:This article is Part 2 of theAI Code Review & Vibe Codingseries, detailing the context engineering practices needed to align AI generation with codebase conventions. For the preceding guide on initial tools and non-technical vibe coding, see Part 1 — Vibe Coding & The Production Wall . Scope note:This article focuses specifically oncode-review-levelcontext engineering — the practices individual engineers and teams use to make AI agents produce reviewable, architecturally correct code on an existing codebase. If you are interested inplatform-levelcontext infrastructure — building an organizational AI Platform layer, internal RAG systems at scale, or enterprise knowledge management — see Context Engineering: Domain-Driven Design for AI in the AI-Driven Playbook series. Every time you open a new session with an AI coding tool, you begin from zero. The model knows nothing about: Without this information, the AI operates like a very fast, very confident junior developer who has never seen your codebase before and will reproduce whatever pattern was most common in its training data — not whatever pattern is correct for your system. Context engineering is the discipline of structuring and delivering organizational knowledge to AI agents in a form they can reliably use. It is, as the industry consensus now describes it, the "DevOps moment" for AI — the operational layer that separates experimental AI assistance from reliable production-grade AI collaboration. Modern AI coding environments support context at multiple layers. Understanding the hierarchy is the foundation of any effective context strategy. Rule files are plain-text configuration files that are automatically injected into every AI interaction. They are the most important and most underutilized form of context. AGENTS.md or CLAUDE.md / GEMINI.md These files — stored at the root of your repository — are read by AI agents before they begin any task. They function as the agent's standing orders: architectural constraints, behavioral standards, and explicit prohibitions that apply to everything the agent does. A well-structured AGENTS.md covers: Project Architecture This is a Kratos v2 microservice using Clean Architecture. Layer rules: - api/ = contracts only proto + generated code - internal/service/ = adapter layer only, no business logic - internal/biz/ = business logic, NO direct database calls - internal/data/ = persistence only, GORM + PostgreSQL Mandatory Standards - All context must propagate through function parameters - Use errgroup for managed goroutines only - SQL queries must use parameterized inputs — NEVER string concatenation - Secrets come from environment variables or Kratos Config — NEVER hardcode What NOT To Do - Do not use global state - Do not expose raw database errors to HTTP/gRPC responses - Do not create new patterns without checking internal/util first The specificity is the point. A generic instruction like "follow clean architecture" produces inconsistent results. A specific instruction like "the biz layer must never import gorm.DB directly" produces deterministic ones. Cursor Rules .cursorrules Cursor's rule files work similarly to AGENTS.md but are native to the Cursor IDE. They support scoped rules — you can define different behavior for different file patterns, enforce language-specific standards, and specify which files should never be modified by the AI. rules name = Go Microservice Standards glob = / .go security never hardcode secrets = true require parameterized queries = true forbid global state = true architecture enforce layer boundaries = true require context propagation = true The practical effect: your AI assistant now operates with your standards embedded, not as an afterthought you patch into every prompt. Consider a request: "Retrieve a user profile by email in the service layer." Without a Rule File AGENTS.md : The AI will write a GORM query directly inside the adapter service layer, bypassing Clean Architecture design: // File: internal/service/user.go func s UserService GetProfileByEmail ctx context.Context, req pb.GetProfileReq pb.GetProfileReply, error { var user biz.User // VIOLATION: Direct database access leaking into the service layer if err := s.db.WithContext ctx .Where "email = ?", req.Email .First &user .Error; err = nil { return nil, err } return &pb.GetProfileReply{Name: user.Name, Email: user.Email}, nil } With a Rule File AGENTS.md : The AI enforces layer isolation, routing GORM access exclusively through the persistence domain repository and business use case: // File: internal/service/user.go func s UserService GetProfileByEmail ctx context.Context, req pb.GetProfileReq pb.GetProfileReply, error { // CORRECT: Service calls the biz layer orchestrator UseCase user, err := s.userUseCase.FindByEmail ctx, req.Email if err = nil { return nil, err } return &pb.GetProfileReply{Name: user.Name, Email: user.Email}, nil } Even with rule files in place, long sessions degrade. This is the "context rot" phenomenon: as a session accumulates failed attempts, corrected errors, and discarded planning notes, the signal-to-noise ratio in the context window drops. The model may prioritize recent noise over foundational constraints. The Fresh Session Strategy High-performing engineering teams treat AI sessions like stateless functions: one distinct task per session. When you complete a bug fix, close the session. When you begin a new feature, open a fresh one. The operational rule: task boundaries are session boundaries. Structured Handovers When a session grows long before the task is complete, perform a structured handover: PLAN.md or HANDOVER.md file in your project directoryThis eliminates context rot while preserving all meaningful progress. Compaction Commands Modern coding agents Claude Code, Cursor include /compact or /summarize commands. Use them proactively when a session runs long — before the model hits its context limit and before performance degrades. A compacted summary is a much higher-quality input than an accumulating stream of raw conversation. Rule files establish standards. Session management controls noise. Repository indexing solves a different problem: giving the AI accurate knowledge of what already exists in your codebase. The N+1 Discovery Problem Without repository context, AI agents routinely implement functions that already exist. They create new database tables that duplicate existing ones. They define error types that collide with established patterns. They import packages that violate your dependency graph. Not because they are incapable of doing better — because they do not know what already exists. Manual Selection vs. Full-Repo Scanning Most AI coding tools offer the ability to scan an entire repository automatically. This sounds valuable and is often counterproductive. A large codebase injected wholesale into context adds significant noise — irrelevant files, outdated patterns, deprecated modules. The principle: manually select only the files directly relevant to the task. For a task modifying user authentication, the relevant context is: Not the entire codebase. Semantic Memory Banks More sophisticated teams maintain curated "memory bank" files — structured markdown documents that describe the codebase's architecture, key patterns, and important decisions in a form optimized for AI consumption: Memory Bank: Authentication Domain Architecture - Auth service handles JWT issuance and validation - User identity stored in PostgreSQL via GORM, users table - Sessions use Redis with 24h TTL see internal/data/session repo.go - MFA implemented via TOTP internal/service/mfa service.go Key Patterns - All auth errors return domain errors, never raw DB errors - Rate limiting is middleware-level internal/middleware/rate limiter.go - Refresh tokens are hashed before storage see HashToken in internal/util/crypto.go Common Mistakes to Avoid - Do NOT check password directly — always use bcrypt.CompareHashAndPassword - Do NOT log token values — only log token IDs - Do NOT implement new crypto — use internal/util/crypto.go exclusively These memory banks are updated when significant architectural decisions are made and committed to the repository alongside code. For large engineering organizations — those with hundreds of services, mature documentation, and complex architectural standards — static rule files are insufficient. The relevant context for any given task changes too rapidly and exists in too many places to manage manually. Retrieval-Augmented Generation RAG for code context works by: The operational result: an AI agent working on a payments feature automatically retrieves the relevant payment service interfaces, the ADR explaining why you chose the current transaction model, and the runbook for the payment provider integration — without the engineer manually curating that context. ADRs as Machine-Readable Judgment Architecture Decision Records deserve special attention. When committed in a structured format and indexed into a RAG pipeline, ADRs transform from static documentation into active constraints: ADR-047: Event-Sourcing for Order State Transitions Status: Accepted 2025-03 Context Direct state mutation of order records creates audit trail gaps and makes rollback scenarios complex. Decision All order state transitions are implemented as events, appended to the events table. The current state is derived by replaying events, not by direct column updates. Consequences - New order state logic MUST add new event types, NOT modify existing ones - Order queries require projection logic see internal/projection/order projector.go - Do NOT write directly to orders.status — always publish an OrderStateTransitioned event An AI agent with access to this ADR will not generate direct UPDATE orders SET status = ? queries for order state changes. Without it, it almost certainly will. MCP Servers as Context Infrastructure The Model Context Protocol MCP , released by Anthropic and now adopted across the industry, provides a standardized interface for serving context to AI agents. Rather than building bespoke integrations for each AI tool, organizations build MCP servers — lightweight services that expose specific organizational knowledge documentation, code patterns, ticket context through a standard protocol. The shift this enables: context infrastructure becomes a shared organizational asset rather than a per-engineer configuration problem. The industry now has a name for operating context infrastructure at organizational scale: ContextOps . The operational loop is: Ingest → Validate → Structure → Serve → Audit → Refine . Organizations that treat context as throwaway configuration — updated ad hoc, inconsistently formatted, stored in unindexed markdown files — experience the METR result: AI that slows teams down. Organizations that treat context as infrastructure — versioned, validated, monitored — experience meaningfully different outcomes. If your team does not have any context infrastructure today, the practical starting point is a three-step sequence: Step 1: Write an AGENTS.md one afternoon Focus on the highest-value content first: Step 2: Establish session discipline one team discussion Agree on task-based session boundaries. Add a compaction step to your team norms: before any session exceeds 20 substantive exchanges, compact and continue in a fresh session. Step 3: Build your first memory bank one sprint Pick your most critical domain — authentication, payments, whatever carries the highest risk. Document it in a memory bank format. Add a rule to your code review checklist: "Was the relevant memory bank file updated as part of this PR?" The marginal improvement from even basic context infrastructure is significant. Teams that complete these three steps report substantially fewer AI-generated PRs that violate architectural standards, require significant rework, or introduce security issues the memory bank explicitly prohibits. Consider a task: "Implement a new endpoint to export user transaction history as a CSV." Without context engineering , an AI agent will: users and transactions directly in the service layer internal/util/csv writer.go With effective context engineering , the same agent: csv writer.go and uses it rather than reimplementing internal/service/report service.go and applies it internal/middleware/ as specified in your AGENTS.mdThis is not a different model. It is the same model with correct context. The output difference is substantial. Context engineering is not a replacement for code review. It is a force multiplier on code review. When AI agents operate with accurate, comprehensive context, the output they produce: The result: human reviewers spend less time on pattern violations and architectural corrections, and more time on the genuinely high-value review tasks — logical correctness, edge case handling, and the security behaviors that require judgment rather than rule application. Part 3 covers what those high-value review tasks are: the full taxonomy of AI-generated bugs, from the ones automated tools catch to the ones that only careful human review finds. Next: Part 3 — AI Bug Taxonomy: From Silent Logic Failures to Slopsquatting This post was originally published on my blog at Context Engineering for AI Coding: AGENTS.md, Cursor Rules & RAG. Hi, I'm Lê Tuấn Anh vesviet 👋 I am a Senior Go Backend Architect & Distributed Systems Engineer with 17+ years of experience building high-traffic platforms 25M+ requests/month . If you enjoyed this deep-dive, let's connect on LinkedIn or explore my consulting services at tanhdev.com/hire.