Letting Claude Code Autonomously Hunt for Trading Strategies

AlphaForge, a new CLI-first backtesting tool designed for AI agents, enables autonomous trading strategy exploration. The tool emits structured JSON for every command and includes built-in walk-forward validation to guard against overfitting. An MCP server lets Claude Code drive the full pipeline (ideation, backtest, optimize, walk-forward) overnight without human babysitting.

If you've spent any time backtesting trading strategies, you've probably run into both of these problems:

Problem 1: Overfitting is embarrassingly easy. Most backtesting tools will happily show you a strategy with 40% CAGR that falls apart the moment it touches unseen data. The backtest looked great because you, consciously or not, optimised in-sample and called it done. Walk-forward validation exists to catch this, but it's tedious to wire up manually, so most people skip it.

Problem 2: Existing quant tools are impossible for an AI agent to drive. Web-UI backtesting platforms have no CLI surface. Raw Python frameworks are powerful, but their APIs are wide and stateful: asking Claude Code to "explore strategies overnight" means the agent has to parse Python tracebacks, infer what broke, and mutate code files in a loop. That's fragile, and it means you have to babysit it.

I built AlphaForge (https://alforgelabs.com) to solve both at once.

Most tools add a --json flag as an afterthought. AlphaForge was designed from the start around the assumption that the primary user might be an AI agent, not a human.

```bash
alpha-forge system describe
```

This emits a full JSON catalog of every subcommand, its parameters, accepted values, and expected output shape. An agent calls it once at session start and immediately knows the entire API surface: no doc-scraping, and no prompt engineering to guess flag names.
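Here is a minimal Python sketch of how an agent harness might consume that catalog before issuing any command. The catalog schema used below (a top-level `commands` list whose entries carry a `name` and a `params` list) is a hypothetical shape chosen for illustration, not AlphaForge's documented format, and `load_catalog` / `build_argv` are hypothetical helper names. The idea it shows is that flags get checked against the catalog before the CLI is ever invoked.

```python
from __future__ import annotations

import json
import subprocess

# HYPOTHETICAL catalog shape, for illustration only; the real
# `alpha-forge system describe` output may be structured differently.
SAMPLE_CATALOG = json.dumps({
    "commands": [
        {"name": "backtest run", "params": [{"name": "strategy"}]},
        {"name": "optimize walk-forward", "params": [{"name": "strategy"}]},
    ]
})


def load_catalog(raw: str) -> dict[str, dict]:
    """Index the catalog by command path, e.g. "backtest run"."""
    data = json.loads(raw)
    return {cmd["name"]: cmd for cmd in data["commands"]}


def fetch_catalog() -> dict[str, dict]:
    """Run `alpha-forge system describe` once at session start (needs the CLI on PATH)."""
    proc = subprocess.run(
        ["alpha-forge", "system", "describe"],
        capture_output=True, text=True, check=True,
    )
    return load_catalog(proc.stdout)


def build_argv(catalog: dict[str, dict], command: str,
               positional: list[str], flags: dict[str, str]) -> list[str]:
    """Build an argv list, rejecting unknown commands or flags before calling the CLI."""
    if command not in catalog:
        raise ValueError(f"unknown command: {command}")
    allowed = {p["name"] for p in catalog[command]["params"]}
    unknown = sorted(set(flags) - allowed)
    if unknown:
        raise ValueError(f"unknown flags for {command!r}: {unknown}")
    argv = ["alpha-forge", *command.split(), *positional]
    for name, value in flags.items():
        argv += [f"--{name}", value]
    return argv + ["--json"]


if __name__ == "__main__":
    catalog = load_catalog(SAMPLE_CATALOG)
    print(build_argv(catalog, "backtest run", ["CL=F"], {"strategy": "cl_momentum_v1"}))
    # ['alpha-forge', 'backtest', 'run', 'CL=F', '--strategy', 'cl_momentum_v1', '--json']
```

Failing fast on an unknown flag turns a would-be CLI error into a local check the agent can correct without spending a run.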
Every command accepts --json and returns a stable envelope:

```bash
alpha-forge backtest run CL=F --strategy cl_momentum_v1 --json
```

```json
{
  "run_id": "bt_20260621_a3f9",
  "status": "ok",
  "result": {
    "sharpe": 0.94,
    "cagr": 0.121,
    "max_drawdown": -0.183,
    "wft_windows_positive": 4,
    "wft_windows_total": 5
  },
  "next_steps": ["optimize", "walk_forward", "export_pine"]
}
```

Structured error envelopes with `error_code`, `message`, and `suggested_fix` mean the agent can handle failures without parsing human-readable text. The `run_id` lets the agent reference results later without re-running anything.

```bash
uvx alpha-forge-mcp
```

alpha-forge-mcp (https://github.com/alforge-labs/alpha-forge-mcp) is an Apache-2.0 MCP server that wraps the CLI. Add it to your Claude Code MCP server configuration and AlphaForge's commands become first-class tools in Claude Code or any other MCP-compatible agent.

Note: The MCP server is in alpha. The core CLI is the stable interface; MCP is the layer we're hardening next.

AlphaForge ships Claude Code slash commands and Codex skills out of the box. The explore skill codifies the full pipeline (ideation → backtest → optimize → walk-forward) as a reusable, version-controlled workflow rather than a throwaway chat transcript.

This is the part that made me realise something had shifted. There's no magic one-shot explore command: the agent runs the loop, using AlphaForge's bundled explore-strategies skill to drive the CLI. From inside Claude Code, with the MCP server running:

"Explore energy futures strategies overnight. Backtest each combination of MACD, RSI, and ATR on CL=F and WTI. Walk-forward validate anything with Sharpe > 0.8. Log the results."

The agent loads the `system describe` catalog, runs backtests and optimizations with --json, reads the structured results, prunes losers early, and writes a ranked summary to disk. You wake up to a shortlist, not a pile of charts to eyeball.

A backtest without out-of-sample validation is just curve-fitting with extra steps.
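A minimal Python sketch of the agent side of that contract. It assumes success envelopes use the field names from the backtest example (`run_id`, `status`, `result.sharpe`, `next_steps`) and that error envelopes carry a `status` other than `"ok"` plus top-level `error_code`, `message`, and `suggested_fix` keys; that error layout is an assumption. `parse_envelope`, `AlphaForgeError`, and `shortlist` are hypothetical names, and the 0.8 Sharpe gate mirrors the example prompt.

```python
from __future__ import annotations

import json
from typing import Optional

SHARPE_GATE = 0.8  # threshold from the example overnight prompt

# Success envelope from the backtest example in the article.
SAMPLE_OK = """{"run_id": "bt_20260621_a3f9", "status": "ok",
 "result": {"sharpe": 0.94, "cagr": 0.121, "max_drawdown": -0.183,
            "wft_windows_positive": 4, "wft_windows_total": 5},
 "next_steps": ["optimize", "walk_forward", "export_pine"]}"""


class AlphaForgeError(RuntimeError):
    """Raised for a non-ok envelope; keeps the machine-readable fields."""

    def __init__(self, error_code: str, message: str, suggested_fix: Optional[str]):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.suggested_fix = suggested_fix


def parse_envelope(raw: str) -> dict:
    """Return a successful envelope, or raise with the structured error fields.

    ASSUMPTION: error envelopes use status != "ok" with top-level
    error_code / message / suggested_fix keys.
    """
    env = json.loads(raw)
    if env.get("status") != "ok":
        raise AlphaForgeError(
            env.get("error_code", "unknown"),
            env.get("message", ""),
            env.get("suggested_fix"),
        )
    return env


def shortlist(envelopes: list[dict], gate: float = SHARPE_GATE) -> list[str]:
    """Keep run_ids whose Sharpe clears the gate, best first."""
    keep = [e for e in envelopes if e["result"]["sharpe"] > gate]
    keep.sort(key=lambda e: e["result"]["sharpe"], reverse=True)
    return [e["run_id"] for e in keep]


if __name__ == "__main__":
    env = parse_envelope(SAMPLE_OK)
    print(shortlist([env]))   # ['bt_20260621_a3f9']
    print(env["next_steps"])  # candidate follow-up commands
```

The Sharpe gate only decides what is worth walk-forward testing; branching on `error_code` rather than on message text is what keeps the loop robust when a command fails.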
AlphaForge runs walk-forward testing (WFT), `alpha-forge optimize walk-forward`, on an optimized strategy: each in-sample window trains, the following out-of-sample window tests, and you want a majority of OOS windows to be positive before a strategy is considered viable. There's also `optimize sensitivity`, which perturbs the optimized parameters to flag how fragile (overfit) they are.

The explore loop uses WFT as its filter. Strategies that look great in-sample but fail OOS are discarded automatically, so the agent doesn't have to make that judgment call.

I want to be concrete without being misleading, so here's the one result I'll cite, with full context. An equal-weight basket combining a hedged 3× NASDAQ-100 sleeve (SMA200 + ATR sizing), GLD, and TLT showed:

Disclaimer (required reading): Past results don't guarantee future returns. These figures include 0.05% per-trade slippage and use price-return data only. This is a backtest, not live trading.

I'm not showing this to claim the strategy is "proven." I'm showing it because it illustrates what WFT-validated diversification looks like in AlphaForge's output format, and because hiding it felt like its own kind of dishonesty.

The full workflow runs from idea to Pine Script v6 export. Every step speaks --json, so an agent can chain them deterministically:

```bash
# 1. Describe the CLI (agent onboarding)
alpha-forge system describe

# 2. Backtest a strategy defined in JSON
alpha-forge backtest run SPY --strategy spy_sma_rsi_v1 --json

# 3. Optimize parameters with Optuna TPE
alpha-forge optimize run SPY --strategy spy_sma_rsi_v1 --json

# 4. Walk-forward validate out-of-sample
alpha-forge optimize walk-forward SPY --strategy spy_sma_rsi_v1 --json

# 5. Export to TradingView Pine Script v6
alpha-forge pine generate --strategy spy_sma_rsi_v1
```

Each --json result carries a `run_id` / `result_id` and a next-step hint, so the agent always knows what to call next without re-running anything.

AlphaForge is in public beta.
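The walk-forward rule described earlier (fit on an in-sample window, score the next out-of-sample window, require a majority of positive OOS windows) and the idea behind a sensitivity check can be sketched generically in Python. This is not AlphaForge's implementation: the rolling window layout, the strict-majority test, the toy `sign_of_mean` / `directional_pnl` model, and the ±10% one-at-a-time perturbation are all illustrative assumptions, and the real optimizer (Optuna TPE) is replaced by a trivial sign-of-mean rule.

```python
from __future__ import annotations

from typing import Callable, Sequence


def walk_forward_windows(n_bars: int, train: int, test: int) -> list[tuple[range, range]]:
    """Rolling windows: each in-sample block is immediately followed by its OOS block."""
    if train <= 0 or test <= 0:
        raise ValueError("train and test must be positive")
    windows = []
    start = 0
    while start + train + test <= n_bars:
        in_sample = range(start, start + train)
        out_of_sample = range(start + train, start + train + test)
        windows.append((in_sample, out_of_sample))
        start += test  # roll forward by one OOS block so OOS blocks never overlap
    return windows


def walk_forward_verdict(
    returns: Sequence[float],
    train: int,
    test: int,
    fit: Callable[[Sequence[float]], float],
    evaluate: Callable[[Sequence[float], float], float],
) -> tuple[int, int, bool]:
    """Fit on each IS window, score the following OOS window; pass if most OOS windows are positive."""
    windows = walk_forward_windows(len(returns), train, test)
    positive = 0
    for in_sample, out_of_sample in windows:
        param = fit([returns[i] for i in in_sample])                # optimize on IS data only
        pnl = evaluate([returns[i] for i in out_of_sample], param)  # score on unseen data
        if pnl > 0:
            positive += 1
    total = len(windows)
    return positive, total, positive * 2 > total  # strict majority


def worst_case_under_perturbation(
    score: Callable[[dict[str, float]], float],
    params: dict[str, float],
    rel_step: float = 0.1,
) -> float:
    """Nudge each parameter by +/- rel_step (one at a time) and return the worst score seen."""
    worst = score(params)
    for name, value in params.items():
        for factor in (1 - rel_step, 1 + rel_step):
            trial = dict(params)
            trial[name] = value * factor
            worst = min(worst, score(trial))
    return worst


# Toy model (illustrative only): go long if the in-sample return sum is
# non-negative, short otherwise, then hold that position through the OOS window.
def sign_of_mean(window: Sequence[float]) -> float:
    return 1.0 if sum(window) >= 0 else -1.0


def directional_pnl(window: Sequence[float], position: float) -> float:
    return position * sum(window)


if __name__ == "__main__":
    returns = [0.01, 0.02, -0.01, 0.015, 0.005, -0.02, 0.01, 0.012, -0.004, 0.02]
    print(walk_forward_verdict(returns, train=4, test=2,
                               fit=sign_of_mean, evaluate=directional_pnl))
```

Rolling forward by exactly one OOS block keeps the out-of-sample segments disjoint, so every bar is scored out-of-sample at most once; a strict majority (for example, 4 of 5 windows) is one reasonable reading of "a majority of OOS windows positive."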
The free trial is ungated: run backtests locally, see the output format, and decide whether it fits your workflow.

```bash
uvx alpha-forge-mcp
```

AlphaForge is designed for people who'd rather have an agent find strategy candidates than spend weekends manually tuning parameters. If that's you, the trial will tell you faster than this article whether it fits.

Strategies, API keys, and trade history stay on your machine. Only license verification touches the network.

About the Japanese edition on Zenn: A Japanese version of this article is published on Zenn. AlphaForge is an agent-native backtesting CLI that lets AI agents explore strategies autonomously through Claude Code or MCP. It applies walk-forward validation by default, a design that structurally curbs overfitting. For details, visit alforgelabs.com (https://alforgelabs.com).