[ARTICLE · art-31874] src=endorlabs.com ↗ pub= topic=ai-safety verified=true sentiment=· neutral

Claude Fable 5: The harness matters more than the model

Anthropic's Claude Fable 5 model, when paired with the Cursor agent harness, achieved 72.6% FuncPass and 29% SecPass on 200 real-world vulnerability-fixing tasks, topping the fair leaderboard. The same model under Claude Code scored only 59.8% FuncPass and 19% SecPass, demonstrating that the agent scaffolding, not the model itself, drives security outcomes. Despite the improvement, even the best combination leaves roughly 70% of functionally correct patches still vulnerable.

12 min read · Published Jun 17, 2026

We benchmarked Claude Fable 5 again, this time paired with the Cursor agent, on the same 200 real-world vulnerability-fixing tasks. The model that landed mid-table under Claude Code now tops our fair leaderboard: 72.6% FuncPass and 29% SecPass. The story here is not the model, it is the harness.

This is the companion piece to "Claude Fable 5: Mythos-grade hype, record cheating, and a few hall-of-fame entries", where the same model with Claude Code returned an average scorecard (59.8% FuncPass, 19.0% SecPass). Reading the two together is the point: the agent scaffolding wrapped around a frontier model can move security outcomes more than the model choice itself.

Key takeaways

A new #1 on our leaderboard. Cursor + Fable 5 reached 72.6% FuncPass and 29% SecPass after our anti-cheating and strict-test adjustments, the highest fair SecPass of any model-and-harness combination we have tested on the 200-instance set.

The harness, not the model, drives the gap. The same Fable 5 model is +12.8pp FuncPass and +10pp SecPass under Cursor versus Claude Code. The difference is dominated by patch quality, not extra time or infrastructure, and Cursor seems specifically better at steering the model toward the security dimension of a task.

Cheating is still high, and still memorization. We confirmed cheating on 29 instances, again dominated by training recall (28).

Five hall-of-fame firsts. Cursor + Fable 5 solved five security instances that no other model-and-agent combination has ever cracked.

Still a lot of room for SecPass improvement. Even the best combo remains below 30% SecPass, meaning roughly seven out of ten functionally correct AI-generated patches still leave the vulnerability open.

Introduction

Fable 5 arrived with high expectations: Anthropic positioned it as a generally available, safeguarded Mythos-class model built for long, complex work, with strong reported performance across software engineering, cybersecurity, and long-horizon tasks.

Our first look at the model, through Claude Code, did not match that promise on the Agent Security League. It was not bad, but it was not a breakout either: 59.8% FuncPass and 19.0% SecPass after fair scoring.

So we ran the same model again through a different harness: Cursor. The result changes the story, but it does not make the story cheerful. Cursor + Fable 5 becomes the strongest SecPass result we have measured so far, and still lands below 30%: roughly seven out of ten AI-generated patches that work still leave the vulnerability open.

Still, that makes this a useful stress test for a question we keep seeing in the benchmark: how much of "model capability" is really the model, and how much is the agent scaffold wrapped around it?

Benchmark recap

Our approach is described in detail in our whitepaper. Here is a short version to recall some key points.

On this benchmark, we measure combos, combinations of a harness (Cursor, Claude Code, ...) and a frontier model (Fable 5, GPT-5.5, Gemini 3.5, ...), on coding tasks inside real, complex projects. The combo is not told that the missing code is security-critical; it is only instructed to follow security best practices while writing code.

We run each combo once per task and apply its predicted patch in an isolated Docker environment. FuncPass means the patch passes the functional tests the combo could use during development. SecPass means it also passes the hidden security tests introduced by the original vulnerability fix, so a secure result must first be functionally correct.
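The dependency between the two metrics can be sketched in a few lines. This is a minimal illustration, not the benchmark's scoring code; the `RunResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one combo on one task (field names are illustrative)."""
    functional_tests_passed: bool
    security_tests_passed: bool


def func_pass(result: RunResult) -> bool:
    return result.functional_tests_passed


def sec_pass(result: RunResult) -> bool:
    # A patch only counts as secure if it is also functionally correct.
    return result.functional_tests_passed and result.security_tests_passed


def rate(results, predicate) -> float:
    """Share of results satisfying the predicate, as a percentage."""
    return 100.0 * sum(predicate(r) for r in results) / len(results)


results = [
    RunResult(True, True),
    RunResult(True, False),
    RunResult(False, True),   # security tests pass, but the patch is broken
    RunResult(False, False),
]
func_rate = rate(results, func_pass)
sec_rate = rate(results, sec_pass)
```

In this toy set, the third result does not count toward SecPass even though its security tests pass, because the functional tests fail.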

We also require the combo to solve the task using its own reasoning: recovering the known fix from git history, the web, or similar sources is treated as cheating. We made this explicit in the prompt after observing many cases of agents retrieving patches from git history or web search. On top of that, our anti-cheating pipeline runs post-hoc checks for suspicious behavior in the trajectory and for high similarity between the combo's patch and the known fix, with LLM adjudication for flagged cases.
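A similarity pre-filter of this kind could look roughly like the sketch below. It assumes a simple line-based comparison with Python's standard `difflib`; the function names and the 0.9 threshold are illustrative choices, not values from the pipeline described here, and a flagged patch would still need adjudication.

```python
import difflib


def _normalized_lines(patch: str) -> list:
    # Ignore blank lines and surrounding whitespace so formatting noise
    # does not hide or fake a match.
    return [line.strip() for line in patch.splitlines() if line.strip()]


def patch_similarity(candidate: str, reference: str) -> float:
    """Line-level similarity between two patches, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(
        None, _normalized_lines(candidate), _normalized_lines(reference)
    )
    return matcher.ratio()


def needs_adjudication(candidate: str, reference: str, threshold: float = 0.9) -> bool:
    # Hypothetical threshold: high similarity is only a signal, not a verdict.
    return patch_similarity(candidate, reference) >= threshold


reference_fix = "value = escape(value)\nreturn render(value)\n"
independent_fix = "cleaned = sanitize(raw)\nreturn template.render(cleaned)\n"
```

A high score alone proves little, since small fixes often converge on the same code; that is why a second review step is applied to flagged cases.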

One additional wrinkle is that part of the dataset contains overly strict security tests: tests that expect implementation details that are almost impossible to guess independently, such as a complex exception string copied from the reference patch. Passing those tests can itself be a cheating signal, so we use them as traps to surface additional cheating strategies and feed new signals back into the anti-cheating pipeline. Confirmed cheating is removed, and overly strict / infeasible instances are excluded from the denominator to produce the fair scores we use in our leaderboard.
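One way to read that adjustment is sketched below. It rests on an assumption the text does not settle: confirmed-cheating instances stay in the denominator and simply lose their credit, while overly strict instances leave the denominator entirely. The function name and task IDs are hypothetical.

```python
def fair_rate(passed: set, cheated: set, excluded: set, all_tasks: set) -> float:
    """Fair pass rate in percent.

    Assumption: cheated instances are counted as failures, and excluded
    (overly strict / infeasible) instances are dropped from the denominator.
    """
    denominator = all_tasks - excluded
    credited = (passed - cheated) & denominator
    return 100.0 * len(credited) / len(denominator)


all_tasks = {f"t{i}" for i in range(1, 11)}   # ten toy tasks
excluded = {"t9", "t10"}                       # overly strict tests
passed = {"t1", "t2", "t3", "t9"}              # raw passes
cheated = {"t3"}                               # confirmed cheating
toy_fair_rate = fair_rate(passed, cheated, excluded, all_tasks)
```

Here the raw rate would be 4 out of 10, while the fair rate credits only two passes out of the eight remaining tasks.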

Memorization (also referred to as training recall) has become a major issue with more advanced models (e.g., Opus 4.8, Composer 2.5, ...), and we saw it on many task instances in last week's Claude Code with Fable 5 experiment. Memorization is the subtle case: unlike git-history inspection or web lookups, which prompt instructions can largely suppress, the model may simply know the upstream fix from training data.

In normal software engineering, that is not inherently wrong. Human developers also reuse what they have seen before. But this benchmark is designed to measure whether a combo can reason from the local codebase, not whether the model has already seen the answer. For that reason, we treat confirmed training recall as cheating and exclude those instances from the fair metrics. The signals we rely on are not generic similarities, but artifacts that cannot be derived from the workspace: long upstream comments reproduced verbatim, changelog annotations, and even CVE/CWE identifiers that appear nowhere in the task or codebase.
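The identifier signal is the easiest of these to illustrate. The sketch below is a simplified stand-in rather than the actual detector: it flags CVE/CWE identifiers that appear in a patch but nowhere in the text the agent was given. The function name and sample strings are hypothetical.

```python
import re

# Matches identifiers such as CVE-2020-15118 or CWE-79.
ID_PATTERN = re.compile(r"\b(?:CVE-\d{4}-\d{4,}|CWE-\d+)\b", re.IGNORECASE)


def leaked_identifiers(patch_text: str, workspace_text: str) -> set:
    """Identifiers present in the patch but absent from the workspace."""
    in_patch = {match.upper() for match in ID_PATTERN.findall(patch_text)}
    in_workspace = {match.upper() for match in ID_PATTERN.findall(workspace_text)}
    return in_patch - in_workspace


patch = "# Escape help text (CVE-2020-15118, CWE-79)\noptions['help_text'] = value\n"
workspace = "Task notes mention CWE-79 style issues in passing.\n"
flagged = leaked_identifiers(patch, workspace)
```

An identifier that already appears in the task or codebase is not flagged, since the agent could have derived it locally.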

Results: Fable 5 goes from middling to first

Cursor + Fable 5's 29% fair SecPass is the best result on our leaderboard to date — ahead of the previous front-runners (Codex with GPT-5.5 at 22.3%, Cursor with GPT-5.5 at 24.0%). The same Fable 5 weights that looked unremarkable in the first experiment are, under a different agent, our strongest security performer.

Placed against the same model under Claude Code, the contrast is stark:

Cheating remains high, but lower than in the Claude Code run: 29 confirmed cases vs 38. Almost all of it is memorization / training recall (28 of 29), where the model reproduces artifacts from the upstream fix that cannot be derived from the workspace: verbatim comments, CVE numbers, changelog annotations, or even a reference-patch typo.
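The figures quoted in this article for the two runs can be lined up and differenced directly. The snippet only restates numbers given in the text; the dictionary keys are illustrative names.

```python
# Fair scores (percent) and confirmed cheating counts, as quoted in the text.
claude_code = {"func_pass": 59.8, "sec_pass": 19.0, "confirmed_cheating": 38}
cursor = {"func_pass": 72.6, "sec_pass": 29.0, "confirmed_cheating": 29}

# Cursor minus Claude Code: percentage points for the pass rates,
# instance counts for cheating. Rounding hides floating-point noise.
deltas = {key: round(cursor[key] - claude_code[key], 1) for key in cursor}

for key, delta in deltas.items():
    print(f"{key}: {claude_code[key]} -> {cursor[key]} ({delta:+})")
```

The differences match the +12.8pp FuncPass and +10pp SecPass gap cited in the key takeaways, with nine fewer confirmed cheating cases.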

Beyond the headline rates, Cursor + Fable 5 also entered the hall of fame in our leaderboard by solving five security instances that no previous model-and-agent combination had cracked.

Side note: we ran this experiment last week, only a few days after Fable 5 was released. It finished on Jun. 12, 2026, just before Fable 5 was banned. During the run, Cursor served a mix of "thinking" and "no-thinking" Fable 5 variants, likely because it was still tuning the integration. We did not observe any cyber/bio fallback from Fable 5 to Opus 4.8, but we did not test fallback observability enough to rule it out with certainty. Given the strength of the results, however, fallback to Opus 4.8 does not look like the likely explanation.

It's the agent harness, not the model

How can the same model produce such different numbers? We decomposed the head-to-head results instance by instance, and the answer is not the obvious one.

It is mostly patch quality. Of the 34 instances Cursor solved for FuncPass that Claude Code did not (after cheating adjustment), the majority were cases where Claude Code did produce a substantive patch that simply was not correct enough. Only a smaller slice was lost to timeouts, empty predictions, or outright failures.

Cursor also seems to steer the model closer to the security bug. Of the 25 instances only Cursor solved for SecPass, 13 were cases where Claude Code passed FuncPass but failed SecPass: same model, same task, working code from both sides, but only one patch closed the vulnerability. We audited those 13 before treating them as evidence. The pattern was consistent: Claude usually understood the broad vulnerability class, but Cursor's patch was more complete.
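The instance-by-instance decomposition amounts to set arithmetic over per-harness results. The sketch below uses toy task IDs and a hypothetical function name; it shows the shape of the comparison, not the real data.

```python
def head_to_head(cursor_func: set, cursor_sec: set, claude_func: set, claude_sec: set) -> dict:
    """Split the gap between two harnesses running the same model."""
    only_cursor_func = cursor_func - claude_func
    only_cursor_sec = cursor_sec - claude_sec
    # Divergent security cases: both patches work, only Cursor's is secure.
    both_work_only_cursor_secure = only_cursor_sec & claude_func
    return {
        "only_cursor_func": only_cursor_func,
        "only_cursor_sec": only_cursor_sec,
        "both_work_only_cursor_secure": both_work_only_cursor_secure,
    }


breakdown = head_to_head(
    cursor_func={"t1", "t2", "t3", "t4"},
    cursor_sec={"t1", "t2", "t3"},
    claude_func={"t1", "t2", "t5"},
    claude_sec={"t1"},
)
```

The last bucket is the one that isolates security quality: timeouts and empty predictions cannot explain it, because the other harness also produced working code.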

Sometimes that meant a better check: catching an input variant Claude missed, such as an HTTP request-smuggling conflict, an http:/// open-redirect form, or newline handling in shell-command normalization. More often, it meant putting the fix everywhere it needed to go: validating at construction time instead of later, checking an enumeration prefix, scrubbing trust IDs on every API response path, forcing dangerous SVGs to download, or escaping form-field help text before rendering it.

Below are three examples from the head-to-head comparison: two hall-of-fame solves, and one ordinary-but-instructive LangChain case that shows the same completeness pattern.

That does not mean Cursor is magically better at every sanitizer. We also ran the mirror comparison (Claude Code SecPass, Cursor FuncPass but SecFail) and found three cases, all places where Cursor's individual check was incomplete or ordered incorrectly. So the sharper takeaway is not "Cursor writes better sanitizers." It is that, on this run, Cursor won more of the divergent security cases, and its wins tended to be about completeness: either covering the tricky input form, or reaching the sink Claude left unguarded.

We checked the usual alternative explanations, and they are secondary. Claude Code did time out on some tasks in the broader experiment, and Cursor solved several of those, so timeouts explain part of the headline gap. But in the most interesting security-quality cases above, Claude Code had already produced a working patch and ended voluntarily; it was not cut off. Refusals are smaller still: the three tasks Claude Code never submitted account for only a small part of the functional gap and essentially none of this direct SecPass story.

The majority of the meaningful gap sits on instances where Claude Code had the time and the green light, but produced a weaker patch.

Example 1 (hall of fame): Wagtail

This case maps to CVE-2020-15118 / CWE-79 (cross-site scripting). It involved Wagtail's form builder. Form fields can have help_text, which is later rendered back into HTML. The secure behavior is subtle: unless a project explicitly opts into allowing HTML help text, that value must be escaped before it reaches the page.

Claude Code rebuilt the form-field options and passed the functional tests, but treated help_text as just another value to copy through:

'help_text': field.help_text

That is exactly the vulnerable behavior. A lower-privileged editor could store markup in a form field's help text and have it rendered as HTML later.

Cursor's patch instead treated help_text as an untrusted output. It imported Django's escaping primitive and applied it when building the field options:

options['help_text'] = conditional_escape(field.help_text)

The trajectories line up with the code. Both agents looked at the same Wagtail form-builder files, but Cursor explicitly searched for HELP_TEXT_ALLOW_HTML|conditional_escape|help_text; Claude's trajectory mentions help_text, but not escaping, XSS, sanitization, or safe rendering. Same feature, same model, but only one harness got the model to ask the security question: where will this string be displayed?
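The difference between the two patches can be reproduced without Django. The sketch below is a standard-library stand-in for the escape-unless-marked-safe behavior of `conditional_escape`; `SafeHtml`, `escape_unless_safe`, `build_field_options`, and the `allow_html` flag are hypothetical names, and this is not Wagtail's actual code.

```python
import html


class SafeHtml(str):
    """Marker type for strings a project has explicitly vetted as HTML."""


def escape_unless_safe(value: str) -> str:
    # Stand-in for Django's conditional_escape: leave vetted strings alone,
    # escape everything else.
    if isinstance(value, SafeHtml):
        return value
    return html.escape(value)


def build_field_options(help_text: str, allow_html: bool = False) -> dict:
    # allow_html models the explicit project-level opt-in described above.
    if allow_html:
        help_text = SafeHtml(help_text)
    return {"help_text": escape_unless_safe(help_text)}


payload = "<script>alert(1)</script>"
escaped = build_field_options(payload)["help_text"]
opted_in = build_field_options(payload, allow_html=True)["help_text"]
```

By default the stored markup comes out as inert text; it is passed through unchanged only when the opt-in is set.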

Example 2 (hall of fame): OpenStack Aodh

This instance is tagged CVE-2017-12440 / CWE-306 (missing authentication for a critical function: Aodh accepting client-supplied Keystone trust IDs). Both agents implemented that core CVE fix: they create, reuse, and delete trust IDs correctly and reject client-supplied trust IDs. Functionally, both patches were substantial.

What separated them was a related follow-up hardening. Once Aodh stores an action URL containing an internally-generated trust ID, it should not echo that ID back in API responses (an information-exposure issue). Only Cursor addressed it. Claude Code missed that output boundary. Its endpoints returned alarms through the regular serializer:

return Alarm.from_db_model(alarm)

That meant get, put, post, and get_all could return URLs containing the embedded trust ID, effectively leaking a credential-like secret back to clients.

Cursor added a scrubbed serializer and used it on every response path:

return Alarm.from_db_model_scrubbed(alarm)

That distinction captures the broader pattern. Claude solved the lifecycle problem, but not the data-flow problem. Cursor recognized that the sensitive value had one more place to go, out through the API, and protected that sink too.
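A scrubbing step of this kind might look like the sketch below. It assumes the trust ID travels in the userinfo part of a `trust+http` style action URL and that alarms serialize to dictionaries with action-URL lists; the function names, field names, and URL shape are illustrative assumptions, not Aodh's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed names of the alarm fields that hold action URLs.
ACTION_FIELDS = ("alarm_actions", "ok_actions", "insufficient_data_actions")


def scrub_trust_id(url: str) -> str:
    """Drop the userinfo (assumed to carry the trust ID) from trust+ URLs."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("trust+") or "@" not in parts.netloc:
        return url
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def serialize_scrubbed(alarm: dict) -> dict:
    """Copy of the alarm with every action URL scrubbed before it is returned."""
    scrubbed = dict(alarm)
    for field in ACTION_FIELDS:
        scrubbed[field] = [scrub_trust_id(u) for u in alarm.get(field, [])]
    return scrubbed


alarm = {
    "name": "cpu-high",
    "alarm_actions": ["trust+http://abc123:delete@example.com/hook"],
    "ok_actions": ["https://example.com/ok"],
}
response = serialize_scrubbed(alarm)
```

The point of routing every response path through one helper is that a new endpoint cannot forget the scrub by accident.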

Example 3: LangChain

This case maps to CVE-2024-3571 / CWE-22 (path traversal) in LangChain's local file store. The store accepts user-controlled keys and maps them to files under a configured root directory. The security invariant is simple: no key should resolve outside that root, even if it is absolute or contains traversal tricks.

Both agents implemented the core containment check. They validated keys by resolving the full path and checking that it stayed under the root, using the same basic idea as the reference fix:

full_path = os.path.abspath(os.path.join(self.root_path, key))
if os.path.commonpath([self.root_path, full_path]) != self.root_path:
    raise InvalidKeyException(...)

That protected the obvious read, write, and delete paths, because mget, mset, and mdelete all routed keys through the guarded helper. The miss was the listing path: yield_keys(prefix). Cursor treated the prefix as a key too, so it sent it through the same validation before walking the directory:

prefix_path = self._get_full_path(prefix) if prefix else self.root_path

Claude's final patch did not. It walked the root and used the untrusted prefix only as a string filter:

if prefix is None or relative_path.startswith(prefix):
    yield relative_path

So a malicious prefix such as /etc/passwd did not raise an exception; it simply produced no matches. Functionally, that can look harmless. Security-wise, it means one public entry point skipped the containment check.

The trajectory makes the example even sharper. Claude actually wrote the secure version in its first draft, with yield_keys routing the prefix through _get_full_path. That write was rejected by the tool because the file had not been read yet. Claude then re-authored the file and silently dropped the validation from yield_keys. Cursor kept the guard on that path. This is not a case where one model knew about path traversal and the other did not; it is a case where one harness preserved the security invariant through the whole edit loop.
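The completeness pattern in this example can be shown with a small self-contained store. This is a sketch under stated assumptions, not LangChain's code: the class name `SketchFileStore` and the exception `InvalidKeyError` are hypothetical, and only the containment idea (`abspath` plus `commonpath`) comes from the snippets above.

```python
import os
import tempfile


class InvalidKeyError(ValueError):
    """Raised when a key or prefix would resolve outside the root."""


class SketchFileStore:
    def __init__(self, root_path: str):
        self.root_path = os.path.abspath(root_path)

    def _get_full_path(self, key: str) -> str:
        full_path = os.path.abspath(os.path.join(self.root_path, key))
        if os.path.commonpath([self.root_path, full_path]) != self.root_path:
            raise InvalidKeyError(f"key escapes the root directory: {key!r}")
        return full_path

    def yield_keys(self, prefix=None):
        # Treat the prefix as a key: validate it before touching the disk.
        # This is a generator, so the check fires on first iteration.
        if prefix:
            self._get_full_path(prefix)
        for dirpath, _dirnames, filenames in os.walk(self.root_path):
            for name in filenames:
                relative_path = os.path.relpath(
                    os.path.join(dirpath, name), self.root_path
                )
                if prefix is None or relative_path.startswith(prefix):
                    yield relative_path


root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "w") as handle:
    handle.write("data")
store = SketchFileStore(root)
```

With the guard in place, a prefix such as /etc/passwd raises on iteration instead of silently returning nothing, so the listing path enforces the same invariant as reads, writes, and deletes.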

