Source: blog.calif.io

Needle in a haystack: measuring the impact of two nginx RCEs

Two critical heap buffer overflow vulnerabilities in nginx's rewrite engine, CVE-2026-42945 and CVE-2026-9256, allow attackers to trigger remote code execution when specific directive combinations cause the script engine to miscalculate buffer sizes. Researchers at Calif.io built a static vulnerability scanner called ngxray and scanned 35,633 nginx configurations from GitHub to determine how many real-world deployments actually contain the vulnerable patterns. The analysis found that the bugs require uncommon configuration combinations involving flagless rewrite directives followed by set or if statements, making exploitation rare in practice.

12 min read · published May 29, 2026

Two critical CVEs, 35,633 configs scraped from GitHub, and a question: does anyone actually write nginx configs that trigger these bugs?

We had a lot of fun hacking nginx earlier this year. We know from experience that finding a real RCE in nginx is hard, especially one that triggers in a default or commonly-used configuration.

So when F5 disclosed CVE-2026-42945 (better known as nginx-rift) and CVE-2026-9256 (possibly nginx-poolslip), two critical heap buffer overflows in the nginx rewrite engine, the natural question was: how many real-world configurations are actually vulnerable?

To answer that, we built ngxray, a static vulnerability scanner for nginx configs, and pointed it at GitHub.

The bugs

Both CVEs are heap buffer overflows in nginx's rewrite-phase script engine. They're distinct bugs, but they share a root cause: the engine sizes a buffer in one pass and fills it in another. A heap overflow arises when certain directive combinations cause the two passes to disagree on how much space is needed.

CVE-2026-42945: the stale flag

When a `rewrite` replacement contains `?`, the script engine compiles a call to `ngx_http_script_start_args_code`, which sets `e->is_args = 1`. This flag tells the capture-copy function to URI-escape data: `+` becomes `%2B`, a 3x size increase.

When the rewrite finishes, `regex_end_code` resets `e->quote` but, before the fix, did not reset `e->is_args`:

e->quote = 0;
// e->is_args = 0;  <-- missing before the fix

If the rewrite has no flag (`last`, `break`, `redirect`, `permanent`), the engine continues to the next directive with the stale flag still set.

This creates three distinct overflow scenarios, depending on what comes after the flagless rewrite.

The set case. A subsequent `set $var $1` invokes `ngx_http_script_complex_value_code()`. This function creates a zeroed sub-engine for the length pass:

ngx_memzero(&le, sizeof(ngx_http_script_engine_t));  // le.is_args = 0

It measures the buffer at raw capture length. But the copy pass runs through the main engine `e`, where `e->is_args = 1`, so `ngx_http_script_copy_capture_code` applies `ngx_escape_uri` and writes up to 3x more than the buffer holds.

location ~ ^/api/(.*)$ {
    rewrite ^/api/(.*)$ /internal?migrated=true;
    set $original_endpoint $1;    # $1 copied with stale is_args=1
}
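To make the size mismatch concrete, here is a toy Python model of the two passes. It is a sketch under stated assumptions, not nginx's code: `ESCAPABLE` is a simplified stand-in for the byte table inside `ngx_escape_uri`, and the function names `length_pass`, `copy_pass`, and `overflow_bytes` are invented for illustration.

```python
# Toy model of the length pass vs. the copy pass in the `set` case.
# ESCAPABLE is an assumed, simplified stand-in for nginx's real escape table.
ESCAPABLE = set(b"+ %")


def length_pass(capture: bytes) -> int:
    """Zeroed sub-engine (is_args = 0): budget the raw capture length."""
    return len(capture)


def copy_pass(capture: bytes, is_args: bool) -> bytes:
    """Main engine: percent-escape the capture when is_args is set."""
    if not is_args:
        return capture
    out = bytearray()
    for byte in capture:
        if byte in ESCAPABLE:
            out += b"%%%02X" % byte  # e.g. b"+" becomes b"%2B"
        else:
            out.append(byte)
    return bytes(out)


def overflow_bytes(capture: bytes, stale_is_args: bool = True) -> int:
    """Bytes written beyond the buffer that the length pass sized."""
    written = len(copy_pass(capture, stale_is_args))
    return max(0, written - length_pass(capture))


if __name__ == "__main__":
    print(overflow_bytes(b"+" * 24))  # 24 bytes budgeted, 72 written
    print(overflow_bytes(b"users"))   # nothing expands, so no overflow
```

In this model, a capture with no escapable bytes produces an overflow of zero, which is why the request contents matter as much as the config.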

This is the variant described in the original nginx-rift report.

The if case. The mechanism here is identical to the previous case, albeit with a different syntax. Both funnel the captured argument (e.g. `$1`) through `ngx_http_rewrite_value()`. The `set` handler calls it on the assigned value, and the `if`-condition handler calls it on the right-hand side of the comparison.

When that argument contains a variable, the function emits a `ngx_http_script_complex_value_code`, with its zeroed length sub-engine and stale-`is_args` copy pass. This is the exact vulnerable code path discussed in the `set` case.

location ~ ^/api/(.*)$ {
    rewrite ^/api/(.*)$ /internal?migrated=true;
    if ($request_method = $1) {    # $1 on the right-hand side hits the same bug
        return 204;
    }
}

Not all `if` operators are affected. The `=` and `!=` comparisons send the right-hand side through `ngx_http_rewrite_value()`, the same path `set` uses, as do the `-f`/`-d`/`-e` file tests when applied to a capture. The regex operators (`~`, `~*`, `!~`, `!~*`) instead compile it as a regular-expression pattern, a different code path that never builds the mismatched buffer. So `if ($uri ~* $1)` is safe, while `if ($request_method = $1)` is not.
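The operator split can be expressed as a small classifier. This is a rough sketch: the function name `if_condition_affected` is hypothetical, the condition is assumed to arrive as the text between the parentheses, and whitespace splitting ignores quoted operands that a real parser would handle.

```python
import re

# Operators grouped as described in the text above.
VALUE_OPERATORS = {"=", "!="}
REGEX_OPERATORS = {"~", "~*", "!~", "!~*"}
FILE_TESTS = {"-f", "-d", "-e"}
CAPTURE_REF = re.compile(r"\$[1-9]")


def if_condition_affected(condition: str) -> bool:
    """Return True if an `if (...)` condition can reach the stale-flag path."""
    tokens = condition.split()
    # File test applied to a capture, e.g. "-f $1".
    if len(tokens) == 2 and tokens[0] in FILE_TESTS:
        return bool(CAPTURE_REF.search(tokens[1]))
    if len(tokens) == 3:
        _, operator, right_hand_side = tokens
        if operator in REGEX_OPERATORS:
            return False  # operand is compiled as a pattern instead
        if operator in VALUE_OPERATORS:
            return bool(CAPTURE_REF.search(right_hand_side))
    return False


if __name__ == "__main__":
    for cond in ("$request_method = $1", "$uri ~* $1", "-f $1"):
        print(cond, "->", if_condition_affected(cond))
```

A condition only matters when it also follows a flagless `?` rewrite, so a check like this would be one filter among several rather than a verdict on its own.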

As with the `set` case, the `if` must appear after the rewrite in source order. If it runs first, `is_args` is still 0 and nothing overflows.

One thing worth noting: `if{}` blocks in nginx's rewrite module compile into the same code array as the parent location. A rewrite inside an `if{}` block and a `set` outside it still execute in the same engine run. The `is_args` flag leaks across the `if` boundary.
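For a static check, one way to respect this is to flatten `if{}` bodies into the parent's directive order before looking for a rewrite followed by a `set`. The sketch below assumes a hypothetical `Directive` tree (name, arguments, child block); it is not the data structure ngxray uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Directive:
    """Hypothetical parsed directive: name, arguments, optional child block."""
    name: str
    args: List[str]
    block: List["Directive"] = field(default_factory=list)


def rewrite_code_order(location_body: List[Directive]) -> List[Directive]:
    """Flatten a location body in source order, descending into if{} only."""
    flat = []
    for directive in location_body:
        flat.append(directive)
        if directive.name == "if":
            flat.extend(rewrite_code_order(directive.block))
    return flat


if __name__ == "__main__":
    body = [
        Directive("if", ["$arg_v", "=", "2"], [
            Directive("rewrite", ["^/api/(.*)$", "/internal?migrated=true"]),
        ]),
        Directive("set", ["$original_endpoint", "$1"]),
    ]
    print([d.name for d in rewrite_code_order(body)])
```

After flattening, the `rewrite` inside the `if{}` block precedes the outer `set`, so an ordered rule sees the pair even though the two directives sit at different nesting levels.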

The rewrite-chain case. The stale flag can also overflow inside a second rewrite's own replacement. The first rewrite (with `?` and no flag) sets `e->is_args = 1` and continues. The second rewrite enters `regex_start_code`, which before the hardening fix did not reset `is_args`.

When the second rewrite has no named variables in its replacement (only `$1`, `$2`, etc.), `regex_start_code` takes a fast path for the length calculation. This fast path doesn't use a sub-engine at all. It computes the buffer size inline, adding each capture's raw byte count directly. Because `is_args` was not reset at the top of the function, the stale flag from the first rewrite is still alive on the main engine `e`.

The copy pass then calls `ngx_http_script_copy_capture_code` for each `$N`. That function checks `e->is_args`, sees it's 1, and applies `ngx_escape_uri`. The length pass measured raw bytes, but the copy pass writes escaped bytes. This results in the same mismatch as the `set` case, just inside a different code path.

location / {
    rewrite ^/(.*)$ /stage/$1?x=1;               # sets is_args, no flag
    rewrite ^/stage/(.*)$ /destination/$1 break;  # $1 sized raw, copied escaped
}

This variant is harder to trigger in practice because the URI produced by the first rewrite must actually match the second rewrite's regex. If the first rewrites to `/index.php` and the second expects `^/admin/(.*)`, they'll never chain.

In all three cases, the request must contain bytes that expand under URI escaping (like `+` becoming `%2B`) in the captured portion. The escaping is gated on `e->request->quoted_uri || e->request->plus_in_uri`. Without escapable characters, the size/copy mismatch is zero and no overflow occurs.

CVE-2026-9256: the budget undercount

This one lives in the fast path of `regex_start_code`, which handles rewrites where the replacement has no named variables. Before the fix, the length calculation budgeted escape space once over the entire URI:

e->buf.len += 2 * ngx_escape_uri(NULL, r->uri.data, r->uri.len,
                                  NGX_ESCAPE_ARGS);

Then it added each capture's raw byte count. But when capture groups are nested, like `^/((.*))$`, `$1` and `$2` cover the same URI bytes. The copy pass escapes those bytes once per `$N` reference, exceeding the budget.

rewrite ^/((.*))$ http://backend/$1$2 redirect;

The rewrite must trigger URI escaping (`redirect`, `permanent`, `http://...`, or `?` in the replacement), and the replacement must reference positional captures whose groups contain each other.

Scraping GitHub

Unfortunately, GitHub doesn't have a "give me all nginx configs" button. nginx configurations can be found not just in `.conf` files, but also inside Dockerfiles, shell heredocs, Jinja2 templates, ERB, Puppet manifests, Kubernetes ConfigMaps, Helm values, and Markdown documentation. A naive search for `filename:nginx.conf` misses most of the surface area.

Our collector runs over 100 distinct GitHub Code Search queries:

- Direct configs: `language:Nginx`, filenames like `nginx.conf` and `default.conf`, paths under `conf.d/` and `sites-available/`
- Template formats: `.j2`, `.erb`, `.tmpl`, `.mustache`
- Embedded configs: Dockerfiles with `COPY` or heredocs writing to `/etc/nginx`, Kubernetes YAML with nginx ConfigMap data
- Documentation: Markdown and RST with fenced nginx code blocks

Each query is paginated up to GitHub's 10-page limit. Results are deduplicated by content hash. When the collector encounters a Dockerfile, it follows `COPY` sources back into the same repository to fetch the referenced config files. We made every part of the run resumable, because GitHub's rate limits mean you'll hit a wall eventually.
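Deduplicating by content hash can be sketched in a few lines. The choice of SHA-256 and the `(path, content)` input shape are assumptions for illustration; the post does not say which hash or layout the collector uses.

```python
import hashlib


def dedupe_by_content(files):
    """Keep the first path seen for each distinct file body.

    `files` is an iterable of (path, content_bytes) pairs.
    SHA-256 is an assumed choice of hash.
    """
    seen = set()
    unique = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # same bytes already collected from another repo or query
        seen.add(digest)
        unique.append((path, content))
    return unique


if __name__ == "__main__":
    downloads = [
        ("repo-a/nginx.conf", b"server { listen 80; }"),
        ("repo-b/default.conf", b"server { listen 80; }"),
        ("repo-c/nginx.conf", b"server { listen 8080; }"),
    ]
    print([path for path, _ in dedupe_by_content(downloads)])
```

Persisting the `seen` set between runs would be one simple way to keep such a collector resumable, since files fetched before a rate-limit pause can be skipped on restart.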

The raw downloads then pass through an extraction pipeline that separates the nginx config from the wrapper content surrounding it, and strips out any unsupported features, like Jinja templates.

What comes out the other end are clean `.conf` files that an nginx parser can actually tokenize. The final corpus: 35,633 parseable nginx configurations from thousands of GitHub repositories.
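As an illustration of the template-stripping step, the sketch below removes Jinja syntax with regular expressions. It is a heuristic under assumptions, not the pipeline's actual logic: the function name `strip_jinja` and the `placeholder` substitution are hypothetical, and a real extractor would need to handle templates whose control flow changes the config structure.

```python
import re

JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
JINJA_STATEMENT = re.compile(r"\{%.*?%\}", re.DOTALL)
JINJA_EXPRESSION = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def strip_jinja(text: str, placeholder: str = "placeholder") -> str:
    """Drop Jinja comments and statements; replace expressions with a token."""
    text = JINJA_COMMENT.sub("", text)
    text = JINJA_STATEMENT.sub("", text)
    # Expressions usually stand in for a value, so keep a token in their place
    # to leave the directive with the same number of arguments.
    return JINJA_EXPRESSION.sub(placeholder, text)


if __name__ == "__main__":
    template = "server {\n    listen {{ port }};\n{% if tls %}    ssl on;\n{% endif %}}\n"
    print(strip_jinja(template))
```

Keeping a placeholder rather than deleting expressions matters for a scanner like this one, because rules that count directive arguments would otherwise see a different argument count than the rendered config has.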

Parsing with nginx's own tokenizer

The `parser/` directory in ngxray contains a standalone C program that compiles nginx's actual tokenizer (`ngx_conf_read_token` and `ngx_conf_parse` from `src/core/ngx_conf_file.c`) against a patched handler. We patched `ngx_conf_handler()` to log and output the parsed syntax tree:

ngx_int_t
conf_handler(ngx_conf_t *cf, ngx_int_t last)
{
    // Records every directive into a JSON syntax tree
    // instead of dispatching to nginx modules
    node = conf_node_create(tree, cf);
    conf_node_append(tree->current, node);
    ...
}

By reusing nginx's tokenizer, we avoid reinventing the wheel while ensuring our scanner's results match real-world observations.

The rule engine

The scanner loads vulnerability signatures from JSON rule files. Each rule specifies which directives to match, structural constraints, and semantic checks specific to the vulnerability.

For CVE-2026-42945, `max_args: 2` enforces the no-flag requirement. A flagged rewrite has 3 args (regex, replacement, flag), so any rewrite with more than 2 args is safe. `ordered: true` ensures the rewrite appears before the `set` in source order.
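A minimal sketch of how these two constraints could be evaluated follows. Only the `max_args` and `ordered` keys come from the text; the rest of the `RULE` layout, the `(name, args)` directive shape, and the function name are hypothetical, and the sketch covers the `set` variant only.

```python
import re

# Hypothetical rule layout; only `max_args` and `ordered` are named in the text.
RULE = {"directive": "rewrite", "max_args": 2, "ordered": True, "then": "set"}
CAPTURE_REF = re.compile(r"\$[1-9]")


def find_stale_flag_hits(directives, rule=RULE):
    """Return (rewrite_index, set_index) pairs matching the rule.

    `directives` is a list of (name, args) tuples in source order.
    """
    # Flagless rewrites (at most max_args arguments) whose replacement has '?'.
    triggers = [
        i for i, (name, args) in enumerate(directives)
        if name == rule["directive"]
        and 2 <= len(args) <= rule["max_args"]
        and "?" in args[1]
    ]
    hits = []
    for j, (name, args) in enumerate(directives):
        if name != rule["then"] or not args:
            continue
        if not CAPTURE_REF.search(args[-1]):
            continue  # the assigned value does not reference $1..$9
        for i in triggers:
            if i < j or not rule["ordered"]:
                hits.append((i, j))
                break
    return hits


if __name__ == "__main__":
    config = [
        ("rewrite", ["^/api/(.*)$", "/internal?migrated=true"]),
        ("set", ["$original_endpoint", "$1"]),
    ]
    print(find_stale_flag_hits(config))  # [(0, 1)]
```

Note that the sketch never "disarms" after a later rewrite: a static check cannot know whether an intervening flagged rewrite matches a given request, so it errs toward reporting.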

For CVE-2026-9256, the `overlapping_refs` check does actual PCRE parsing. It maps each `$N` reference in the replacement back to its capture group's position in the regex, then checks whether any two referenced groups physically contain each other. `not_regex: "\\$[a-zA-Z_]"` ensures no named variables appear, which would force the slow path.
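The nesting test itself can be approximated with a parenthesis scanner. This is a simplified sketch rather than a PCRE parser: it skips escapes and character classes, numbers groups by their opening parenthesis, and does not handle every PCRE construct (for example, a `]` placed first inside a character class). The function names are hypothetical.

```python
import re

# Named groups still receive a number; other "(?" forms do not capture.
NAMED_GROUP = re.compile(r"\(\?(?:P?<(?![=!])|')")


def capture_group_spans(regex: str) -> dict:
    """Map capture-group number to the (open, close) offsets of its parens."""
    spans, stack, number = {}, [], 0
    i, in_class = 0, False
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if regex.startswith("(?", i) and not NAMED_GROUP.match(regex, i):
                stack.append(None)  # non-capturing group or lookaround
            else:
                number += 1
                stack.append((number, i))
        elif ch == ")" and stack:
            opened = stack.pop()
            if opened is not None:
                spans[opened[0]] = (opened[1], i)
        i += 1
    return spans


def overlapping_refs(regex: str, replacement: str) -> bool:
    """True if two $N references point at groups where one contains the other."""
    spans = capture_group_spans(regex)
    refs = sorted({int(n) for n in re.findall(r"\$([1-9])", replacement)})
    for a in refs:
        for b in refs:
            if a < b and a in spans and b in spans:
                (a_open, a_close), (b_open, b_close) = spans[a], spans[b]
                if a_open < b_open and b_close < a_close:
                    return True
    return False


if __name__ == "__main__":
    print(overlapping_refs("^/((.*))$", "http://backend/$1$2"))  # True
    print(overlapping_refs("^/(.*)/(.*)$", "/$1$2"))             # False
```

Working from parenthesis offsets rather than from matched text keeps the check purely static: it needs only the regex and the replacement, not a sample request.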

We wrote rules covering both CVEs: three variants of CVE-2026-42945 (the `set`, `if`, and rewrite-chain cases) and CVE-2026-9256. Each rule carries embedded test cases that the scanner validates on every run with `python3 scan.py --test`.

Results

The scanner flagged configs across several dozen repositories. The majority turned out to be PoC reproductions, scanner test fixtures, and tutorial snippets.

After triage, the hits fell into four buckets:

One real vulnerable config. point/cassea, a PHP MVC framework, ships an nginx vhost config with a language-routing rewrite chain. Here's the relevant section of the `location /` block:

set $controller index;
rewrite '^([^\.?&]*[^/])([?&#].*)?$' $1/$2;
rewrite '^/([a-z]{2})(/.*)$' $2?__lang=$1;          # <-- sets is_args, no flag
rewrite '^(.*)/([?&#].*)?$' $1/index.xml$2;

if ($uri ~* '^/([^/\.]{3,})(/.*)$') {
    set $controller $1;                               # <-- $1 copied with stale is_args
}

The language rewrite on line 3 strips a two-letter prefix like `/en/...` and appends `?__lang=en`. It has no flag, so the script engine continues with `e->is_args = 1`. The `if` block below it extracts a controller name from the rewritten URI. The `set $controller $1` inside that `if` runs through `complex_value_code` with the stale flag.

The question is whether `$1` inside the `if` can contain escapable characters. The `if` regex is `'^/([^/\.]{3,})(/.*)$'`, where the first capture group matches three or more characters that aren't `/` or `.`. That includes `+`.

A request to `/en/++++++++++++++++++++++++/whatever` passes through the language rewrite (stripping `/en`), producing `/++++++++++++++++++++++++/whatever?__lang=en`. The `if` regex then matches, capturing `++++++++++++++++++++++++` into `$1`. The `set` sizes the buffer at 24 raw bytes, but the copy pass escapes each `+` to `%2B`, writing 72 bytes.

We built a minimal reproduction and ran it in Docker against nginx compiled with AddressSanitizer:

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x511000001b48
SUMMARY: AddressSanitizer: heap-buffer-overflow src/core/ngx_string.c:1689 in ngx_escape_uri

The project itself is abandoned: a PHP5 framework last updated in 2011, 3 stars, zero forks, homepage offline. As far as we can tell, nobody is running this specific config. But the pattern it uses, language prefix stripping via flagless rewrite with `?`, is a legitimate design that someone could independently arrive at.

Documentation and tutorials. A handful of repos contained the vulnerable pattern inside Markdown exercise files and blog posts. Anyone who copies these snippets into a real config inherits the bug. One recurring example is an image-processing tutorial:

rewrite ^/images/([a-z]{2})/([a-z0-9]{5})/(.*)\.(png|jpg|gif)$ /data?file=$3.$4;
set $image_file $3;

Two Chinese-language nginx tutorial repos had this pattern. We confirmed it crashes with a request to `/images/en/ab12c/+++...+++.jpg`, where `$3` captures the plus signs and the stale `is_args` does the rest.

PoC and lab environments. About a dozen repos were intentional CVE reproductions: `nginx-rift-private-lab`, `CVE-2026-42945`, `cve-2026-42945-nginx32-lab`, and so on. These all use the standard `/api/(.*)` trigger from the original advisory. They're doing exactly what they're supposed to do.

Scanner test fixtures. Four repos were test cases for other nginx linting tools, with files named `vulnerable.conf` and `bad.conf`.

The chain variant

The rewrite-chain variant deserves separate mention, because it shows how the triage pipeline works.

The scanner produced 29 raw matches. Then the filters kicked in:

| Stage                              | Count |
|------------------------------------|-------|
| Raw chain-rule matches             | 29    |
| After `$scheme://` redirect filter | 28    |
| After literal-prefix filter        | 7     |
| After manual review                | 0     |

The `$scheme://` filter catches rewrites where the replacement starts with `http://` or `$scheme`. These are implicit redirects, so nginx returns a 3xx and stops processing. No chaining occurs.

The literal-prefix filter compares the first rewrite's output URI against the second rewrite's regex: if the first rewrites to `/index.php` and the second requires `^/admin/ads/edit/`, they can't chain.
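Both filters can be sketched as conservative string checks. The code below is an illustration under assumptions, not ngxray's implementation: all function names are hypothetical, and `may_chain` returns `False` only when the literal prefixes prove a mismatch, answering `True` whenever it cannot tell.

```python
REGEX_META = set(".^$*+?()[]{}|\\")


def is_implicit_redirect(replacement: str) -> bool:
    """Replacements starting with a scheme make nginx redirect and stop."""
    return replacement.startswith(("http://", "https://", "$scheme"))


def has_top_level_alternation(regex: str) -> bool:
    """True if '|' appears outside any parentheses or character class."""
    depth, i, in_class = 0, 0, False
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
        i += 1
    return False


def literal_prefix(regex: str) -> str:
    """Literal text every match must start with, or '' when unknown."""
    if not regex.startswith("^") or has_top_level_alternation(regex):
        return ""
    out, i = [], 1
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex) and not regex[i + 1].isalnum():
            out.append(regex[i + 1])  # escaped punctuation is a literal
            i += 2
            continue
        if ch in "?*{":
            if out:
                out.pop()  # the previous character was optional
            break
        if ch in REGEX_META:
            break
        out.append(ch)
        i += 1
    return "".join(out)


def may_chain(first_replacement: str, second_regex: str) -> bool:
    """False only when literal prefixes prove the rewrites cannot chain."""
    required = literal_prefix(second_regex)
    path = first_replacement.split("?", 1)[0]
    produced, dollar, _ = path.partition("$")
    if dollar:  # the output continues with a variable after the literal part
        return produced.startswith(required) or required.startswith(produced)
    return produced.startswith(required)


if __name__ == "__main__":
    print(may_chain("/index.php", "^/admin/ads/edit/"))  # False
    print(may_chain("/stage/$1?x=1", "^/stage/(.*)$"))   # True
```

When the second regex opens with a capture group right after the leading `/`, the required prefix shrinks to `/` and nearly every output is compatible with it, which is why cases like that fall through to manual review.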

The remaining 7 findings all had second regexes starting with a capture group, which the scanner can't rule out statically. Manual review killed all of them. One config rewrites to `/journo` but the second regex requires `^/([a-zA-Z0-9]+-...)/rss$`, and `/journo` has no `-` or `/rss` suffix. Another rewrites to `/index.php` but the second regex is `^/@(\w+)/(following|followers)`, and `/index.php` doesn't start with `/@`.

What this means

We are living through the first AI Bugmageddon, and it has produced a lot of noise alongside real findings. We've contributed to some of that noise ourselves, so we are not in a position to judge anyone. But that's exactly why this kind of triage matters: defenders need to know which CVEs apply to their infrastructure and which ones they can deprioritize.

In this instance, the bugs are real and exploitable, but their real-world impact is likely low. Both CVEs rely on config patterns that almost never appear in production: CVE-2026-42945 requires a flagless rewrite with `?` followed by `set` or `if` referencing positional captures; CVE-2026-9256 requires nested capture groups where the replacement references multiple overlapping groups. Out of 35,633 configs, we found one vulnerable config, in an abandoned project.

The caveat is that GitHub skews toward examples, tutorials, and small projects. Complex rewrite chains for language routing or URL migration tend to live in private infrastructure repos and configuration management systems that never touch public GitHub. The point/cassea pattern, language prefix stripping via a flagless `?` rewrite, is a reasonable multilingual design that any organization could independently arrive at.

That said, these are still unauthenticated heap overflows. One vulnerable config in production is enough to cause denial of service or worse.

Try it

ngxray is open source. Point it at your configs:

git clone https://github.com/califio/ngxray && cd ngxray
git submodule update --init && make
python3 scan.py /etc/nginx/

If you're running nginx < 1.31.1, check your rewrite directives. Look for flagless rewrites with `?` in the replacement followed by `set` or `if` using `$1` through `$9`. Look for rewrite regexes with nested capture groups whose `$N` references overlap.

Or just run the scanner.
