AI-Assisted AuthZ Review: Reading Permission Boundaries in Ory Kratos

wpnews.pro

Second in a series on using AI to review authorization — not to spray reports.

Companion reference: AuthZ Smell Catalog.

The cheapest thing an AI can do in security is generate suspicion. Point a model at a

codebase and it will hand you fifty "possible IDORs" before you finish your coffee. Almost

all of them are wrong — guarded three lines up, scoped at the data layer, or protected at a

boundary the model never saw.

That flood is exactly why several bug bounty programs spent 2026 tightening or pausing:

they were drowning in confident, plausible, wrong reports.

So this review inverts the usual loop. The AI's job is not to find bugs — it is to

over-generate hypotheses cheaply. My job is to kill them. What survives that killing

is the only thing worth a human's time, and the record of what died is more useful than the

record of what lived.

The artifact of an honest review is therefore not a finding. It's a kill table.

The evidence tier for this review is `repo_only`, and I say so explicitly rather than implying it reaches a live product.

What this review does and does not claim. In this limited, repo-only review, the hypotheses I tested were killed. This is not a claim that Kratos has no vulnerabilities, and it is not a security audit. It is a case study in how AI-assisted AuthZ review can avoid false positives: how to kill a suspicion instead of shipping it.

I let cheap finders over-generate against the AuthZ Smell Catalog. The raw candidate

list, unfiltered:

[...] `/sessions/whoami`, identity lookups, or an admin identity fetch?

Five confident hypotheses. This is the part AI is good at. Now the part it can't do.

The rule: assume each is by-design until a concrete test says otherwise, and default to killing it. For source-only review, the "test" is: can I trace a

H1 — Admin API "missing" authorization → KILLED (by design).

Kratos deliberately ships the admin API with no built-in authorization. Ory's own

documentation states the admin API must be protected at the network boundary (ingress, a

reverse proxy, Oathkeeper) and never exposed publicly. So "no authz check in the handler" is

not a missing guard — it is the guard living one layer out, exactly the false-positive shape

in Catalog §13 (middleware/deployment-layer authorization). A report of "admin API allows

identity CRUD without auth" is by-design and would be closed as such. Killed.

H2 — Cross-identity / cross-tenant read → KILLED (chokepoint design).

This is the interesting one. Kratos does not scatter tenant checks across handlers. Its

persistence layer runs every query through a network Contextualizer that injects the

network id (`nid`) into the SQL, so the data-access layer itself filters by tenant, centrally.

A handler cannot accidentally read across the boundary, because the boundary is enforced

below the handler, at the one place every read funnels through. On the public API, identity

access is derived from the session's identity, never from a client-supplied id. To break

H2 you would have to find a read path that bypasses the persister entirely — and I found no

user-reachable one in this build. Killed. And worth noting as a pattern: concentrating the

tenant filter at the data-access layer collapses the whole class into a single auditable

point — which is why these particular hypotheses died here (Catalog §B).
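To make the chokepoint shape concrete, here is a minimal sketch in Python with SQLite. It is not Kratos's Go code: the `Persister` class, the `identities` table, and its columns are illustrative assumptions. The only point it demonstrates is that the tenant predicate is written once, inside the data-access layer, so no caller can forget it.

```python
import sqlite3
from typing import Optional


class Persister:
    """Hypothetical data-access layer: every read funnels through here."""

    def __init__(self, conn: sqlite3.Connection, nid: str) -> None:
        self._conn = conn
        self._nid = nid  # resolved from request context, never from client input

    def get_identity(self, identity_id: str) -> Optional[tuple]:
        # The nid predicate is added here, once, for every caller.
        cur = self._conn.execute(
            "SELECT id, email FROM identities WHERE id = ? AND nid = ?",
            (identity_id, self._nid),
        )
        return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identities (id TEXT, nid TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO identities VALUES (?, ?, ?)",
    [("id-a", "net-1", "a@example.com"), ("id-b", "net-2", "b@example.com")],
)

tenant_one = Persister(conn, nid="net-1")
own = tenant_one.get_identity("id-a")      # same network: row is returned
foreign = tenant_one.get_identity("id-b")  # other network: filtered out
```

With this shape, reviewing the tenant boundary means reviewing the one class that builds queries, plus any code path that talks to the database without it.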

H3 — Token reuse → KILLED.

Recovery and verification tokens are single-use and time-boxed; redemption invalidates the

token in the same transaction. Replay after use fails. Killed.
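The property that kills H3 can be sketched as a single conditional update: the check ("unused and unexpired") and the invalidation happen in one statement inside one transaction. The table name, columns, and integer timestamps below are assumptions for illustration, not Kratos's schema.

```python
import sqlite3


def redeem(conn: sqlite3.Connection, token: str, now: int) -> bool:
    """Return True only for the first redemption of an unexpired token."""
    with conn:  # one transaction: check and invalidation commit together
        cur = conn.execute(
            "UPDATE recovery_tokens SET used = 1 "
            "WHERE token = ? AND used = 0 AND expires_at > ?",
            (token, now),
        )
        # Exactly one row changes on a valid first use; zero rows otherwise.
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recovery_tokens "
    "(token TEXT PRIMARY KEY, used INTEGER, expires_at INTEGER)"
)
conn.executemany(
    "INSERT INTO recovery_tokens VALUES (?, 0, ?)",
    [("fresh", 1000), ("stale", 100)],
)

first = redeem(conn, "fresh", now=500)    # valid first use
replay = redeem(conn, "fresh", now=500)   # same token again
expired = redeem(conn, "stale", now=500)  # past its expiry
```

The kill test for a real target is the same question in reverse: is there any redemption path where the validity check and the invalidation are separate steps?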

H4 — Settings-flow identity confusion → KILLED.

The settings flow binds to the identity resolved from the authenticated session. The identity

being modified is not taken from client input, so you cannot retarget the flow at someone

else's traits. Killed (Catalog §02 — read-reachability is not write-reachability, and here

even read is session-bound).
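A small sketch of what "bound to the session identity" means in practice. The `Session` type, the `apply_settings` function, and the in-memory store are hypothetical names for illustration; the handler simply never reads a target id from the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    identity_id: str


def apply_settings(session: Session, payload: dict, store: dict) -> str:
    """Update traits for the session's identity; any id in the payload is ignored."""
    target = session.identity_id  # the only source of the target identity
    store[target] = {**store[target], **payload.get("traits", {})}
    return target


store = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
}

# Alice's session, with a payload that tries to retarget the flow at Bob.
updated = apply_settings(
    Session(identity_id="alice"),
    {"identity_id": "bob", "traits": {"email": "new@example.com"}},
    store,
)
```

The hypothesis survives only if some code path reads the target identity from the request instead of the session, which is the thing to look for in source.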

H5 — Tenant from payload → KILLED.

The network id is derived from context, not from the request body. An admin create/update

cannot smuggle a foreign `nid`. Killed.

The deliverable of the whole review, on one screen:

| # | Hypothesis | Catalog | Verdict | Why |
|---|---|---|---|---|
| H1 | Admin API missing authz | §01, §13 | by-design | authz is at the network boundary, not the handler; documented |
| H2 | Cross-identity / cross-tenant read | §04, §05 | defended | `nid` enforced at the persister via the Contextualizer; public reads are session-bound |
| H3 | Recovery/verification token reuse | §09 | defended | single-use, time-boxed, invalidated on redemption |
| H4 | Settings-flow identity confusion | §02, §07 | defended | flow bound to the session identity, not client input |
| H5 | Tenant assignment from payload | §04 | defended | `nid` from context, not request body |

Five hypotheses in. Zero findings out. This is a successful review, not a failed one —

and to be exact, it is a successful review of five hypotheses, not a clean bill of health

for Kratos.

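Two shapes here generalize far beyond Kratos: authorization that lives at the deployment layer rather than in the handler (H1), and a tenant boundary concentrated at a single data-access chokepoint (H2).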

None of the hypotheses I tested survived source-only review — and the reason is worth

publishing: Kratos concentrates its tenant boundary in one place (the persister's

Contextualizer) and derives identity from the session rather than from client input. That

design choice is precisely what made four of my five hypotheses collapse to one question, and

that question had a clean answer.

If I were to keep going, the only honest next move would be to enumerate every ingress that

could reach persisted data without the persister — background jobs, imports, any raw query.

In the OSS build I found no user-reachable one. That negative result is real signal, and it is tier `repo_only`: I am not claiming it holds against any specific hosted deployment.
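That enumeration can start as a crude scan. The sketch below walks a Go source tree and lists lines that appear to touch the database outside the persister package. Everything in it is an assumption to tune per codebase: the directory name `persistence`, the regex patterns, and the decision to skip test files. Each hit is a candidate to kill, not a finding.

```python
import re
from pathlib import Path

# Hypothetical patterns for "talks to the database directly"; adjust per codebase.
RAW_ACCESS = re.compile(r"\.RawQuery\(|sql\.Open\(|\.Exec\(|\.QueryRow\(")


def find_bypass_candidates(root: Path, persister_dir: str = "persistence") -> list:
    """List (file, line number, text) for raw DB access outside the persister package."""
    hits = []
    for path in sorted(root.rglob("*.go")):
        rel = path.relative_to(root)
        # Skip the persister itself (expected to query) and test files.
        if persister_dir in rel.parts or path.name.endswith("_test.go"):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if RAW_ACCESS.search(line):
                hits.append((str(rel), lineno, line.strip()))
    return hits
```

A call such as `find_bypass_candidates(Path("path/to/checkout"))` returns the list to triage by hand: for each hit, ask whether a user can reach it and whether it applies the tenant filter itself.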

`repo_only` is not `hosted_confirmed`. Say which one you have. Conflating them is how OSS reading turns into a false bounty claim.

Each kill sharpened a catalog entry's confirm/kill column, the column that separates a real bug from a by-design behavior:

The catalog is not a static list; every real outcome — even a clean by-design result — feeds a

sharper kill test back into it. That feedback loop is the asset, not the entry count.

This review produces exactly one row for the outcome ledger — the honest kind, a defended

target:

date=2026-07-04, program=Ory Kratos (self-directed OSS review), source_type=oss_source_available,
class=tenant_boundary, repro_tier=repo_only, human_verdict=by_design, final_status=not_applicable,
payout_usd=0, lesson="Contextualizer/nid chokepoint concentrates the tenant boundary; admin-API
authz is deployment-layer by design — both are KILLs, not bugs. Review collapses to: can anything
reach persisted data without the persister?"
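One way to keep the tier distinction from eroding is to make it a typed field rather than free text. The sketch below models a subset of the row above; the class names and the `claims_live_impact` helper are hypothetical, and the two tier values are the only ones this article names.

```python
from dataclasses import dataclass
from enum import Enum


class ReproTier(Enum):
    REPO_ONLY = "repo_only"
    HOSTED_CONFIRMED = "hosted_confirmed"


@dataclass(frozen=True)
class LedgerRow:
    date: str
    program: str
    repro_tier: ReproTier
    human_verdict: str
    payout_usd: int

    def claims_live_impact(self) -> bool:
        # Only a hosted confirmation supports a claim about a live deployment.
        return self.repro_tier is ReproTier.HOSTED_CONFIRMED


row = LedgerRow(
    date="2026-07-04",
    program="Ory Kratos (self-directed OSS review)",
    repro_tier=ReproTier("repo_only"),  # an unknown tier string raises ValueError
    human_verdict="by_design",
    payout_usd=0,
)
```

Because `ReproTier` rejects any string it does not know, a row cannot be logged with a vague or invented tier.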

Row #1 in a ledger is not supposed to be a payout. It's supposed to be true. From here, the

next step is a single source-available target with a newly-added permission boundary

(a fresh RBAC, workspace, billing, or SSO/SCIM feature) — the un-picked-over surface — run

through the same over-generate-then-kill loop, and logged as ledger row #2. One target. Not ten.

I use AI to reject candidates and humans to verify the few that survive. If that approach is useful to you, the AuthZ Smell Catalog is the companion reference this series builds on.

source & further reading

dev.to (original article)