# Hardening API Scan Boundaries in skill-scanner, with sqry as the Review Map

> Source: <https://dev.to/wernerk_au/hardening-api-scan-boundaries-in-skill-scanner-with-sqry-as-the-review-map-dip>
> Published: 2026-06-14 12:24:42+00:00

On 14 June 2026 I cloned [cisco-ai-defense/skill-scanner](https://github.com/cisco-ai-defense/skill-scanner), set up the locked `uv`

environment, and worked through one small but important question: what does it take to make the REST API safer when the API can scan local directories, accept uploaded ZIP files, run optional analyzers, and queue batch work in the background?

I am not pretending this is a universal API security methodology, or that one branch makes a whole product "secure" in the abstract. This is a narrower story, and I think the narrowness is the useful part: a concrete pass over one public Python repository, with a hardening branch called `codex/harden-api-scan-boundaries`

, ending in commit `2cfa313`

and draft [PR #119](https://github.com/cisco-ai-defense/skill-scanner/pull/119), where the evidence was code, tests, docs, and a graph of the repository rather than a confident read of the obvious files.

The branch changed 24 files, with `1186 insertions`

and `210 deletions`

. The main implementation files were `skill_scanner/api/router.py`

, `skill_scanner/core/analyzer_factory.py`

, `skill_scanner/core/extractors/content_extractor.py`

, `skill_scanner/core/loader.py`

, and `skill_scanner/core/scanner.py`

, plus two new shared modules: `skill_scanner/core/archive_limits.py`

and `skill_scanner/core/fs_limits.py`

.

`skill-scanner`

scans Agent Skill packages. It has CLI paths, Python library paths, eval paths, pre-commit hook paths, and a FastAPI router that exposes endpoints for direct skill scans, uploaded ZIP scans, batch scans, batch-result polling, health checks, and analyzer listing.

That matters because the REST API does not sit in front of a simple database lookup. It sits in front of local filesystem access, archive extraction, analyzer construction, optional remote-service analyzers such as VirusTotal and Cisco AI Defense, LLM-backed analysis, scanner traversal, loader discovery, and report generation. A bug in one visible route handler can be obvious. A missing bound in a shared loader, reached through API, CLI, evals, tests, and scanner methods, is much easier to miss.

The first setup step was boring and necessary:

```
uv sync --frozen --all-extras --dev
```

That gave the API dependencies, analyzer extras, pytest, lint tooling, and the project commands needed to move from reading code to running it. The repository also had clear contribution constraints in `CONTRIBUTING.md`

: include tests for changed behaviour, update docs where behaviour or configuration changes, use a conventional commit, keep the `uv.lock`

model intact, and verify with the repository's normal commands.

The hardening target became four broad risk classes:

There is also the basic API boundary: scan work and scan-result retrieval now require `X-API-Key`

, and the expensive endpoints have process-local rate limiting. Root, health, and analyzer-listing endpoints remain informational.

The tool that changed the review was [sqry](https://github.com/verivus-oss/sqry), version `20.0.5`

. sqry uses "semantic" in the compiler sense, it parses code into ASTs, builds a graph of symbols and relationships, and answers structural questions from that graph. It is not an embedding search tool, and it is not just grep with better ranking.

The local index for this repository had `20,445`

symbols across `202`

files, with relation support enabled. The graph manifest recorded `26,120`

edges across `200`

Python files, one Ruby file, and one shell file. That is the practical reason it helped here: the API hardening problem crossed API request models, FastAPI handlers, shared scan implementation, analyzer construction, scanner traversal, loader discovery, archive extraction, documentation, and tests.

The first useful query was not clever:

```
sqry query 'path:skill_scanner/api/router.py AND kind:function'
```

It returned `98`

function symbols from `skill_scanner/api/router.py`

in about `35 ms`

on this checkout. More importantly, it produced a checklist that included `scan_skill`

, `_scan_skill_impl`

, `scan_uploaded_skill`

, `scan_batch`

, `get_batch_scan_result`

, `run_batch_scan`

, `_validate_path`

, `_count_batch_candidates`

, and `_build_analyzers`

.

That sounds mundane until you compare it with a manual route read. A manual read tends to start from decorators and then follow the code that looks important. sqry gave me the public route handlers and the helpers in one structural inventory, before I had decided which parts mattered.

The scanner side was the same:

```
sqry query 'path:skill_scanner/core/scanner.py AND kind:function'
```

That returned `76`

function symbols in about `31 ms`

, including `SkillScanner.scan_skill`

, `SkillScanner.scan_directory`

, and `_find_skill_directories`

. The useful distinction was between single-skill scanning, directory discovery, and module-level convenience functions. For a hardening pass, that distinction is load-bearing.

Then the review shifted from "where is this string?" to "what code can reach this behaviour?"

```
sqry graph direct-callers _validate_path --json
```

sqry reported four direct callers: `_resolve_policy`

, `_scan_skill_impl`

, `scan_batch`

, and `run_batch_scan`

. That made the path gate concrete. It was not enough to harden the direct `/scan`

path. The same gate needed to cover policy paths, direct skill paths, batch roots before queuing, and batch execution inside the background task.

The loader trace was the bigger warning:

```
sqry graph direct-callers 'SkillLoader.load_skill' --json
```

That returned `92`

direct callers across evals, API code, CLI code, scanner code, and tests. This is where plain text search is weak. You can find `load_skill`

text matches, but you still have to reason manually about which are method calls, convenience wrappers, test helpers, and shared execution paths. sqry made the broad shared surface visible, which is why the fix did not stop at the API router. The loader itself needed a bounded contract.

The same pattern showed up in analyzer construction. `build_analyzers`

had `11`

direct callers across API, CLI, hooks, evals, and tests. That meant `llm_consensus_runs`

needed two checks: request-model validation at the API edge, and a second cap inside the analyzer factory so non-API callers get the same invariant.

For `LLMAnalyzer._consensus_analyze`

, sqry reported one direct caller, `LLMAnalyzer.analyze_async`

, which kept the execution-side analysis focussed. The cap belongs before construction reaches the analyzer loop.

Plain `rg`

still had a place for exact strings, route decorators, docs, and final sanity checks. The difference is that sqry gave the graph-backed layer: functions and methods instead of arbitrary text, same-name symbols separated across API, CLI, hooks, evals and tests, and caller/callee traces for security-sensitive helpers.

The API path boundary now fails closed. `_validate_path`

rejects null bytes, resolves the supplied path, and denies access unless `SKILL_SCANNER_ALLOWED_ROOTS`

is configured and the resolved path is inside one of those roots. If no roots are configured, API filesystem access is denied.

That is a deliberate posture. An API that scans local paths should not assume that "current working directory" is a sensible trust boundary, and it should not silently accept arbitrary absolute paths because the caller knows them.

The upload path changed in a similarly blunt way. `/scan-upload`

still checks the client-provided filename to require a `.zip`

upload, but the server no longer uses that filename for the staging path. Uploaded bytes are written to:

```
zip_path = temp_dir / "upload.zip"
```

That small line removes an entire class of filename-controlled staging behaviour. Around it, the upload flow now streams in `1 MB`

chunks, enforces a `50 MB`

upload limit, reads ZIP EOCD metadata before constructing `ZipFile`

, rejects ZIPs over `500`

entries, rejects uncompressed ZIP contents over `200 MB`

, rejects path traversal entries by resolving each destination under the extraction root, rejects symlink entries, checks again after extraction that no symlink appeared on disk, and only then searches the extracted tree for `SKILL.md`

using a bounded walk.

The EOCD preflight lives in `skill_scanner/core/archive_limits.py`

as `read_zip_member_count`

. It reads the ZIP end-of-central-directory metadata, including the ZIP64 case, before the code has to build a `ZipFile`

object and iterate the archive. The same helper is used by the API upload handler and by `ContentExtractor`

, so archive member-count limits are not two unrelated implementations that can drift.

The traversal helpers live in `skill_scanner/core/fs_limits.py`

:

```
iter_directory_bounded
walk_directory_bounded
```

Both are based on `os.scandir`

, and both count entries as they are yielded rather than first materialising a whole tree. They are now used by API batch preflight, scanner directory discovery, loader file discovery, lenient markdown synthesis, and uploaded-tree search. That is the kind of change that looks less exciting than a route patch, but it is exactly where the graph evidence mattered. If the loader has 92 direct callers, the loader cannot depend on the API being the only adult in the room.

Batch scanning now validates the batch root, counts candidates before queueing background work, rejects requests over the configured candidate limit, and passes bounds into `SkillScanner.scan_directory`

:

```
max_candidates=MAX_BATCH_SKILLS
max_entries_visited=MAX_BATCH_PATHS_VISITED
```

The default values in the API are `100`

candidate skills and `10,000`

filesystem entries. The scanner then passes loader bounds into `SkillLoader.load_skill`

, which means the per-skill load step is part of the same bounded execution path rather than an unbounded second phase.

The analyzer boundary changed too. `llm_consensus_runs`

is capped in the API request models with Pydantic, and again in `build_analyzers`

. The API no longer exposes a remote-callable Cisco AI Defense URL override; the analyzer factory can still use operator-controlled arguments and environment configuration, including `AI_DEFENSE_API_URL`

, but the public request model does not let a caller pick the remote endpoint for the server.

Finally, scan endpoints now require `X-API-Key`

backed by `SKILL_SCANNER_API_KEY`

. `/scan`

, `/scan-upload`

, `/scan-batch`

, and `/scan-batch/{scan_id}`

all check it. The result cache for batch scans is also bounded: `1,000`

entries, with a `3600`

second TTL. The rate limiter is deliberately process-local, configurable through `SKILL_SCANNER_API_RATE_LIMIT_REQUESTS`

and `SKILL_SCANNER_API_RATE_LIMIT_WINDOW_SECONDS`

; that is useful for this server, but it is not a distributed quota system, and the docs should make that kind of caveat visible.

The branch did not stop at implementation. Tests were added or updated across:

`tests/test_api_endpoints.py`

`tests/test_api_deep.py`

`tests/test_analyzer_factory.py`

`tests/test_loader.py`

`tests/test_scanner.py`

`tests/test_extractors.py`

`tests/test_cli_tui_api_fixes.py`

The focussed verification command was:

```
uv run pytest \
  tests/test_api_endpoints.py \
  tests/test_api_deep.py \
  tests/test_analyzer_factory.py \
  tests/test_loader.py \
  tests/test_scanner.py \
  tests/test_extractors.py \
  tests/test_cli_tui_api_fixes.py \
  -q
```

On the current checkout, that collected `216`

tests and returned `215 passed, 1 skipped`

on Python `3.13.13`

, with only third-party deprecation warnings. The process report also records a broader non-integration, non-LLM, non-e2e run at `1308 passed, 5 skipped, 7 deselected`

, plus `ruff check .`

and `git diff --check`

during the contribution.

The documentation updates matter because this is not only a code contract. `.env.example`

, API docs, operations docs, endpoint detail pages, and generated reference docs now describe `SKILL_SCANNER_API_KEY`

, `SKILL_SCANNER_ALLOWED_ROOTS`

, rate limits, traversal limits, archive limits, batch limits, and the LLM consensus cap. A security control that exists only in code is easier to bypass operationally than one that is named in the configuration surface people actually read.

The useful lesson here is not "AI found security bugs". That is too vague, and frankly not the interesting part.

The useful lesson is that AI-assisted review gets much better when the agent is forced to work from repository facts that can be rerun: symbol inventories, caller traces, callee traces, exact changed files, test names, and concrete verification commands. A model can read the most obvious route handler and sound convincing. A graph can show that the helper under discussion has four direct callers, or that a loader method has 92 direct callers, and that changes the review from opinion to coverage.

That is where sqry was valuable. It made the review faster, but the speed was not the main win. The main win was not having to trust a first-pass mental map of the codebase. The map was queryable, and when the map said the loader was shared across API, CLI, eval, scanner, and tests, the fix moved down into the loader. When the map said analyzer construction was shared, the consensus cap moved into the factory as well as the API request model.

This is also why I do not like abstract claims about "secure by design" unless the design names the boundary and the evidence. In this branch, the claims are more modest and more useful: API path access fails closed without configured roots; uploaded filenames no longer control staging paths; archive expansion has member, size, traversal, and symlink checks; batch discovery and scanner traversal have explicit limits; loader discovery has explicit limits; LLM consensus runs are capped at both the request and factory boundary; the focussed suite passes.

Those are claims a maintainer can inspect.

The same pattern showed up while working through issues in [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector/issues): Stage 2 LLM batch failures, retry and concurrency behaviour, unanalyzed findings, ingest-layer bounds, and whitespace-padding detection all ended up being boundary questions. Different repository, different implementation, same shape of problem.

This is the part that feels important to me. AI-assisted development can help us ship faster, but faster shipping also means we can expose larger attack surfaces sooner: more API entry points, more archive and clone paths, more model calls, more background work, more places where a scanner accepts untrusted input. The answer is not to slow everything down by default; it is to make boundary review part of the shipping motion, with concrete limits, tests, and code-graph evidence before the surface gets too wide to reason about.

`SKILL_SCANNER_ALLOWED_ROOTS`

being absent means no API path access, not "scan whatever path was supplied".Thanks for reading this far, I hope this is useful if you are hardening an API that wraps local filesystem work, archive extraction, or other expensive scanner-style behaviour. The bit I would reuse first is not any single line of code, it is the habit of asking the repository graph where the boundary actually runs before deciding where the fix belongs.