Hardening API Scan Boundaries in skill-scanner, with sqry as the Review Map

A developer hardened the REST API boundaries in Cisco's skill-scanner repository, a tool for scanning Agent Skill packages, by adding authentication, rate limiting, and input validation. The hardening branch, codex/harden-api-scan-boundaries, modified 24 files with 1186 insertions and 210 deletions, focusing on shared modules like archive_limits.py and fs_limits.py to prevent bugs in API, CLI, and other paths. The developer used sqry, a semantic code query tool, to analyze 20,445 symbols across 202 files and identify cross-cutting security concerns.

On 14 June 2026 I cloned cisco-ai-defense/skill-scanner (https://github.com/cisco-ai-defense/skill-scanner), set up the locked uv environment, and worked through one small but important question: what does it take to make the REST API safer when the API can scan local directories, accept uploaded ZIP files, run optional analyzers, and queue batch work in the background?

I am not pretending this is a universal API security methodology, or that one branch makes a whole product "secure" in the abstract. This is a narrower story, and I think the narrowness is the useful part: a concrete pass over one public Python repository, with a hardening branch called codex/harden-api-scan-boundaries, ending in commit 2cfa313 and draft PR 119 (https://github.com/cisco-ai-defense/skill-scanner/pull/119), where the evidence was code, tests, docs, and a graph of the repository rather than a confident read of the obvious files.

The branch changed 24 files, with 1186 insertions and 210 deletions. The main implementation files were skill_scanner/api/router.py, skill_scanner/core/analyzer_factory.py, skill_scanner/core/extractors/content_extractor.py, skill_scanner/core/loader.py, and skill_scanner/core/scanner.py, plus two new shared modules: skill_scanner/core/archive_limits.py and skill_scanner/core/fs_limits.py.

skill-scanner scans Agent Skill packages. It has CLI paths, Python library paths, eval paths, pre-commit hook paths, and a FastAPI router that exposes endpoints for direct skill scans, uploaded ZIP scans, batch scans, batch-result polling, health checks, and analyzer listing. That matters because the REST API does not sit in front of a simple database lookup. It sits in front of local filesystem access, archive extraction, analyzer construction, optional remote-service analyzers such as VirusTotal and Cisco AI Defense, LLM-backed analysis, scanner traversal, loader discovery, and report generation. A bug in one visible route handler can be obvious.
A missing bound in a shared loader, reached through API, CLI, evals, tests, and scanner methods, is much easier to miss.

The first setup step was boring and necessary:

    uv sync --frozen --all-extras --dev

That gave the API dependencies, analyzer extras, pytest, lint tooling, and the project commands needed to move from reading code to running it. The repository also had clear contribution constraints in CONTRIBUTING.md: include tests for changed behaviour, update docs where behaviour or configuration changes, use a conventional commit, keep the uv.lock model intact, and verify with the repository's normal commands.

The hardening target became four broad risk classes. On top of those there is the basic API boundary: scan work and scan-result retrieval now require X-API-Key, and the expensive endpoints have process-local rate limiting. Root, health, and analyzer-listing endpoints remain informational.

The tool that changed the review was sqry (https://github.com/verivus-oss/sqry), version 20.0.5. sqry uses "semantic" in the compiler sense: it parses code into ASTs, builds a graph of symbols and relationships, and answers structural questions from that graph. It is not an embedding search tool, and it is not just grep with better ranking.

The local index for this repository had 20,445 symbols across 202 files, with relation support enabled. The graph manifest recorded 26,120 edges across 200 Python files, one Ruby file, and one shell file. That is the practical reason it helped here: the API hardening problem crossed API request models, FastAPI handlers, shared scan implementation, analyzer construction, scanner traversal, loader discovery, archive extraction, documentation, and tests.

The first useful query was not clever:

    sqry query 'path:skill_scanner/api/router.py AND kind:function'

It returned 98 function symbols from skill_scanner/api/router.py in about 35 ms on this checkout.
More importantly, it produced a checklist that included scan_skill, scan_skill_impl, scan_uploaded_skill, scan_batch, get_batch_scan_result, run_batch_scan, validate_path, count_batch_candidates, and build_analyzers.

That sounds mundane until you compare it with a manual route read. A manual read tends to start from decorators and then follow the code that looks important. sqry gave me the public route handlers and the helpers in one structural inventory, before I had decided which parts mattered.

The scanner side was the same:

    sqry query 'path:skill_scanner/core/scanner.py AND kind:function'

That returned 76 function symbols in about 31 ms, including SkillScanner.scan_skill, SkillScanner.scan_directory, and find_skill_directories. The useful distinction was between single-skill scanning, directory discovery, and module-level convenience functions. For a hardening pass, that distinction is load-bearing.

Then the review shifted from "where is this string?" to "what code can reach this behaviour?"

    sqry graph direct-callers validate_path --json

sqry reported four direct callers: resolve_policy, scan_skill_impl, scan_batch, and run_batch_scan. That made the path gate concrete. It was not enough to harden the direct /scan path. The same gate needed to cover policy paths, direct skill paths, batch roots before queuing, and batch execution inside the background task.

The loader trace was the bigger warning:

    sqry graph direct-callers 'SkillLoader.load_skill' --json

That returned 92 direct callers across evals, API code, CLI code, scanner code, and tests. This is where plain text search is weak. You can find load_skill text matches, but you still have to reason manually about which are method calls, convenience wrappers, test helpers, and shared execution paths. sqry made the broad shared surface visible, which is why the fix did not stop at the API router. The loader itself needed a bounded contract.

The same pattern showed up in analyzer construction.
build_analyzers had 11 direct callers across API, CLI, hooks, evals, and tests. That meant llm_consensus_runs needed two checks: request-model validation at the API edge, and a second cap inside the analyzer factory so non-API callers get the same invariant. For LLMAnalyzer._consensus_analyze, sqry reported one direct caller, LLMAnalyzer.analyze_async, which kept the execution-side analysis focussed. The cap belongs before construction reaches the analyzer loop.

Plain rg still had a place for exact strings, route decorators, docs, and final sanity checks. The difference is that sqry gave the graph-backed layer: functions and methods instead of arbitrary text, same-name symbols separated across API, CLI, hooks, evals and tests, and caller/callee traces for security-sensitive helpers.

The API path boundary now fails closed. validate_path rejects null bytes, resolves the supplied path, and denies access unless SKILL_SCANNER_ALLOWED_ROOTS is configured and the resolved path is inside one of those roots. If no roots are configured, API filesystem access is denied. That is a deliberate posture. An API that scans local paths should not assume that "current working directory" is a sensible trust boundary, and it should not silently accept arbitrary absolute paths because the caller knows them.

The upload path changed in a similarly blunt way. /scan-upload still checks the client-provided filename to require a .zip upload, but the server no longer uses that filename for the staging path. Uploaded bytes are written to:

    zip_path = temp_dir / "upload.zip"

That small line removes an entire class of filename-controlled staging behaviour.
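
To make the staging change concrete, here is a minimal sketch of the idea, not the repository's handler. The function name stage_upload, the exception type, the case-insensitive extension check, and the use of a plain binary stream instead of a FastAPI upload object are all assumptions for illustration. The chunk size and cap mirror the 1 MB and 50 MB limits the upload flow uses.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024              # read the upload 1 MB at a time
MAX_UPLOAD_BYTES = 50 * 1024 * 1024   # refuse uploads larger than 50 MB


class UploadTooLarge(Exception):
    """Raised when the streamed upload exceeds the configured cap."""


def stage_upload(
    client_filename: str,
    stream: BinaryIO,
    temp_dir: Path,
    max_bytes: int = MAX_UPLOAD_BYTES,
) -> Path:
    """Write an upload to a fixed staging name, enforcing a size cap while streaming."""
    # The client filename is only a type hint for the request.
    # It never becomes part of a filesystem path.
    if not client_filename.lower().endswith(".zip"):
        raise ValueError("only .zip uploads are accepted")

    zip_path = temp_dir / "upload.zip"  # server-chosen name, not caller-controlled
    written = 0
    with zip_path.open("wb") as out:
        while chunk := stream.read(CHUNK_SIZE):
            written += len(chunk)
            if written > max_bytes:
                # Stop before writing the chunk that crosses the cap.
                raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
            out.write(chunk)
    return zip_path
```

In this sketch a hostile filename such as ../../evil.zip only has to pass the extension check. The bytes still land in upload.zip inside the temporary directory, and cleaning up that directory after a failure is left to the caller.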
Around that fixed staging path, the upload flow now streams in 1 MB chunks, enforces a 50 MB upload limit, reads ZIP EOCD metadata before constructing ZipFile, rejects ZIPs over 500 entries, rejects uncompressed ZIP contents over 200 MB, rejects path traversal entries by resolving each destination under the extraction root, rejects symlink entries, checks again after extraction that no symlink appeared on disk, and only then searches the extracted tree for SKILL.md using a bounded walk.

The EOCD preflight lives in skill_scanner/core/archive_limits.py as read_zip_member_count. It reads the ZIP end-of-central-directory metadata, including the ZIP64 case, before the code has to build a ZipFile object and iterate the archive. The same helper is used by the API upload handler and by ContentExtractor, so archive member-count limits are not two unrelated implementations that can drift.

The traversal helpers live in skill_scanner/core/fs_limits.py: iter_directory_bounded and walk_directory_bounded. Both are based on os.scandir, and both count entries as they are yielded rather than first materialising a whole tree. They are now used by API batch preflight, scanner directory discovery, loader file discovery, lenient markdown synthesis, and uploaded-tree search.

That is the kind of change that looks less exciting than a route patch, but it is exactly where the graph evidence mattered. If the loader has 92 direct callers, the loader cannot depend on the API being the only adult in the room.

Batch scanning now validates the batch root, counts candidates before queueing background work, rejects requests over the configured candidate limit, and passes bounds into SkillScanner.scan_directory:

    max_candidates=MAX_BATCH_SKILLS
    max_entries_visited=MAX_BATCH_PATHS_VISITED

The default values in the API are 100 candidate skills and 10,000 filesystem entries.
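
The "count as you yield" idea behind the bounded traversal is easy to sketch. What follows is an illustration under stated assumptions, not the code in fs_limits.py: walk_bounded, find_skill_files, and TraversalLimitExceeded are hypothetical names, and the choice to raise (rather than stop silently) when the budget runs out is mine. The 10,000 default mirrors the filesystem-entry limit mentioned above.

```python
import os
from collections.abc import Iterator


class TraversalLimitExceeded(Exception):
    """Raised when a walk visits more entries than its budget allows."""


def walk_bounded(root: str, max_entries: int) -> Iterator[os.DirEntry]:
    """Yield every entry under root, raising once max_entries is exceeded."""
    visited = 0
    pending = [root]  # directories still to scan
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                visited += 1
                if visited > max_entries:
                    raise TraversalLimitExceeded(
                        f"more than {max_entries} entries under {root}"
                    )
                # Symlinked directories are not followed, so a link cannot
                # redirect the walk outside the tree or make it loop.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                yield entry


def find_skill_files(root: str, max_entries: int = 10_000) -> list[str]:
    """Return sorted paths of regular files named SKILL.md, within the entry budget."""
    return sorted(
        entry.path
        for entry in walk_bounded(root, max_entries)
        if entry.name == "SKILL.md" and entry.is_file(follow_symlinks=False)
    )
```

Because the counter increments before anything is yielded, a tree with millions of entries costs at most max_entries directory reads before the caller sees an error. A version that builds the full listing first cannot make that guarantee.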
The scanner then passes loader bounds into SkillLoader.load_skill, which means the per-skill load step is part of the same bounded execution path rather than an unbounded second phase.

The analyzer boundary changed too. llm_consensus_runs is capped in the API request models with Pydantic, and again in build_analyzers. The API no longer exposes a remote-callable Cisco AI Defense URL override; the analyzer factory can still use operator-controlled arguments and environment configuration, including AI_DEFENSE_API_URL, but the public request model does not let a caller pick the remote endpoint for the server.

Finally, scan endpoints now require X-API-Key backed by SKILL_SCANNER_API_KEY. /scan, /scan-upload, /scan-batch, and /scan-batch/{scan_id} all check it. The result cache for batch scans is also bounded: 1,000 entries, with a 3600 second TTL. The rate limiter is deliberately process-local, configurable through SKILL_SCANNER_API_RATE_LIMIT_REQUESTS and SKILL_SCANNER_API_RATE_LIMIT_WINDOW_SECONDS; that is useful for this server, but it is not a distributed quota system, and the docs should make that kind of caveat visible.

The branch did not stop at implementation. Tests were added or updated across:

    tests/test_api_endpoints.py
    tests/test_api_deep.py
    tests/test_analyzer_factory.py
    tests/test_loader.py
    tests/test_scanner.py
    tests/test_extractors.py
    tests/test_cli_tui_api_fixes.py

The focussed verification command was:

    uv run pytest \
      tests/test_api_endpoints.py \
      tests/test_api_deep.py \
      tests/test_analyzer_factory.py \
      tests/test_loader.py \
      tests/test_scanner.py \
      tests/test_extractors.py \
      tests/test_cli_tui_api_fixes.py \
      -q

On the current checkout, that collected 216 tests and returned 215 passed, 1 skipped on Python 3.13.13, with only third-party deprecation warnings. The process report also records a broader non-integration, non-LLM, non-e2e run at 1308 passed, 5 skipped, 7 deselected, plus ruff check . and git diff --check during the contribution.
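
Going back to the process-local rate limiter for a moment, here is a rough sketch of what "process-local" means in practice. The post does not say which algorithm the branch uses, so the sliding-window approach, the class name SlidingWindowLimiter, and the injectable clock are all assumptions for illustration. The two constructor arguments correspond loosely to the request-count and window-seconds settings named above.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Per-key request limiter that keeps all of its state in this process."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        # Map each key (for example an API key) to its recent request times.
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a request for key and report whether it fits in the window."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The state lives in one dictionary in one process, which is why this kind of limiter is not a distributed quota: two workers each hold their own counters. The sketch is also not thread-safe and never evicts idle keys, both of which a real server would need to handle.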
The documentation updates matter because this is not only a code contract. .env.example, API docs, operations docs, endpoint detail pages, and generated reference docs now describe SKILL_SCANNER_API_KEY, SKILL_SCANNER_ALLOWED_ROOTS, rate limits, traversal limits, archive limits, batch limits, and the LLM consensus cap. A security control that exists only in code is easier to bypass operationally than one that is named in the configuration surface people actually read.

The useful lesson here is not "AI found security bugs". That is too vague, and frankly not the interesting part. The useful lesson is that AI-assisted review gets much better when the agent is forced to work from repository facts that can be rerun: symbol inventories, caller traces, callee traces, exact changed files, test names, and concrete verification commands. A model can read the most obvious route handler and sound convincing. A graph can show that the helper under discussion has four direct callers, or that a loader method has 92 direct callers, and that changes the review from opinion to coverage.

That is where sqry was valuable. It made the review faster, but the speed was not the main win. The main win was not having to trust a first-pass mental map of the codebase. The map was queryable, and when the map said the loader was shared across API, CLI, eval, scanner, and tests, the fix moved down into the loader. When the map said analyzer construction was shared, the consensus cap moved into the factory as well as the API request model.

This is also why I do not like abstract claims about "secure by design" unless the design names the boundary and the evidence.
In this branch, the claims are more modest and more useful: API path access fails closed without configured roots; uploaded filenames no longer control staging paths; archive expansion has member, size, traversal, and symlink checks; batch discovery and scanner traversal have explicit limits; loader discovery has explicit limits; LLM consensus runs are capped at both the request and factory boundary; the focussed suite passes. Those are claims a maintainer can inspect.

The same pattern showed up while working through issues in NVIDIA SkillSpector (https://github.com/NVIDIA/SkillSpector/issues): Stage 2 LLM batch failures, retry and concurrency behaviour, unanalyzed findings, ingest-layer bounds, and whitespace-padding detection all ended up being boundary questions. Different repository, different implementation, same shape of problem.

This is the part that feels important to me. AI-assisted development can help us ship faster, but faster shipping also means we can expose larger attack surfaces sooner: more API entry points, more archive and clone paths, more model calls, more background work, more places where a scanner accepts untrusted input. The answer is not to slow everything down by default; it is to make boundary review part of the shipping motion, with concrete limits, tests, and code-graph evidence before the surface gets too wide to reason about.

SKILL_SCANNER_ALLOWED_ROOTS being absent means no API path access, not "scan whatever path was supplied".

Thanks for reading this far. I hope this is useful if you are hardening an API that wraps local filesystem work, archive extraction, or other expensive scanner-style behaviour. The bit I would reuse first is not any single line of code; it is the habit of asking the repository graph where the boundary actually runs before deciding where the fix belongs.