ARGUS—Automated Reconnaissance & General-purpose Universal Surveyor.A first-class, target-agnostic toolkit for mapping any web application’s API surface, auth, feature flags, real-time protocols, and AI-agent internals — and feeding what it learns straight back into CosySim’s live systems.
CosySim is local-first, but it doesn’t live in a vacuum. It talks to a lot of undocumented web APIs — Google’s batchexecute
endpoints behind NotebookLM, Gemini, and AI Studio; startup WebSocket protocols; AI-agent platforms. ARGUS is the muscle that reverse-engineers those surfaces. It lives in scripts/argus/ and is
The operating philosophy, straight from scripts/argus/README.md:
Knowledge is the prize. We don’t exploit — we learn.Capture everything, decode offline, never modify live state until the surface is fully mapped.
ARGUS is layered: a generic core toolkit, a CLI, an MCP server so local agents can drive it, and a set of specialized analyzers/decoders/discovery modules.
Layer |
Module |
Role |
|---|---|---|
Core toolkit |
scripts/argus/toolkit.py |
scripts/argus/analyze.py
har
/ heap
/ compare
/ heap-diff
/ dir
/ deep
subcommandsanalyzers/har_analyzer.py
analyzers/heap_analyzer.py
decoders/
batchexecute
, grpc_web
, heap_diffing
discovery/
rpcid_detector
, feature_flag_probe
, proto_reconstructor
, endpoint_registry
cdp_bridge.py
, network_monitor.py
argus_mcp_server.py
nexus_sink.py
, rpcid_mapper.py
One captured browsing session yields every request/response pair, headers, cookies, timing, and bodies. HARAnalyzer
auto-detects the protocol (REST, GraphQL, gRPC-web, batchexecute
, WebSocket upgrades) and groups endpoints by service, decoding JWTs and pattern-matching API keys along the way.
python -m scripts.argus.analyze har capture.har --report # → Markdown intel report
python -m scripts.argus.analyze compare loggedout.har admin.har # diff roles to find gated endpoints
V8 heap snapshots contain every string the JS runtime has interned — compiled-in config, unused API routes, internal gRPC service names, RPC IDs, and secrets that never transit the wire. Two engines run over them: a regex scanner with 100+ patterns (mine_heap
) and a full V8 graph walker (mine_heap_deep
) that reconstructs objects and script sources.
python -m scripts.argus.analyze heap snapshot.heapsnapshot
python -m scripts.argus.analyze heap-diff before.heap after.heap # isolate strings a single action introduced
The classifier buckets strings into URLs, API endpoints, method names, service paths, RPC IDs, and credential-shaped tokens — covering JWTs, K8s *.svc.cluster.local
addresses, STUN/TURN servers, Statsig caches, protobuf definitions, and leaked model reasoning.
A minified SPA bundle holds the entire app logic. decompile_bundle()
extracts feature-gate enums, API route strings, environment variables (VITE_*
/ NEXT_PUBLIC_*
/ REACT_APP_*
), CI/CD paths, and monitoring DSNs. In one documented run, the bundle revealed 17× more URL paths than live traffic — most endpoints gate features the current user can’t reach.
cdp_bridge.py
is a full async Chrome DevTools Protocol client (Chrome on --remote-debugging-port=9223
). It enables programmatic JS execution, localStorage
feature-flag injection, network capture, and WebSocket frame interception — the only way to map real-time protocols, since HAR captures only the HTTP upgrade, not the frames.
from scripts.argus.toolkit import cdp_eval, inject_statsig_gates
cdp_eval("document.title", cdp_port=9223)
inject_statsig_gates("https://app.example.com", {"some_gate": True})
ARGUS is explicit about the distinction that matters most:
client-only vs server-enforced.Flipping a Statsig gate inlocalStorage
reveals UI, but if the endpoint checks the flag server-side, every call still 403s. Every finding is tagged accordingly — see the security-assessment checklist in the methodology guide.
This is what makes ARGUS part of CosySim rather than a bolt-on scanner. Discoveries don’t sit in a report — they flow back into the running framework:
HAR / heap / bundle
│ decode offline
▼
rpcid_detector ── compares live traffic against the known baseline
│ new rpcid?
├──────────────► ArgusNexusSink → Nexus KMS (category="argus")
│ store_new_rpcid() + add_qa() → agents query via nexus_search
│
└──────────────► RpcidUpdater (engine/integrations/rpcid_updater.py)
writes config/nlm_rpcids.yaml + data/nlm_rpc_registry.json
→ live NLM/Gemini ops pick up new rpcids at call time
When Google rotates an NLM/Gemini frontend build and rpcids change, a fresh capture run through ARGUS re-discovers them, RpcidUpdater
patches both the YAML source-of-truth and the JSON runtime cache, and get_rpcid()
resolves the new value on the next call — no code change, no redeploy. Meanwhile ArgusNexusSink
files every new rpcid, endpoint, and feature flag into Nexus KMS as both a knowledge entry and a Q&A pair, so any agent can ask “what is rpcid X?” and get the answer ARGUS learned. Recon becomes institutional memory.
Because ARGUS ships an in-process MCP server (argus_mcp_server.py
, FastMCP/SSE on :8010
) and the CDP capabilities are also registered as MCP skills, local LMStudio agents can run reconnaissance themselves — screenshot a page and ask a vision model what it sees, navigate, click, fill, intercept. The same toolkit a human runs from the CLI is callable by an autonomous agent inside the MCP interceptor pipeline.
The methodology is distilled from 370+ exploration sessions against two real targets — a voice-AI platform and a text-AI platform with a virtual OS. Headline numbers (full reports in data/argus/reports/
):
Metric | Target A (voice) | Target B (text + virtual OS) | |---|---|---| | API methods discovered | 53 | 20+ | | Feature flags mapped | 27 gates, 14 configs | — | | JWTs decoded | 3 | 2 | | Internal IPs found | 3 | 2 (K8s) | | Sub-agents extracted | 0 | 5 | | Apps / tools mapped | 0 | 12 | | Chain-of-thought fragments | 0 | 15+ | | Protobuf schemas reconstructed | 0 | 1 | | Security findings | 14 | — |
python -m scripts.argus.analyze har path/to/file.har --report # any HAR → report
python -m scripts.argus.analyze heap path/to/file.heapsnapshot # any heap snapshot
python -m scripts.argus.analyze deep path/to/captures/ # full automated pipeline
Whenever you hand CosySim a HAR file, a heap snapshot, or a web app, ARGUS is meant to run automatically — that’s the standing convention in the project. The thirteen techniques (HAR, heap, bundle, flags, CDP, WebSocket, tokens, profile CRUD, env mapping, security assessment, agent orchestration, chain-of-thought, schema extraction) are written up as step-by-step playbooks you can borrow for any target.
CosySim’s third pillar (alongside games and services) is creation: a set of tools that turn natural-language intent into game-ready assets and even entire scenes — all running on local hardware. Three things make it distinctive:
AssetStudioCore.generate(asset_type, params)
) routes images, portraits, voice, video, items, SVG and audio through a single, flag-gated orchestrator.@skill
an agent can call, and a /api/inject_to_scene
route lets an asset flow straight from generation into a live scene’s static folder with a hot-reload socket event. Agents create content All inference is local: image/video/portrait via
ComfyUI(:8188
), voice via theTTS manager, and LLM-assisted items/SVG + the VL quality inspector viaLMStudio(:1234
). Nothing leaves the machine.
The Asset Studio scene (content/scenes/asset_studio/
) is a Flask/Socket.IO front end over engine/asset_studio/
. The architectural heart is AssetStudioCore
(engine/asset_studio/studio_core.py
), a singleton that owns the whole lifecycle:
from engine.asset_studio import get_studio_core
core = get_studio_core()
result = core.generate("portrait", {"character_id": "aria", "mood": "happy"})
generate()
does five things in order: route to the right generator (lazy-loaded from _GENERATOR_MAP
), register the result in the SQLite asset library, optionally cache metadata to Nexus KMS, emit an asset_generated
socket event for live scenes, and return a normalized dict. Every asset type is gated by config feature flags so a deployment can disable, say, video or adult content without touching code:
Asset type |
Generator |
Backend |
Required flag(s) |
|---|---|---|---|
image |
ImageGenerator |
ComfyUI | asset_studio.comfyui_enabled |
portrait |
PortraitGenerator |
ComfyUI + PortraitCache | asset_studio.comfyui_enabled |
video |
VideoGenerator |
ComfyUI (Wan 2.2) | comfyui_enabled + video_enabled |
voice |
VoiceGenerator |
TTS manager | asset_studio.tts_enabled |
item |
ItemGenerator |
LMStudio + ComfyUI icon | asset_studio.lms_enabled |
svg |
SvgGenerator |
LMStudio | asset_studio.lms_enabled |
audio |
AudioGenerator |
synthesized | — (always on) |
core.health()
rolls up live status of ComfyUI (/system_stats
), the TTS backends, LMStudio readiness, and per-type library counts — exactly the kind of monitoring hook the project’s conventions require.
AssetLibrary
(engine/asset_studio/asset_library.py
) is a thread-safe SQLite catalogue (data/asset_library.db
). Every generated asset is registered with full provenance — asset_type
, scene
, character_id
, mood
, preset_id
, the exact positive/negative prompt
, duration_ms
, a cached
flag, and JSON metadata
— and indexed by type/scene/character/recency. It supports filtered+paginated list_assets()
, full-text search over title/prompt, favorites, bulk delete, and stats()
. Because the prompt and preset are stored, any asset is reproducible.
Generators don’t take raw prompts. PromptBuilder
(prompt_builder.py
) composes them from a subject, a scene-context template (penthouse, lounge, tavern, casino, neoncity, arena, …), a mood modifier (14 moods from neutral
to seductive
), and style/negative tags from a StylePreset
. Portraits additionally pull a character’s physical description from Nexus KMS (get_nexus_client().ask(...)
) so a portrait actually looks like the character. PresetManager
ships 8 built-in presets — dark_renaissance
(the v1.58 default), cyberpunk
, fantasy
, noir
, anime
, photorealistic
, pixel_art
, minimal
— and users can store custom presets in Nexus.
WorkflowManager
(workflow_manager.py
) is the full ComfyUI client: node/model discovery via /object_info
(cached 5 min), capability checks (has_node("FaceDetailer")
), priority-based select_model()
, and the complete queue → poll /history → download outputs lifecycle — all degrading gracefully when ComfyUI is offline.
The graphs themselves are built dynamically by workflow_builder.py
, which exposes 15 professional workflows in WORKFLOW_REGISTRY
(each with label, category, resolution, speed, and requires_nodes
for capability gating):
portrait_fast
, portrait_hires
(auto-selected when FaceDetailer
UltralyticsDetectorProvider
are present), portrait_refiner
(dual-pass: base → 1.5× upscale → img2img refiner).scene_background
(widescreen cinematic), character_card
(full-body 832×1216), message_image
(8-step Lightning).UnetGGUF
- two-stage
KSamplerAdvanced
):video_wan_t2v
, video_wan_i2v
, video_wan_landscape
, video_wan_portrait_fast
, video_wan_character_hq
— e.g. 272×352 portrait, 105 frames @16fps (~6.5s).LoRA stacking is handled by composable chain helpers (_build_lora_chain
for SDXL, _build_video_lora_chain
for Wan), and portraits push their result URL into PortraitCache
so live scenes display the new art immediately.
This is the part worth borrowing. WorkflowManager.check_image_quality()
base64-encodes a generated image, sends it to a local Qwen3-VL model via LMStudio, and parses a structured verdict:
{ "score": 0-10, "issues": [], "strengths": [], "suggestion": "..." }
The TuningEngine
(tuning_engine.py
) builds on this to do automated parameter search. You give it a base param set and a sweep ({"cfg": [1.0, 1.5, 2.0], "steps": [8, 20]}
); it generates the Cartesian product of variants in a background thread, scores each with Qwen3-VL, persists every run to a metrics DB (data/asset_studio/tuning_metrics.db
), and picks the best variant by VL score (falling back to fastest on ties). It ships with 6 “proven profiles” seeded from real working ComfyUI exports (e.g. proven_portrait_fast
: lcm/exponential, cfg 1.5, 20 steps, Lightning 8-step LoRA), and get_best_settings(workflow_id)
returns the top-N tuned param sets from history. The result is a studio that learns which settings produce good images on your models — no human eyeballing a grid.
Content creation here is built for autonomous agents as a first-class user:
asset_studio_skills.py
registers generate_image
, generate_portrait
, generate_voice
, create_game_item
, generate_svg
, list_assets
, and studio_health
as @skill
-decorated functions (categories MEDIA
/GAME
/SYSTEM
, with cooldowns and costs) — so any CharacterAgent
governed by the MCP pipeline can create assets mid-conversation.[IMAGE:prompt]
stream tag means an LLM can emit an image request inline in its reply and have it rendered.POST /api/inject_to_scene
copies a generated asset into content/scenes/{scene}/static/img/
and emits scene_asset_updated
for live reload — closing the loop from If the Asset Studio makes the contents, the Creation Kit (content/scenes/creation_kit/
) makes the containers. It’s a visual, drag-and-drop scene editor backed by engine/creation/
:
component_registry.py
) — asset_hint
metadata so portrait/image components know to pull from the Asset Studio. Reuse over reinvention: components render the same markup the hand-built scenes use.data/layouts/
), with live preview and pre-shipped rebuilds of real scenes (tavern, grid, arena, lounge, casino) plus templates (chat room, dashboard, shop, dungeon, terminal, …).export_*
helpers turn a layout into a working scene, and create_scene()
(scene_template.py
) scaffolds the directory control_plane_registry.py
and config/launcher.yaml
— so an exported scene is immediately launchable via python launcher.py <name>
and even gets a generated test file.character_wizard.py
: Archetype → Appearance → Voice → Stats → Story → Memory Seed) exposed over /api/wizard/*
, producing a fully registered CharacterAgent
with personality, backstory, and seeded RAG memories — ready to drop into any scene./api/assets/combined
merges the Asset Studio library and the creation asset_registry
, so a builder picks from everything generated across the project.Every scene ships with a hand-crafted Dark Renaissance UI kit. A sample of the 12 design kits that drive the live scenes:
Plus 18 game scenes in all — Club Noir, The Colosseum, The Velvet Pit, The Rusty Anchor, The Obscura, The Shattered Throne, The Lab, The Arcade, Lab Break, Cyberspace, The Auction House, and more — each a live local-agent simulation. Run
python launcher.py --list
to see every target.
engine/ core: lmstudio · nexus · world · agents · mcp · skills · training · observability · integrations
content/ scenes/ (35 targets) · shared/ (Neon HUD v2, design system)
apps/ standalone entry points + multi-protocol proxy + unified CLI surface
scripts/ argus/ (recon toolkit) · oracle.py · smart_test.py · browser_test.py
config/ default.yaml (+ example secret templates)
docs/ deep-dive documentation — start at docs/INDEX.md
tests/ pytest suite (plain assert, mocked external services)
Area |
Doc |
|---|---|
| Index of everything |
docs/INDEX.md |
docs/ARCHITECTURE.md
docs/MCP_FRAMEWORK.md
docs/NEXUS.md
docs/OPERATIONS.md
docs/ARGUS_METHODOLOGY.md
docs/DESIGN_SYSTEM_V2.md
CHANGELOG.md
Large parts of CosySim — including this README — were produced through agentic coding: fleets of AI agents reading the codebase, designing changes, implementing them across disjoint files, and verifying their own work with tests. The project is deliberately structured to be legible to both people and agents (consistent docstrings, version-stamped change logs, an observability spine in the Oracle, and a knowledge base that compounds). If you’re exploring what agent-built software can look like, this whole repository is the example.
See LICENSE. Built to be learned from and borrowed — take what’s useful.