{"slug": "a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s", "title": "A zero-copy C/Python DFA engine that scrubs logs at 571 MB/s", "summary": "Guardog, a zero-copy C/Python DFA engine, scrubs logs at 571 MB/s, operating 14.4x faster than standard Python regex by bypassing the GIL and using in-place memory mutation. The engine is available for acquisition and targets hyperscale text sanitization for server logs, LLM prompts, and API ingestion.", "body_md": "**Technical Evaluation & Asset Manifest**\n\n> **NOTICE OF ASSET SALE**\n> The intellectual property, complete source code, and exclusive commercial rights to the Guardog Enterprise Engine are currently available for outright acquisition. This document serves as the technical evaluation manifest for prospective buyers. See Part 5: Acquisition & IP Transfer for terms.\n\nGuardog is a hyperscale, multi-threaded Deterministic Finite Automaton (DFA) text sanitization engine. Built as a native C-extension for Python, it is designed to intercept, scan, and scrub high-volume payloads (server logs, LLM prompt streams, API ingestion) for sensitive data in real-time.\n\nClocking at **571+ MB/s**, Guardog operates at **14.4x the speed** of standard sequential Python `re` implementations by bypassing the Global Interpreter Lock (GIL) and utilizing a Zero-Copy memory architecture. It is designed to immediately reduce CPU compute overhead and API latency in high-throughput environments.\n\nStandard regex pipelines struggle at hyperscale due to three bottlenecks: CPU thread locking, memory duplication, and backtracking latency. Guardog addresses all three.\n\nBecause Python `str` and `bytes` objects are immutable, standard substitution creates a complete copy of the payload in RAM. If a server ingests a 50MB log, a standard pipeline needs roughly an additional 50MB to process it.\n**The Guardog Advantage:** Guardog utilizes Python's Read-Write Buffer Protocol (`w*`). It passes a direct pointer to the `bytearray` into the C-engine. 
The C-engine reads and mutates the data *in-place*. **The payload is never copied during scanning**, which helps prevent Out-Of-Memory (OOM) crashes in containerized environments.\n\nPython’s GIL prevents Python threads from executing on more than one core at a time.\n**The Guardog Advantage:** Guardog drops into raw C, releases the GIL, and deploys OpenMP multi-threading. It slices the payload across available CPU cores. C-threads record secret coordinates in thread-private memory structs (Map phase) without ever halting to acquire a lock. The GIL is reacquired only once, at the very end, to copy the structs into a Python list (Reduce phase).\n\nPython's `re` module uses a backtracking matcher, which can suffer from \"catastrophic backtracking\" on adversarial patterns and inputs.\n**The Guardog Advantage:** Guardog pre-compiles all rules into an Aho-Corasick-style DFA matrix. The engine performs exactly one state transition per input byte, regardless of how many rules are loaded. Whether scanning for 3 secrets or 3,000 secrets, the per-byte scanning cost does not grow with the rule count.\n\nTo maintain O(1) per-byte cost and Zero-Copy memory efficiency, specific computer science trade-offs were made. Engineering teams must evaluate the following structural profile:\n\n**Cross-Boundary Secret Detection:** OpenMP thread chunks feature a `MAX_LOOKAHEAD` overlap window. If a 20-character API key is sliced in half between two different CPU cores, the engine still detects and redacts it.\n\n**Overlapping Secret Resolution (Two-Pass Mutation):** Guardog maps all coordinates in Pass 1 and mutates in Pass 2, so adjacent or overlapping secrets are all located before any bytes are overwritten (prevents the \"Amnesia Overwrite\" bug).\n\n**Non-Destructive Pipeline Integrity:** Guardog operates on the raw byte layer. It does not force destructive Unicode normalization. 
Legitimate JSON structures, foreign languages, and mathematical symbols pass through unchanged.\n\n**Cryptographic Tamper Resistance:** The compiled `.matrix` file is protected by a SHA-256 signature to prevent silent fail-open states if the matrix is corrupted.\n\n**No PCRE Backreferences:** Because it is a pure DFA, it has no memory of previously matched groups.\n\n**No Unbounded Wildcards (`.*`):** To prevent L3 Cache misses, the compiler enforces a hard limit of `15,000` states. Guardog is designed for structural tokens (Keys, SSNs, Credit Cards, JWTs), not free-form linguistic parsing.\n\n**No Runtime Hot-Swapping:** To update or add new rules, DevSecOps must compile a new matrix file, and the application must be restarted to load the new binary signature into RAM.\n\nBecause Guardog releases the GIL during scanning, other Python threads keep running while a scan is in progress, which suits asynchronous web frameworks like FastAPI. The `sanitize` call in the example below is still synchronous, so for very large payloads consider dispatching it to a worker thread to keep the event loop responsive.\n\n``` python\nfrom fastapi import FastAPI, Request\nimport guardog\n\napp = FastAPI(title=\"Secure Ingestion API\")\n\n# Initialize the DFA matrix into memory once at startup\ndetector = guardog.GuardogSession(\n    matrix_path=\"guardog.matrix\",\n    meta_path=\"guardog_meta.json\",\n)\n\n@app.middleware(\"http\")\nasync def secure_payload_firewall(request: Request, call_next):\n    # Read the full request body before the route handler runs\n    body_bytes = await request.body()\n\n    if body_bytes:\n        # Undecodable bytes are dropped rather than raising an error\n        payload_text = body_bytes.decode(\"utf-8\", errors=\"ignore\")\n\n        # Scrub payload at 570+ MB/s\n        sanitized = detector.sanitize(payload_text)\n\n        if sanitized[\"matches\"]:\n            print(f\"[SECURITY ALERT] Intercepted secrets: {sanitized['matches']}\")\n\n    # The original request is forwarded unmodified; this middleware only reports matches\n    return await call_next(request)\n```\n\nThe asset includes a master orchestrator script that fully automates the cleaning, compiling, verification, and performance evaluation workflows in a single command.\n\nEnsure the host environment has Python 3.8+ (64-bit) and a C-compiler (MSVC with OpenMP 
for Windows, or GCC with libgomp or Clang with libomp for Linux/macOS).\n\nTo perform a clean end-to-end installation validation, execute:\n\n`python run_pipeline_test.py`\n\nThis command will auto-compile the rule matrix, build the C-extension, run the structural boundaries test suite, and output the final MB/s benchmark to standard output.\n\n**Modifying the Security Rules**\n\nTo add or modify security tokens:\n\nOpen `matrix_compiler.py`.\n\nLocate the `DEFAULT_RULES` dictionary and add your structural strings:\n\n``` python\nDEFAULT_RULES = {\n    \"AWS_KEY\": [\"AKIAIOSFODNN7EXAMPLE\"],\n    \"INTERNAL_API\": [\"cyburn_prod_77x9a\"],  # Custom rules here\n}\n```\n\nRun `python matrix_compiler.py` to regenerate the matrix signatures.\n\nThis software asset is offered by CyBurn Digital under an Exclusive IP Buyout / Asset Transfer Agreement.\n\nUpon execution of the sale, the purchasing entity will receive:\n\n**100% Intellectual Property Ownership:** Full rights to the C-source code (`engine.c`), compilation architecture, test matrices, and integration blueprints.\n\n**Unrestricted Usage:** The buyer is authorized to integrate, modify, and distribute the compiled binary within their proprietary infrastructure or client deliverables.\n\n**Royalty-Free:** No ongoing licensing fees or volume-based throughput charges.\n\n**Air-Gapped Security Guarantee:** This engine contains zero external network calls, telemetry, or \"phone-home\" logic.\n\nTo acquire this asset or request an engineering code audit, please contact:\nVindana Sandun\nDirector, CyBurn Digital\n[mailtovindana@gmail.com](mailto:mailtovindana@gmail.com)\n+94 76 388 5727 (WhatsApp)", "url": "https://wpnews.pro/news/a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s", "canonical_source": "https://github.com/thedevilhimselfcodes/guardog", "published_at": "2026-06-14 08:17:22+00:00", "updated_at": "2026-06-14 08:31:00.855175+00:00", "lang": "en", "topics": ["developer-tools", "ai-infrastructure", "ai-safety"], "entities": ["Guardog", 
"Python", "C", "OpenMP", "Aho-Corasick", "SHA-256"], "alternates": {"html": "https://wpnews.pro/news/a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s", "markdown": "https://wpnews.pro/news/a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s.md", "text": "https://wpnews.pro/news/a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s.txt", "jsonld": "https://wpnews.pro/news/a-zero-copy-c-python-dfa-engine-that-scrubs-logs-at-571-mb-s.jsonld"}}