{"slug": "mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized", "title": "Mike Acton: Convex Primitive Collision Detection – Reference and LLM-Optimized", "summary": "Mike Acton released a collision detection repository implementing a differentiable algorithm for convex primitives, achieving a 100× speedup over the reference implementation using an LLM-optimized solver. The project, generated by GPT-5.5 under human guidance, provides both a reference C implementation and an optimized single-precision version that runs 102× faster on a committed benchmark.", "body_md": "This repository implements the collision query from K. Tracy, T. A. Howell, and\nZ. Manchester, *\"Differentiable Collision Detection for a Set of Convex\nPrimitives\"* (arXiv:2207.00669, `documents/2207.00669.pdf`). For a pair of\nconvex primitives — sphere, box, capsule, or convex polytope — it computes the\nminimum uniform scaling **α** that must be applied to both shapes for them to\ntouch (the paper's problem (10)), and the contact points from eq. (24). `α < 1`\nmeans they overlap, `α > 1` means they are separated.\n\nThis is a **narrow-phase** solver. It assumes the caller has already run a cheap\nbroadphase and discarded pairs whose world AABBs do not overlap, so only\nAABB-overlapping pairs are ever queried. 
The committed benchmark reflects that\nassumption — its 1000 pairs are all AABB-overlapping (near-contact or\npenetrating), so the timing measures real narrow-phase work rather than the\ntrivial rejection of far-apart shapes.\n\nThere are two implementations here:\n\n- `src/` — a reference C implementation that follows the paper directly.\n- `src-optimized/` — an optimized single-precision implementation that produces the same collision flags and the same distances (within a stated tolerance) and runs the committed 1000-pair benchmark **about 102× faster** than the reference: reference median ≈ 0.276 s, optimized median ≈ 0.0027 s (median-of-5, single thread, on my machine — gcc 11, x86-64, WSL2).\n\nThat 102× crossed the **100× target** I set for the committed benchmark. It also\nholds up off that benchmark: on alternate-seed inputs it measures **97.6–101.7×**\n(four seeds), all passing correctness. I would not call it a *uniform* 100× —\ntwo of the four seeds land just under — so I claim \"100× on the committed\nbenchmark, ~98–102× generally,\" and no more. Numbers and caveats are in\n[Results and limits](#results-and-limits).\n\nThis repository exists for two reasons, equally important:\n\n- **To provide the optimized collision routines.** `src-optimized/` is real, tested code you can build and use, held to the reference by an independent harness.\n- **To show how an LLM was used to do the optimization** — concretely and reproducibly. Every phase of this project was generated by a language model from an instruction document I wrote, and every result was checked by a harness that the model could not talk its way around. I want the method to be inspectable, not a story you have to take on faith.\n\nThe model under test here was **GPT-5.5**. 
This is one model, one run — a case\nstudy in *how* to drive an LLM at an optimization problem, not a benchmark\ncomparing models.\n\nI find it clearest to separate the four roles explicitly.\n\n| Role | What it did |\n|---|---|\n| Me (the human) | Defined the problem and the output contract. Set the 100× target. Wrote the four instruction documents. Encoded my engineering approach as operating rules fed into every conversation. Course-corrected and decided what to keep. |\n| GPT-5.5 (the model) | Generated the reference implementation, the test harness, and the optimized solver from those documents; proposed and implemented each optimization; kept the optimization log. |\n| The test harness | The ground truth. Compared optimized output against the reference, validated it with independent code, certified the contact points, checked determinism, and timed it. Nothing here is claimed without it. |\n| nagent (the LLM harness) | The agent loop that ran the optimization — structured, file-based, and grounded each turn by the proof harness. |\n\nWhere the **100× target** came from: I read the reference code and made a\njudgment call about what I thought was achievable on this hardware. It is not a\nderived bound or a proof of a ceiling — it is an engineer's estimate, and I\nstate it as one. It turned out to be roughly the right order of magnitude to\npush hard against.\n\nMy approach is itself written down, in `context/data-oriented-design.md`. Those\noperating rules — start from the real data, state the cost, remove work before\ndoing it faster, handle the common case straight-line — were injected into every\noptimization conversation. So \"what I contributed\" is not just the target and\nthe prompts; it is the method the model was made to follow.\n\nThe structured state and the per-turn proof are not ceremony. 
An LLM left to\noptimize on its own tends to drift: it reasons from its recollection of the last\nresult instead of a fresh measurement, it can lose a good change that was never\ncommitted, and it can report a result it did not actually run. Keeping the\nworking state in inspectable files, committing every kept gain immediately, and\ninjecting the real gate-and-speedup status every turn are what turn *\"the model\nsays it is faster and correct\"* into *\"measured faster, gates pass, committed.\"*\nThat is the difference between a demo and a result.\n\nEach phase is an instruction document and the artifact it produced. The\ndocuments live in `prompts/`.\n\n- `prompts/create-reference.md` → the reference (`src/`). A faithful C11 port of the paper: the α solve and the contact points from eq. (24), with explicit input validation. This is the correctness anchor everything else is measured against.\n\n- `prompts/create-optimized-test-harness.md` → the harness. This specifies the test, comparison, and measurement scaffold and its constraints: a fixed, committed 1000-pair input; a reference-vs-optimized comparator; an independent validator that shares no code with either solver; a contact-point certifier; a determinism check; and a median-of-5 timing protocol. Crucially, the harness was built and proven against an *identity copy* of the reference **before** any optimization existed (see `performance-test-optimized/HARNESS-BASELINE.md`), so the measurement pipeline itself was trusted before it was used to judge anything.\n\n- `prompts/create-optimized.md` → the optimized solver (`src-optimized/`). This is my optimization approach turned into instructions the model iterates on: profile where the cycles go, rank candidates by payoff, prefer removing work, run a simplification pass, keep the common case branch-minimal, and treat data layout and batching as first-class. 
The model ran this loop inside **nagent** ([https://github.com/macton/nagent](https://github.com/macton/nagent)) — a data-oriented agent loop where the working state lives in plain files and the model acts only through a fixed set of structured tags. The proof harness was wired to run **once per turn**:\n\n```\nnagent --read prompts/create-optimized.md \\\n       --hook-per-run ./prove-optimized-harness.sh \\\n       \"Continue until 100x target reached.\"\n```\n\nso every turn began with the real, measured gate status injected into the conversation — not the model's memory of it.\n\n- `prompts/create-visualizer.md` → the visualizer (`viz/`). Described below.\n\nThe full per-hypothesis history, with measurements and keep/revert decisions, is\nin `src-optimized/OPTIMIZATION-LOG.md`. The git history mirrors it: one commit\nper kept change, plus a commit recording each rejected trial. The shape of the\nprogress matters more than any single step — it was incremental, measured, and\nreversible, and the dead ends were written down rather than hidden.\n\n**Kept (roughly in order):**\n\n- Replace the reference's log-barrier Newton solve with a support/GJK + bisection computation of α — the single largest win.\n- Per-type specializations: separating-axis (SAT) paths for box-box and an asymmetric SAT for box-polytope; shifted GJK paths for sphere/capsule-polytope.\n- Move per-shape work into a build-stage precompute that is **excluded from the timed solve** (the runtime solves from a flat precomputed table).\n
- Single precision throughout, made safe by re-centering each pair to metre scale before solving.\n- Stop building global polytope half-spaces up front; compute the few axes a pair actually needs, and precompute the polytope's unique hull edges for the box-poly SAT.\n- Compact the active-path build state; specialize and force-inline the hot support function.\n- Closed-form (analytic) contact witnesses for the radius-shape families (sphere/capsule, box-capsule, sphere/capsule-polytope), avoiding GJK for the witness where the geometry allows it.\n- Reduce bisection/refinement iteration counts where the extra steps did not change the result within tolerance.\n\n**Rejected (recorded, not hidden):** a box-poly shifted-GJK path and a box-poly\nSAT path that either regressed or broke the tolerance/flag contract; several\ninlining, bracketing, and iteration-cap trials that did not measurably help; a\ncopy-removal in the solve wrapper; and assorted witness-bookkeeping changes. Each\nis a one-line commit and a log entry with the reason it was dropped.\n\nThe log also records the **cost** of each hypothesis — wall-clock and tokens —\nso the price of the whole exercise is visible, not just the result.\n\nThe optimized solver is not bit-for-bit identical to the reference, and it is not supposed to be. It is accepted only when:\n\n- the **collision flags are identical** — it flags exactly the same pairs as colliding as the reference; and\n- **every distance** agrees within `|Δ| ≤ 1 mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)` (`build/compare_results`). 
The 1 mm floor is the documented resolution; the relative term covers large separations; the last is a conditioning term — a fixed α error scales by `|c1−c2|/α²`, so it grows only at extreme penetration, where single precision genuinely cannot resolve the depth and the value is least actionable.\n\nContact points are **certified for validity, not matched**: a face or edge\ncontact has many equally valid witness points, so `build/validate_contacts`\nindependently checks that each emitted point lies on both surfaces and is\nseparated by the reported distance, rather than requiring it to equal the\nreference's choice.\n\n`viz/` is a small, self-contained web tool (`prompts/create-visualizer.md`) that\nrenders one query pair at a time: the two primitives, the contact points emitted\nby both the reference and the optimized solver, and the separation between them.\nIt is how I eyeball that a result is geometrically sane, not just within a\ntolerance number. The images in this README were produced by it.\n\n```\n# after `make -f Makefile.optimized optimized` and a run that produced the\n# two results files:\ncd viz\npython3 -m http.server 8000      # ES modules need an origin\n# open http://localhost:8000/index.html\n```\n\nMeasured on my machine (gcc 11.4.0, x86-64, WSL2), committed 1000-pair input, median-of-5, single thread.\n\n**Speedup:** **≈ 102×** on the committed input (reference median ≈ 0.276 s, optimized ≈ 0.0027 s) — over the 100× target.\n\n**Generalization:** four alternate-seed inputs measure **97.6×, 97.8×, 101.7×, 102.1×**, all passing correctness. 
So it generalizes well, but not uniformly to 100× — I claim 100× on the committed benchmark and ~98–102× generally, not a universal 100×.\n\n**Gates (every kept step):** full reference test suite 178 passed, 0 failed; comparator 0 flag mismatches, 0 distances over tolerance; independent validator 0 failures; contacts 1000/1000 valid; output byte-identical run-to-run; committed input checksum unchanged.\n\nTwo honest caveats. First, these are wall-clock medians on one noisy machine;\ntreat them as the right order of magnitude, not three significant figures.\nSecond, some of the late gains came from reducing solver iteration counts, which\nspends down the accuracy margin (max distance deviation grew from ~1 mm toward\n~5 mm while staying inside the conditioned tolerance), and from two selective\nfast-math compiler flags (`-fno-signed-zeros`, `-fno-math-errno`; the more\naggressive ones were tried and rejected when they broke the gates). The most\ndurable headroom from here is structural — batching and data layout — rather\nthan more iteration-shaving.\n\n```\n# reference: build + test + time\nmake clean && make\nmake test                              # 178 passed, 0 failed\n./run-performance-test                 # reference timing on the committed input\n\n# optimized: build + the full end-to-end proof (build, gates, timing, seeds)\nmake -f Makefile.optimized optimized\n./prove-optimized-harness.sh           # prints a FINAL SUMMARY + PROOF verdict\n./prove-optimized-harness.sh --verbose # same, streaming every step\n```\n\nThe proof harness verifies the committed input's sha256 is unchanged\n(`9bd4939dc3d6c7d66459fe064768bf2d904b59410c4d8929107c9264c96dd555`), so the\nbenchmark cannot be quietly edited to flatter a result.\n\n- Units: metres; positions and distances are float32. 
Every world-AABB corner within ±8,192 m; every primitive's world-AABB extent within [0.1, 250] m; results correct to 1 mm.\n\n- The library allocates nothing — the caller passes a scratch buffer:\n\n```\n#include \"src/collide.h\"\nsize_t cp_collide_scratch_bytes(uint32_t prim_count);\nvoid   cp_collide_pairs(const cp_prim *prims, uint32_t prim_count,\n                        const cp_pair *pairs, uint32_t pair_count,\n                        cp_result *results, void *scratch, size_t scratch_bytes);\n```\n\nArrays in, arrays out; pairs reference primitives by index; a single query is `pair_count = 1`. Out-of-range coordinates/sizes, bad primitives, and non-convergence are reported per result via an explicit `status` — never clamped, never silently accepted.\n\nK. Tracy, T. A. Howell, Z. Manchester. *Differentiable Collision Detection for a Set of Convex Primitives.* arXiv:2207.00669. (`documents/2207.00669.pdf`)", "url": "https://wpnews.pro/news/mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized", "canonical_source": "https://github.com/macton/differentiable-collisions-optc", "published_at": "2026-06-16 09:25:33+00:00", "updated_at": "2026-06-16 09:48:58.787730+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools"], "entities": ["Mike Acton", "GPT-5.5", "K. Tracy", "T. A. Howell", "Z. Manchester"], "alternates": {"html": "https://wpnews.pro/news/mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized", "markdown": "https://wpnews.pro/news/mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized.md", "text": "https://wpnews.pro/news/mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized.txt", "jsonld": "https://wpnews.pro/news/mike-acton-convex-primitive-collision-detection-reference-and-llm-optimized.jsonld"}}