Show HN: Autonomous CAD design and OpenFOAM optimization loop using local LLMs

A developer created an autonomous multi-agent system that uses local LLMs, CadQuery, and OpenFOAM to design and optimize 3D-printable quadcopter propellers. The system employs a team of AI agents to propose designs and a physics-based referee to score them, enabling multi-objective optimization for quiet, efficient, and strong propellers. The project demonstrates a trustworthy approach where AI suggests but never scores, relying on trusted code for evaluation.

An autonomous multi-agent system for parametric design, simulation, and multi-objective optimization of 3D-printable quadcopter propellers, powered by free, local LLMs (Ollama), CadQuery, and OpenFOAM CFD.

This project tries to answer a simple question: instead of an engineer hand-tweaking a propeller and testing it over and over, can we let a group of AI models run that loop themselves, and end up with a better propeller than a person would patiently grind out by hand?

The target propeller should be three things at once: quiet, efficient (it doesn't waste battery), and strong (it pushes a lot of air). Those goals fight each other (a bigger, grippier blade gives you more thrust but more noise, for example), so there's no single "best" answer. There's a set of good trade-offs, and the job is to find them.

If you're new to AI agents, this is a nice thing to learn from, because it's not a chatbot. It's AI used as a worker that actually does something and checks its own results.

Picture a small research team where every member is a program:

- A few junior members are fast and cheap. Their job is to brainstorm: throw out lots of propeller designs, some sensible, some weird. They don't need to be smart, they need to be prolific. These run on your own computer with free local models (via Ollama), so brainstorming costs nothing.
- A lead researcher is the expensive, smart one. It doesn't do the grunt work; it reads the results, notices patterns ("the 5-blade designs are getting quieter, push harder there"), and decides what the team tries next. That role is Antigravity, the agent running the show.
- A referee that never lies: plain, trusted math and physics code. The AIs only ever propose designs. They're never allowed to score their own work. The referee does the scoring, so a confident-but-wrong model can't fool the system.

That last point is the whole trick, and it's worth remembering as a general lesson about AI: let models suggest, let trusted code decide.
Models are great at coming up with options and terrible at being a reliable judge of truth. So we use them only for the part they're good at.

The team works in a loop. One lap looks like this:

1. Propose: the cheap models spit out a batch of new propeller designs.
2. Build: code turns each design into actual 3D geometry (a real CAD file).
3. Score: the physics code estimates how much thrust, how much noise, and how efficient each one is. This first pass is fast approximate math, not a full simulation.
4. Shortlist: the system keeps the designs that aren't beaten on every goal at once. That surviving set is called the Pareto front, the current "menu" of best trade-offs.
5. Reflect: Antigravity looks at the front and steers: what's working, what to explore next.
6. Write it down: a one-line summary of the round gets logged, and the loop starts again.

Every so often, the most promising designs get the expensive treatment: a real fluid-dynamics simulation (CFD), run locally with OpenFOAM, that models the air actually flowing over the blade. That's slow, so we only spend it on candidates that already look good on the cheap math.

There's one more helper worth naming: a surrogate model (a Gaussian Process, from scikit-learn). Think of it as the team learning to guess the simulation's answer from the designs it has already simulated, so it can skip a lot of slow runs and spend them only where it's genuinely unsure. It's the system getting smarter about where to look as it goes.

The one rule that makes this trustworthy: the AI agents only ever propose designs; they never score them. Scoring is done by plain physics code that can't be talked into a wrong answer.
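The Shortlist step can be illustrated with a small non-dominated filter. This is a minimal sketch, not the project's `pareto.py`: the function names, the tuple layout, and the sample scores are assumptions made for illustration, and every objective is treated as higher-is-better (thrust, Figure of Merit, noise reduction in dB).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of non-dominated rows (O(n^2), fine for a few hundred designs)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores: (thrust_N, figure_of_merit, noise_reduction_dB)
scores = [
    (30.0, 0.80, 2.0),  # index 0: balanced
    (38.0, 0.86, 1.0),  # index 1: strongest and most efficient
    (28.0, 0.78, 1.5),  # index 2: worse than index 0 on all three goals
    (25.0, 0.70, 4.0),  # index 3: quietest
]
front = pareto_front(scores)
print(front)
```

Only index 2 is dropped, because index 0 matches or beats it on every goal. The other three each win on at least one goal, which is why the result is a menu of trade-offs rather than a single winner.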
Here's the whole loop:

```mermaid
flowchart TD
    OBJ["Antigravity sets the objective<br/>quiet · efficient · high thrust"]
    subgraph propose["PROPOSE: agents only suggest (never score)"]
        P["Proposer<br/>local LLM"]
        M["Mutator<br/>local LLM"]
        C["Coder<br/>writes a search operator"]
        GA["Deterministic GA<br/>always runs, guarantees progress"]
    end
    CAND["candidate designs<br/>(7 parameters each)"]
    subgraph score["SCORE: trusted physics, no LLM (the ground truth)"]
        PERF["performance.py<br/>BEMT hover → thrust, Figure of Merit"]
        TUB["tubercle_analysis.py<br/>→ noise reduction (dB)"]
        STR["propeller_physics.py<br/>→ stress, resonance"]
        V["evaluate.py<br/>objectives + constraints → feasible?"]
        PERF --> V
        TUB --> V
        STR --> V
    end
    SEL["pareto.py<br/>keep the non-dominated designs"]
    DB["SQLite research.db<br/>every design + score"]
    REF["Antigravity reflects<br/>steers the next generation"]
    CAD["generate_propeller.py<br/>STEP / STL + watertight check"]
    CFD["cfd_verify.py<br/>OpenFOAM truth check"]
    OBJ --> P & M & C & GA
    P & M & C & GA --> CAND
    CAND --> PERF & TUB & STR
    V --> SEL
    SEL --> DB --> REF
    REF -->|next generation| P
    SEL -->|best designs| CAD --> CFD
```

Read it left-to-right, top-to-bottom: the agents (and a deterministic genetic algorithm that always runs as a safety net) throw out candidate designs → the physics scripts score each one → the non-dominated winners are kept and saved → Antigravity looks at the winners and steers the next round → the loop repeats. The best designs eventually drop out the bottom into CAD and CFD verification.

The prompting pattern is the part most transferable to your own projects. Every prompt is readable plain English, and you can copy the pattern for any domain where you want a small local model to generate structured, validated output.

The secret to making a 7B model reliably produce working CAD code and valid design parameters is role separation + sandboxed execution + self-correction. Each worker gets a single, constrained job with a strict output schema.

| Worker | Model | What it does | Output |
|---|---|---|---|
| Proposer | `qwen2.5-coder:7b` | Brainstorms brand-new designs from scratch | JSON array of 7-parameter design vectors |
| Mutator | `qwen2.5-coder:7b` | Takes a good design and creates small variations | JSON array of tweaked vectors |
| Coder | `qwen2.5-coder:7b` | Writes a Python search operator (mutation function) | JSON `{"code": "..."}` |
| CFD Analyst | `phi4-mini` | Reads OpenFOAM logs and diagnoses solver failures | JSON `{"status": "...", "fix": "..."}` |
| Scribe | `phi4-mini` | Writes one-line journal entries | Plain text |

This is `src/autoresearch/skills/proposer.md`, the full prompt that a 7B model receives. Notice: no vague instructions, just hard bounds and domain knowledge:

```
You are a PROPELLER DESIGN PROPOSER in an automated research swarm.
Your job: propose NEW candidate propeller designs that might improve hover
efficiency, increase tubercle noise reduction, or reduce blade mass.

You output ONLY JSON. No prose, no markdown. The schema is:
{"designs": [
  {"chord_root_m": <float>, "chord_tip_m": <float>,
   "twist_root_deg": <float>, "twist_tip_deg": <float>,
   "tubercle_amp_m": <float>, "tubercle_wl_m": <float>,
   "n_blades": <int>},
  ...
]}

Hard bounds (stay inside these; values outside are clamped):
  chord_root_m   : 0.020 .. 0.034
  chord_tip_m    : 0.006 .. 0.014
  twist_root_deg : 25 .. 45
  twist_tip_deg  : 6 .. 20
  tubercle_amp_m : 0.0 .. 0.005
  tubercle_wl_m  : 0.020 .. 0.060
  n_blades       : 2 .. 6 (integer)
```

The Coder writes arbitrary Python, which is dangerous. The [sandbox](/ostenjap/LLM-Agent-generated-Quadcopter-Prop/blob/main/src/autoresearch/sandbox.py) handles it in two layers:

- AST allowlist: before execution, an AST walker rejects any import outside `{math, numpy, random}`, any dunder access, and any dangerous builtin or module (`open`, `exec`, `eval`, `os`, `subprocess`, etc.).
- Subprocess isolation: the screened code runs in a fresh Python process with a hard timeout and a scratch working directory. Only a JSON line on stdout is accepted back.

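The first layer can be sketched with the standard-library `ast` module. This is an illustrative reconstruction of the idea, not the contents of `sandbox.py`: `screen_source`, `ALLOWED_IMPORTS`, and `BANNED_NAMES` are hypothetical names, and the banned list simply mirrors the examples given above.

```python
import ast

ALLOWED_IMPORTS = {"math", "numpy", "random"}
# Assumed list, copied from the examples in the text; a real screen would be longer.
BANNED_NAMES = {"open", "exec", "eval", "os", "subprocess"}


def screen_source(source: str) -> list:
    """Return a list of violations; an empty list means the code may proceed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # node.level > 0 means a relative import, which is never allowed here.
            if node.level or root not in ALLOWED_IMPORTS:
                problems.append(f"import from {node.module}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder access: {node.attr}")
        elif isinstance(node, ast.Name) and (
            node.id in BANNED_NAMES or node.id.startswith("__")
        ):
            problems.append(f"banned name: {node.id}")
    return problems


operator = "import math\ndef mutate(parents, bounds, rng):\n    return [list(p) for p in parents]\n"
print(screen_source(operator))
print(screen_source("import os"))
```

A static screen like this is only a first filter. It is easy to miss a trick in the AST, which is why the text pairs it with a separate process, a timeout, and a scratch directory.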
If the code is invalid, times out, or returns garbage, it's silently discarded. The worst case is a wasted generation slot, never a corrupted archive:

```python
# The sandbox contract (from sandbox.py):
def mutate(parents, bounds, rng):
    """
    parents : list of design vectors
    bounds  : list of (lo, hi) for each variable
    rng     : random.Random instance for reproducibility
    returns : list of NEW design vectors
    """
    ...

# STRICT sandbox rules (violations → operator discarded):
#   - Import ONLY: math, numpy, random. Nothing else.
#   - No file/network/system access, no open/exec/eval.
#   - Must return within 5 seconds.
#   - Every value clamped into [lo, hi] bounds.
```

When something fails (a bad mesh, a diverging CFD solver, malformed JSON), the error is fed back to the responsible worker with the diagnostic context. The CFD Analyst, for example, gets the tail of the solver log and the residual values, and must return exactly one concrete fix to try next:

```json
{"status": "diverging",
 "diagnosis": "U residuals climbing after iteration 200, likely Courant violation",
 "fix": "reduce deltaT from 1e-3 to 5e-4",
 "fields": {"deltaT": "5e-4"}}
```

This is the pattern: structured output → validation → auto-retry. It works because the model never has to be right on the first try; it just has to be right eventually, within a budget of retries.

📂 All six worker prompts are in `src/autoresearch/skills/`; read them directly, they're short and self-contained.

If you work with OpenFOAM, CadQuery, or parametric design tools, this project is also a working reference for automating the design-simulate-optimize loop. Here's what's under the hood that you can reuse or learn from:

Every propeller is defined by 7 parameters: chord at root and tip, twist at root and tip, tubercle amplitude and wavelength, and blade count.
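As a concrete picture of that 7-parameter vector, here is a minimal sketch that clamps a proposed design into the hard bounds quoted in the Proposer prompt. The field names follow that prompt's schema (written here with underscores); the `BOUNDS` table layout and the `clamp_design` helper are assumptions for illustration, not code from the repository.

```python
# Hard bounds as listed in the Proposer prompt: name -> (lo, hi)
BOUNDS = {
    "chord_root_m": (0.020, 0.034),
    "chord_tip_m": (0.006, 0.014),
    "twist_root_deg": (25.0, 45.0),
    "twist_tip_deg": (6.0, 20.0),
    "tubercle_amp_m": (0.0, 0.005),
    "tubercle_wl_m": (0.020, 0.060),
    "n_blades": (2, 6),
}


def clamp_design(raw: dict) -> dict:
    """Clamp each parameter into its bounds; raise if a field is missing or not numeric."""
    design = {}
    for name, (lo, hi) in BOUNDS.items():
        value = float(raw[name])  # KeyError / ValueError: the caller discards the design
        value = min(max(value, lo), hi)
        # Blade count is the only integer parameter.
        design[name] = int(round(value)) if name == "n_blades" else value
    return design


# Hypothetical model output with three out-of-range values.
proposal = {
    "chord_root_m": 0.050, "chord_tip_m": 0.010,
    "twist_root_deg": 30, "twist_tip_deg": 4,
    "tubercle_amp_m": 0.002, "tubercle_wl_m": 0.040,
    "n_blades": 7,
}
print(clamp_design(proposal))
```

Clamping rather than rejecting keeps a slightly out-of-range proposal usable, which matches the prompt's note that values outside the bounds are clamped.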
The [generate_propeller.py](/ostenjap/LLM-Agent-generated-Quadcopter-Prop/blob/main/src/generate_propeller.py) script takes these 7 numbers and produces a watertight STEP/STL via CadQuery, with:

- Airfoil cross-sections lofted along the span with linear twist
- Leading-edge tubercles (sinusoidal bumps inspired by humpback whale fins) for noise reduction
- Automatic watertightness checking before any design enters the CFD pipeline
- STEP + STL export ready for meshing, printing, or further CAD work

This is fully programmatic: no GUI, no manual steps. If you want to adapt it for a different part (turbine blade, heat exchanger fin, any swept surface), the parametric structure is designed so that a different geometry can be swapped in.

The physics scoring stack is hand-written, not an LLM:

- BEMT hover analysis: Blade Element Momentum Theory for thrust and Figure of Merit at a fixed RPM and diameter (`src/optimization/`)
- Tubercle noise model: analytical estimate of noise reduction from leading-edge serrations (`src/tubercle_analysis.py`)
- Structural checks: centrifugal stress, resonance frequency clearance, tip Mach constraint (`src/propeller_physics.py`)
- Non-dominated sorting: true Pareto front over three objectives (efficiency, noise, thrust), not a weighted sum

The surrogate (Gaussian Process, scikit-learn) learns from evaluated designs and proposes infill points via Expected Improvement, reducing how many full evaluations you need.
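For reference, Expected Improvement has a closed form once the surrogate supplies a predictive mean and standard deviation at a candidate point. The sketch below uses only the standard library and assumes a maximization objective. The function name, the `xi` exploration margin, and the numbers in the example are assumptions; in the project, the `mu` and `sigma` values would come from the scikit-learn Gaussian Process rather than being typed in.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximization, given the surrogate's mean and std at one point."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)  # no uncertainty: the improvement is deterministic
    z = gain / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return gain * cdf + sigma * pdf


# Hypothetical candidates: (predicted Figure of Merit, predictive std)
best_so_far = 0.85
candidates = {"safe": (0.86, 0.001), "uncertain": (0.84, 0.05)}
for name, (mu, sigma) in candidates.items():
    print(name, round(expected_improvement(mu, sigma, best_so_far), 4))
```

With these made-up numbers the uncertain candidate scores higher than the safe one even though its predicted mean is lower, which is the behavior the text describes: the slow evaluations go where the surrogate is genuinely unsure.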

The [setup_openfoam_case.py](/ostenjap/LLM-Agent-generated-Quadcopter-Prop/blob/main/src/setup_openfoam_case.py) script generates a complete OpenFOAM case directory from a STEP file:

- snappyHexMesh dictionary with castellated/snap/layer settings tuned for propeller geometry
- simpleFoam with k-ω SST turbulence and appropriate boundary conditions
- Force coefficient extraction from `postProcessing/forces/`
- Automated convergence checking: the CFD Analyst agent reads residuals and applies one fix at a time (relaxation factors, time step, mesh quality) when the solver diverges

The CFD step is optional; the analytical loop runs standalone and fast. But when you want ground-truth validation, the pipeline is ready.

The loop can run for hours, so it can't keep everything in its head. It writes everything to a single local database file, `data/research.db` (SQLite).

This isn't just bookkeeping. Because that database saves each result the instant it's final, the loop can be killed (power cut, crash, you closing the laptop) and pick up exactly where it left off instead of starting over. A human-readable diary of the run also lands in `data/journal.md` if you just want to skim what happened.

Honest version: don't start this and immediately walk away the first time. The first run has setup to get through, and you'll want to see it work once.

You need a few things installed first:

- Ollama with two local models: `ollama pull qwen2.5-coder:7b` and `ollama pull phi4-mini`
- OpenFOAM (through WSL or Docker), only needed once you reach the simulation step
- Python packages: `pip install cadquery numpy scikit-learn matplotlib`

Then, the way you actually use it: open this folder in Antigravity and tell it "go to work." It reads [AGENTS.md](/ostenjap/LLM-Agent-generated-Quadcopter-Prop/blob/main/AGENTS.md), its instruction sheet, and starts working through the plan on its own, stopping to check in with you at the points that matter.

To run the core loop by hand:

```
cd src
python -m autoresearch.researcher --no-llm --budget 30    # quick, no AI: sanity check
python -m autoresearch.researcher --budget 1800           # the full team
```

Once you've watched it work once and you trust it, the long runs are the part you can sleep through. There's a babysitter script for exactly that:

```
powershell -ExecutionPolicy Bypass -File .\run_overnight.ps1
```

It restarts the loop if it crashes, refuses to run forever (there are time and restart caps), logs everything, and leaves a one-line verdict in `data/RUN_STATUS.txt` for you to read with your coffee. Drop a file named `STOP` in this folder to stop it cleanly. If you wire in a Telegram token, it'll message you when it's done.

First full run of the pipeline. The optimizer explored a few hundred feasible designs and mapped the trade-off surface between the three goals:

The red rings are the Pareto front: designs that aren't beaten on all three goals at once, i.e. the current menu of best trade-offs. The bottom-right panel shows the search improving generation over generation.

The best efficiency pick (Figure of Merit 0.867, 38 N thrust, 6 blades) was exported to CAD and passed the watertightness check:

Actual 3D CAD model rendering of the generated 6-blade propeller:

Honest caveat: these scores come from the fast analytical physics, and this winner sits against the edges of the allowed design range, so treat V1 as a working pipeline and a first map, not a final answer. CFD verification and a re-run with reviewed bounds come next.

Regenerate these anytime:

```
cd src
python plot_results.py    # docs/v1_results.png
python export_best.py     # cad/best_fm. + validity report
```

| Path | What's there |
|---|---|
| `implementation_plan.md` | |
| `AGENTS.md` | Antigravity's instruction sheet |
| `src/autoresearch/skills/` | Plain-English prompts for the AI workers |
| `src/optimization/` | BEMT hover analysis |
| `src/generate_propeller.py` | Parametric CadQuery geometry, STEP/STL export |
| `src/setup_openfoam_case.py` | OpenFOAM case generation |
| `src/propeller_physics.py` | Structural checks |
| `src/tubercle_analysis.py` | Tubercle noise model |
| `data/research.db` | SQLite record of every design and score |
| `cad/` | Exported CAD models |

If you're an LLM developer: start with `src/autoresearch/skills/`; those are the plain-English prompts the AI workers run on.

If you're a CAD/CFD engineer: start with `src/generate_propeller.py` and `src/setup_openfoam_case.py`; those are the parametric geometry and simulation pipelines you can adapt for your own parts.

[MIT](/ostenjap/LLM-Agent-generated-Quadcopter-Prop/blob/main/LICENSE): use it, fork it, build on it.