{"slug": "ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching", "title": "AI-Orchestrated 3D Asset Pipeline: From JPEG to Game-Ready GLB Without Touching Blender", "summary": "A developer built an AI-orchestrated 3D asset pipeline that converts JPEG images into game-ready GLB files without manual Blender use. The system uses an AI agent operating Blender through the Model Context Protocol (MCP), with a vision model validating each step by analyzing viewport screenshots. After rigging six animated models for a Godot 4 project, the developer found that the key pattern is teaching the AI agent to handle failures through a vision feedback loop rather than writing perfect scripts.", "body_md": "**TL;DR:** I built a pipeline where an AI agent operates Blender through MCP (Model Context Protocol) while a vision model validates every step by looking at viewport screenshots. I never opened Blender's GUI for modeling. Here's what worked, what broke, and the patterns that emerged after rigging 6+ animated models for a Godot 4 project.\n\nI needed animated 3D fish for a virtual aquarium in Godot 4. I don't know Blender. Instead of learning it, I built a pipeline where AI does the work and I supervise.\n\n**The stack:**\n\n- Meshy.ai: generates the initial GLB mesh from the source image\n- An AI coding agent that generates bpy code and drives Blender through MCP\n- Blender 4.0+, controlled through the BlenderMCP 1.27.1 MCP server and a Blender addon listening on port 9876\n- Qwen3VL-4B, run locally via llama.cpp, as the vision validator (later replaced by a larger Qwen model that also writes the code)\n- Godot 4.x as the target engine\n\n**The architecture:**\n\n```\nHuman (instructions)\n  → AI Agent (generates bpy code)\n    → MCP Protocol (JSON-RPC over stdio)\n      → Blender Addon (socket :9876, executes Python)\n        → Viewport Screenshot\n          → Vision Model (validates result)\n            → AI Agent (adjusts or proceeds)\n              → Export GLB → Godot\n```\n\nThe human describes problems. The AI translates them into Blender Python. The vision model confirms whether the result looks correct. 
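\n\nThe final hop, from the MCP server to the Blender addon, boils down to shipping a Python snippet over a local TCP socket and reading back a JSON reply. The sketch below assumes the addon accepts one JSON object per request with `type` and `params` fields and an `execute_code` command; that message shape is an assumption modeled on the BlenderMCP addon, so check the version you run before relying on it.

```python
import json
import socket


def build_command(code):
    # Assumed message shape: {'type': 'execute_code', 'params': {'code': <bpy source>}}
    return json.dumps({'type': 'execute_code', 'params': {'code': code}}).encode('utf-8')


def send_command(code, host='localhost', port=9876, timeout=30.0):
    # Send one bpy snippet to the addon socket and return the decoded JSON reply
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_command(code))
        buf = b''
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            buf += chunk
            try:
                return json.loads(buf.decode('utf-8'))
            except ValueError:
                continue  # reply incomplete (or split mid-character): keep reading
    raise ConnectionError('socket closed before a complete JSON reply arrived')
```

Sending exactly one snippet per call keeps the one-action-per-check rule enforceable at the transport level, not just in the prompt.\n\n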
Nobody clicks anything in Blender.\n\nTraditional 3D pipeline: learn Blender (weeks), model manually (hours per asset), rig by hand (more hours), debug in Godot (pain).\n\nAI-orchestrated pipeline: describe what you want, the AI executes, the vision model validates, and you iterate until the result is correct. The first model takes a couple of hours of prompt debugging. By the tenth model, you're done in 10 minutes.\n\nThe key insight: **you don't automate Blender by writing a perfect script once. You automate it by teaching an AI agent to handle failures through a vision feedback loop.**\n\nThe loop itself is the most important pattern; everything else depends on it.\n\n```\n1. AI executes ONE Blender operation\n2. Take a screenshot of the viewport\n3. Vision model checks the result\n4. If OK → next step. If FAIL → undo → try a different approach.\n```\n\n**Why not batch operations?** If the AI executes 6 bone extrusions in sequence and step 2 silently breaks something, the final screenshot shows a mess and neither the AI nor you can tell which step caused it. One action per cycle makes rollback deterministic: undo exactly one step.\n\n**Why vision validation?** Blender's Python API doesn't always tell you the truth about visual results. A bone might report correct coordinates but visually overlap with another bone. Weights might be \"assigned\" but produce garbage deformation. The viewport screenshot is ground truth.\n\n**Anti-stuck rule:** if the same approach fails 3 times in a row, the AI must switch strategy. Extrude not working? Try moving the bone directly. Auto-weights failing? Switch to manual Gaussian assignment.\n\nA naive prompt to a vision model produces naive answers. \"Look at this Blender screenshot\" gets you \"I see some orange lines.\" You need structured, domain-specific prompts.\n\n**Bad:**\n\n```\n\"Check the skeleton\"\n```\n\n**Good:**\n\n```\n\"You are a rigging tech lead. Count the bones in the armature. \nCheck: 1) All bone heads connect to previous bone tails? 
\n2) Last bone reaches the end of the mesh?\nAnswer strictly: bones=N|chain_ok=true/false|tail_reach=true/false\"\n```\n\n**Three prompt templates that cover 90% of validation:**\n\n| Mode | Prompt format | When to use |\n|---|---|---|\n| Skeleton check | `bones=N\\|chain_ok=true/false\\|tail_reach=true/false` | After creating or extruding bones |\n| Rigging check | `weights_painted=true/false\\|only_tip_deforms=true/false` | After assigning weights |\n| State check | `mode=EDIT/POSE/OBJECT\\|selected=Bone.006` | Before mode-sensitive operations |\n\n**Critical tip:** force a viewport redraw before every screenshot with `bpy.ops.wm.redraw_timer(type='DRAW_WIN_SWAP', iterations=1)`. Without it, the screenshot captures a stale frame.\n\nBlender retains actions, armature data, and mesh data even after you delete objects from the scene. If you rig Fish A, then import Fish B without cleaning up, Fish A's bone animations leak into Fish B's export.\n\n**Real incident:** Koi bone names appeared in the Pterophyllum GLB export, causing \"Animation target not found\" warnings in Godot.\n\n**Mandatory cleanup script before each new model:**\n\n``` python\nimport bpy\n\n# Delete all scene objects\nfor obj in list(bpy.context.scene.objects):\n    bpy.data.objects.remove(obj, do_unlink=True)\n\n# Purge all orphan data blocks (actions, armatures, meshes left without users)\nbpy.ops.outliner.orphans_purge(\n    do_local_ids=True,\n    do_linked_ids=False,\n    do_recursive=True\n)\n\n# Verify: every count should be zero\nprint(f\"Objects: {len(bpy.data.objects)}, \"\n      f\"Actions: {len(bpy.data.actions)}, \"\n      f\"Armatures: {len(bpy.data.armatures)}, \"\n      f\"Meshes: {len(bpy.data.meshes)}\")\n```\n\n**Rule: one model at a time. Import → rig → weight → test → export → clean. Only then start the next one.**\n\nBlender's `ARMATURE_AUTO` weight assignment calculates the distance from each bone to each vertex. This works for simple meshes. 
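\n\nA toy model shows why distance-based weighting is fine on a chunky body and fragile on thin parts. The sketch below is a deliberate simplification, not Blender's actual algorithm: it weights a vertex by the inverse squared distance to each bone segment and normalizes the result. The bone layouts are hypothetical: three body bones 0.2 units apart, and three fin rays only 0.01 units apart.

```python
import math


def point_segment_distance(p, a, b):
    # Shortest distance from point p to the segment a-b (3D tuples)
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(ap[i] * ab[i] for i in range(3)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def inverse_distance_weights(p, bones, power=2.0, eps=1e-9):
    # Toy distance-based weighting: w_i proportional to 1 / d_i**power, normalized to sum to 1
    raw = [1.0 / (point_segment_distance(p, a, b) + eps) ** power for a, b in bones]
    total = sum(raw)
    return [w / total for w in raw]


vertex = (0.5, 0.004, 0.0)
body = [((0.0, y, 0.0), (1.0, y, 0.0)) for y in (0.0, 0.2, 0.4)]
fin = [((0.0, y, 0.0), (1.0, y, 0.0)) for y in (0.0, 0.01, 0.02)]
print('body bones:', [round(w, 3) for w in inverse_distance_weights(vertex, body)])
print('fin rays:  ', [round(w, 3) for w in inverse_distance_weights(vertex, fin)])
```

The same vertex sits 0.004 units from the first bone in both layouts. With body spacing it gets almost all of its weight from that bone; with fin spacing it gets only about two thirds, and the rest spreads across neighboring rays.\n\n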
For thin geometry (fins, veils, tails), all bones appear \"close\" to all vertices, and the algorithm produces garbage.\n\n**What works instead: manual Gaussian weight assignment.**\n\n``` python\nimport math\nimport bpy\n\n# Hypothetical object and bone names: adapt to your scene\nmesh = bpy.data.objects['Fish']\narm = bpy.data.objects['Armature']\nbone = arm.data.bones['Tail3']\nbone_head = bone.head_local  # bone head in armature space\ngroup = mesh.vertex_groups.get(bone.name) or mesh.vertex_groups.new(name=bone.name)\n\nsigma = 0.03  # falloff radius, adjust per bone size\nto_arm = arm.matrix_world.inverted() @ mesh.matrix_world  # mesh space -> armature space\nfor v in mesh.data.vertices:\n    d = (to_arm @ v.co - bone_head).length\n    if d < sigma * 3:  # skip vertices beyond 3 sigma\n        w = math.exp(-d * d / (2 * sigma * sigma))\n        if w > 0.05:\n            group.add([v.index], w, 'REPLACE')\n```\n\nFollow with normalization and smoothing (`bpy.ops.object.vertex_group_smooth(factor=0.3, repeat=1)`), then validate with the vision model.\n\n**Another common trap: a neutral_bone or Root bone eating all the weights.** If a bone sits at the origin with `use_deform=True`, auto-weights assign it to everything. Fix: set `bone.use_deform = False` for utility bones, then re-bind.\n\nMany things that work in Blender break silently in Godot. These cost the most debugging time.\n\nAfter GLB import, armature bones are in Quaternion rotation mode. If your AI writes `bone.rotation_euler.x = -0.5` on a pose bone, nothing happens: the bone ignores Euler values while in Quaternion mode.\n\n**Fix:** always set `bone.rotation_mode = 'XYZ'` before animating with Euler angles, or work in Quaternion throughout.\n\nIf a bone's rest pose isn't aligned to the world axes, Godot applies animation offsets relative to a non-identity transform. Result: the jaw nods the entire head instead of opening the mouth.\n\n**Fix:** in Edit Mode, align all bones strictly along the X/Y/Z axes and set `roll = 0` for every bone. After posing, clear all pose transforms: the mesh should not move. If it moves, the rest pose is wrong.\n\nGodot 4.x sometimes ignores bone scale if the rest pose doesn't match the skeleton rest. Gill breathing animated via `scale.x` on a bone worked in Blender but did nothing in Godot.\n\n**Fix:** use Shape Keys (blend shapes) instead of bone scale for facial/gill animation. 
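\n\nAs a sketch of the Shape Key route, the helper below produces a looping 0-to-1 breathing curve whose first and last frames match, so the gill motion cycles without a pop. It assumes a mesh named `Fish` with a shape key named `Gills_Open`; both names are hypothetical. Only `apply_to_shape_key` needs Blender, and it uses the standard `keyframe_insert` call on the shape key's `value`.

```python
import math


def loop_curve(start=1, end=60, low=0.0, high=1.0, phase=0.0):
    # One full cosine cycle between start and end, so values[end] equals values[start]
    if end <= start:
        raise ValueError('end must be greater than start')
    span = end - start
    values = {}
    for frame in range(start, end + 1):
        t = (frame - start) / span
        s = 0.5 - 0.5 * math.cos(2.0 * math.pi * t + phase)  # s stays in [0, 1]
        values[frame] = low + (high - low) * s
    return values


def apply_to_shape_key(obj_name, key_name, values):
    # Blender-only: key the shape key value on every frame of the curve
    import bpy
    key_block = bpy.data.objects[obj_name].data.shape_keys.key_blocks[key_name]
    for frame, value in values.items():
        key_block.value = value
        key_block.keyframe_insert(data_path='value', frame=frame)


breath = loop_curve(1, 60)
print(round(breath[1], 6), round(breath[60], 6))
# Inside Blender: apply_to_shape_key('Fish', 'Gills_Open', breath)
```

The same helper with a rotation range for `low`/`high` and a per-bone `phase` offset is one way to drive the looping spine wave for swimming.\n\n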
Shape Keys work deterministically in both Blender and Godot. Reserve bone animation for rotation-based movement (swimming, tail wagging).\n\nGodot doesn't understand Blender constraints (Copy Rotation, etc.). They must be baked before export.\n\n``` python\nimport bpy\n\n# Run in Pose Mode with the armature active. only_selected defaults to True,\n# so select the bones to bake (or pass only_selected=False).\nbpy.ops.nla.bake(\n    frame_start=1, frame_end=60,\n    visual_keying=True,      # bake constraint results\n    clear_constraints=True,  # remove constraints from export\n    bake_types={'POSE'}\n)\n```\n\nThe body axis is X in Blender and -Z in Godot, so all models need a 90° rotation on import. Apply transforms before export: `bpy.ops.object.transform_apply(location=True, rotation=True, scale=True)`.\n\nIn this project, animation authored at 30 FPS played at half speed in Godot, which runs physics at 60 FPS. Set `AnimationPlayer.speed_scale = 2.0` or bake at 60 FPS from the start. Since glTF stores keyframe times in seconds, also check the Blender scene's frame rate before exporting.\n\nThe coding AI cannot handle multi-step instructions reliably. \"Animate Tail1, Tail2, Tail3 and both pectoral fins\" produces a `bpy.ops.pose.select_all` call that breaks everything.\n\n**Fix:** one bone per call. Animate Tail1 → vision check → animate Tail2 → vision check → ... → bake all together at the end.\n\nBlender's API is context-sensitive. Most `bpy.ops` calls fail with \"poll() failed, context is incorrect\" if you're in the wrong mode.\n\n**Rules the AI must follow:**\n\n- Before `mode_set(mode='POSE')`: make the armature the active object (`view_layer.objects.active = armature`).\n- Before `mode_set(mode='WEIGHT_PAINT')`: make the mesh the active object.\n- Before `mode_set(mode='EDIT')` on an armature: switch to OBJECT mode first, set the armature active, then enter EDIT.\n- `select_all(action='DESELECT')` only works in OBJECT mode.\n\nAfter 3 failed attempts with the same approach, force a strategy change. This must be an explicit rule in the agent's instructions, not a hope.\n\nAfter each model, document what broke and how you fixed it. 
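\n\nThe one-action loop, the three-strikes rule, and the failure record can live in one small driver. A sketch under stated assumptions: `execute`, `check`, and `undo` are hypothetical callables you would wire to the MCP tool and the vision model, and each entry in `strategies` is a different way to reach the same step (for example, extruding a bone versus moving it directly).

```python
def run_step(strategies, execute, check, undo, max_attempts=3):
    # Try strategies in order; switch after max_attempts consecutive failures of one strategy
    failures = []  # raw material for the written record of what broke
    for strategy in strategies:
        feedback = None
        for attempt in range(1, max_attempts + 1):
            execute(strategy, feedback)  # the agent regenerates code, seeing the last feedback
            ok, feedback = check()  # vision verdict plus its text explanation
            if ok:
                return strategy, failures
            undo()  # roll back exactly the one failed action
            failures.append({'strategy': strategy, 'attempt': attempt, 'symptom': feedback})
    raise RuntimeError(f'all strategies failed ({len(failures)} attempts)')
```

When a step finally passes, every entry in the returned `failures` list is a candidate Symptom line for the written record.\n\n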
Doing so builds a growing knowledge base (PSP in the rest of this article) that makes each subsequent model faster.\n\n**Format:**\n\n```\nSymptom: [what you observed]\nCause: [root cause]\nFix: [code or procedure]\nApplies to: [which model types]\n```\n\n**Examples from real production:**\n\n| # | Symptom | Cause | Fix |\n|---|---|---|---|\n| 1 | `rotation_euler` has no effect | `rotation_mode='QUATERNION'` | Set `rotation_mode='XYZ'` first |\n| 2 | Entire body moves when rotating a fin | `use_connect=True` on the fin bone | Set `use_connect=False`, parent to Spine1 |\n| 3 | Orphan animations in exported GLB | Previous model's data not purged | Full cleanup script between models |\n| 4 | Jaw nods the head in Godot | Rest pose not identity | Align bones to world axes, `roll=0` |\n| 5 | Gills don't animate in Godot | Scale on bones ignored by Godot 4 | Use Shape Keys instead of bone scale |\n| 6 | Vision model says FAIL but code says PASS | Wrong viewport angle | Set the viewport to RIGHT/FRONT view before the screenshot |\n\n**After ~10 models, PSP becomes your real pipeline.** The AI reads it before starting each new model and avoids known pitfalls. First model: 3 hours. Tenth model: 20 minutes.\n\nThe most powerful pattern that emerged: using the vision model as a test framework.\n\n``` python\ndef assert_vision(question, expected_answer):\n    # vlm_ask() and screenshot() are this pipeline's own helpers (not shown)\n    result = vlm_ask(screenshot(), question)\n    # Compare only the first token: a substring test would let 'A' match almost any sentence\n    words = result.strip().split()\n    answer = words[0].strip('.,:;()').upper() if words else ''\n    if answer != expected_answer.upper():\n        raise AssertionError(\n            f\"Vision assert failed: expected '{expected_answer}', got '{result}'\"\n        )\n```\n\n**Usage:**\n\n``` python\n# After rigging\nassert_vision(\"Tail3 rotated 45°. What bent? A) Only tip B) Whole tail C) Entire body\", \"A\")\n\n# After weight painting\nassert_vision(\"Head changed position?\", \"NO\")\n\n# After animation bake\nassert_vision(\"Frame 1 and frame 60. Same pose?\", \"YES\")\n\n# After export and Godot import\nassert_vision(\"Skeleton visible? Tail bends?\", \"YES\")\n```\n\nThis is CI/CD for 3D. 
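\n\nRunning the checks as a suite rather than one by one is mostly bookkeeping. A sketch, assuming a hypothetical `ask(question)` callable that captures a fresh screenshot and returns the vision model's raw text; answers are graded on their first token only, like a multiple-choice quiz, so an expected `A` cannot match a stray letter elsewhere in the reply.

```python
def first_token(text):
    # 'A) Only tip' -> 'A', 'No, static.' -> 'NO'
    words = text.strip().split()
    return words[0].strip('.,:;()').upper() if words else ''


def run_vision_suite(checks, ask):
    # checks: list of (question, expected) pairs; returns failures, an empty list means pass
    failures = []
    for question, expected in checks:
        answer = ask(question)
        if first_token(answer) != expected.upper():
            failures.append((question, expected, answer))
    return failures


SUITE = [
    ('Tail3 rotated 45 degrees. What bent? A) Only tip B) Whole tail C) Entire body', 'A'),
    ('Head changed position?', 'NO'),
    ('Skeleton visible? Tail bends?', 'YES'),
]
```

A non-empty result names the exact question that regressed, so a failing run points straight at the step to revisit.\n\n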
If you change weights tomorrow, run the assert suite. If anything breaks, you know immediately.\n\n**The full per-model checklist:**\n\n```\n1.  Clean Blender scene (purge orphans)\n2.  Import GLB from Meshy.ai\n3.  Orient body along X axis (rotate Z -90°, apply transforms)\n4.  Decimate to target polycount (ratio 0.15-0.3)\n5.  Create armature: spine chain + fins + jaw\n6.  Parent mesh to armature with empty vertex groups\n7.  Assign weights: Gaussian for each bone, normalize, smooth\n8.  Vision check: rotate each bone → \"only target deforms?\"\n9.  Selective zero: remove weight leaks from body to face bones\n10. Vision check: jaw/gills move independently?\n11. Create swim animation: sine wave on spine chain, 60 frames\n12. Vision check: frame 1 = frame 60? Natural motion?\n13. Bake action: visual_keying=True, clear_constraints=True\n14. Export GLB with animations and Shape Keys\n15. Import in Godot, verify animation plays correctly\n16. Clean Blender scene for next model\n```\n\nBetween steps 7 and 10, expect 2-5 iterations per bone. This is normal. The feedback loop (AI executes → vision validates → AI adjusts) converges quickly once PSP covers the common failure modes.\n\n| Metric | First model | After PSP (latest models) |\n|---|---|---|\n| Time to rigged GLB | ~2 hours | ~10 minutes |\n| Manual Blender work | Occasional weight painting | Zero |\n| Vision checks per model | 15-20 | 3-5 |\n| Export failures | 3-4 attempts | Usually first try |\n\nThe bottleneck shifted from \"learning Blender\" to \"debugging AI prompts.\" When the AI makes a mistake, 90% of the time it's because the vision model gave bad feedback. Fix one line in the VLM prompt and the entire system gets smarter.\n\nAn important optimization emerged during the project. The initial architecture used a small local vision model (Qwen3VL-4B) purely for validation, while a separate coding AI generated the Blender Python. 
This meant two models, two contexts, two sets of prompts, and a manual bridge between them.\n\nLater, I switched to a larger Qwen model accessed through MCP that could both see the viewport and write code: one model that understands what it's looking at AND knows how to fix it. The feedback loop collapsed from \"AI writes code → screenshot → VLM checks → human relays feedback → AI adjusts\" to \"AI writes code → looks at result → adjusts itself.\"\n\nThis cut iteration time significantly. The patterns in this article still apply (one action per check, structured prompts, PSP), but the architecture becomes simpler when vision and coding live in the same model.\n\n**One action, one check.** Never let the AI chain operations blindly. Deterministic rollback requires deterministic steps.\n\n**Vision validation is non-negotiable.** Code can report success while the viewport shows garbage. The screenshot is ground truth.\n\n**Auto-weights fail on thin geometry.** Plan for manual Gaussian assignment on fins, veils, and facial features.\n\n**Blender and Godot speak different languages.** Rest pose identity, quaternion rotation, Shape Keys over bone scale, baked constraints: learn these once, document them in PSP, and never debug them again.\n\n**PSP is the real product.** The pipeline isn't the code. It's the accumulated knowledge of what breaks and how to fix it. Each model teaches the system.\n\n**The human role is supervisor, not operator.** You describe problems in natural language. The AI translates them to code. The VLM validates visually. You make decisions when the system gets stuck.\n\nThe same architecture (AI agent + MCP tool + vision validation) applies beyond Blender. Any GUI-heavy professional tool that exposes an API can be orchestrated this way. The patterns (one action/one check, structured VLM prompts, PSP accumulation) are universal.\n\nThe agents aren't replacing 3D artists. 
They're making 3D accessible to people who have ideas but not the specialized skills to execute them. The quality ceiling is still set by human judgment — but the floor has risen dramatically.\n\n**Tested on:** Linux Mint 22.3, Blender 4.0+, Godot 4.x, NVIDIA RTX 5060 Ti (eGPU via Thunderbolt 4)\n\n**MCP Server:** BlenderMCP 1.27.1\n\n**Vision Models:** Qwen3VL-4B (local, llama.cpp) → later Qwen (larger, unified vision+coding via MCP)\n\n**Author:** Aleksandr Kossarev, Jõgeva, Estonia\n\n**Project:** [Arche Iscrin](https://archiscrin.bandcamp.com)\n\n*This article is based on 2300+ lines of production notes from rigging 6 animated fish models for a Godot virtual aquarium, using an AI-orchestrated pipeline without manual Blender operation.*", "url": "https://wpnews.pro/news/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching", "canonical_source": "https://dev.to/aleksandr_kossarev_e23623/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching-blender-1akf", "published_at": "2026-05-27 16:10:58+00:00", "updated_at": "2026-05-27 16:41:43.926956+00:00", "lang": "en", "topics": ["ai-agents", "computer-vision", "generative-ai", "ai-tools", "artificial-intelligence"], "entities": ["Blender", "MCP", "Godot", "JSON-RPC", "Python", "AI Agent", "Vision Model", "GLB"], "alternates": {"html": "https://wpnews.pro/news/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching", "markdown": "https://wpnews.pro/news/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching.md", "text": "https://wpnews.pro/news/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching.txt", "jsonld": "https://wpnews.pro/news/ai-orchestrated-3d-asset-pipeline-from-jpeg-to-game-ready-glb-without-touching.jsonld"}}