Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Researchers have introduced Brick-Composer, a learning framework that trains multimodal large language models (MLLMs) to assemble objects from diverse building blocks. According to the abstract, it improves brick selection accuracy by over three times and raises strict step-level assembly success from less than 1% to around 15%. The framework, introduced alongside the BC-Bench benchmark, draws on human demonstrations, visual and physical feedback, and synthetic experience to address current models' difficulty with fine-grained brick selection and precise pose estimation. After training, a Qwen-3-8B model correctly composed up to 42% of the steps for a complete object, which the authors take as a sign that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.

Published Jun 6, 2026 · 1 min read

arXiv:2606.05445v1

Abstract: We dream of AI agents that can read arbitrary designs and construct real-world objects from reusable building blocks. As a first step toward this vision, we study whether multimodal large language models (MLLMs) possess the visual grounding and spatial reasoning capabilities required for brick assembly. We formulate brick assembly as a sequential decision-making problem, where each step involves two subtasks: brick selection, identifying the target brick from candidate components, and brick pose estimation, predicting where and how the selected brick should be placed. To support this study, we introduce BC-Bench (Brick Construction Benchmark), the first benchmark for evaluating MLLMs on assembly with diverse bricks. Experiments show that current state-of-the-art MLLMs remain far from reliable builders, struggling with fine-grained brick selection and failing at precise pose estimation. To bridge this gap, we propose Brick-Composer, a learning framework that equips MLLMs with assembly skills through three complementary signals: Human Design Sparks, which provide affordance-rich construction demonstrations; World Feedback, which grounds predicted actions in visual and physical consequences; and Synthetic Experience, which scales learning beyond existing object designs. Brick-Composer improves brick selection accuracy by over three times, substantially reduces pose estimation errors, and raises strict step-level assembly success from less than 1% to around 15%. After training, a Qwen-3-8B can correctly compose up to 42% of the steps for a complete object, suggesting that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.
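The abstract frames each assembly step as two subtasks (brick selection and pose estimation) and reports a "strict" step-level success rate, but it does not spell out how that metric is computed. The sketch below is one plausible reading, not the BC-Bench implementation: it assumes a step counts as a strict success only when the chosen brick and its full pose both match the reference exactly. The `Step` fields, the integer grid coordinates, the rotation convention, and the function names are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One assembly action (hypothetical schema, not taken from BC-Bench)."""
    brick_id: str                    # which candidate brick was chosen
    position: tuple[int, int, int]   # assumed integer grid coordinates
    rotation_deg: int                # assumed rotation about the vertical axis


def selection_correct(pred: Step, target: Step) -> bool:
    """Brick selection subtask: was the right brick picked?"""
    return pred.brick_id == target.brick_id


def strict_success(pred: Step, target: Step) -> bool:
    """Assumed strict criterion: right brick, exact position, same rotation."""
    return (
        selection_correct(pred, target)
        and pred.position == target.position
        and pred.rotation_deg % 360 == target.rotation_deg % 360
    )


def score(preds: list[Step], targets: list[Step]) -> dict[str, float]:
    """Per-step selection accuracy and strict success rate for one object."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same number of steps")
    n = len(targets)
    if n == 0:
        return {"selection_accuracy": 0.0, "strict_step_success": 0.0}
    pairs = list(zip(preds, targets))
    return {
        "selection_accuracy": sum(selection_correct(p, t) for p, t in pairs) / n,
        "strict_step_success": sum(strict_success(p, t) for p, t in pairs) / n,
    }


# Toy four-step build (made-up data, only to exercise the functions).
targets = [
    Step("2x4", (0, 0, 0), 0),
    Step("2x2", (2, 0, 1), 90),
    Step("1x2", (0, 2, 1), 0),
    Step("2x4", (0, 0, 2), 0),
]
preds = [
    Step("2x4", (0, 0, 0), 0),     # right brick, right pose
    Step("2x2", (3, 0, 1), 90),    # right brick, wrong position
    Step("1x4", (0, 2, 1), 0),     # wrong brick
    Step("2x4", (0, 0, 2), 360),   # right brick, equivalent rotation
]
result = score(preds, targets)
```

In this toy run, three of four bricks are selected correctly but only two steps satisfy the strict criterion, which illustrates why a strict step-level number can sit well below selection accuracy. A real benchmark would likely use tolerances or brick-specific symmetry rules for pose; those details are not given in the abstract.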

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.05445
