[ARTICLE · art-23132] src=arxiv.org pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Researchers have introduced Brick-Composer, a learning framework that trains multimodal large language models (MLLMs) to assemble objects from diverse bricks. The authors report that it improves brick selection accuracy by over three times, substantially reduces pose estimation errors, and raises strict step-level assembly success from less than 1% to around 15%. The framework, introduced alongside the BC-Bench benchmark, trains MLLMs on three signals: human construction demonstrations, visual and physical feedback on predicted actions, and synthetic experience. Together these target the two weaknesses the benchmark exposes in current models: fine-grained brick selection and precise pose estimation. After training, a Qwen-3-8B model correctly composed up to 42% of the steps for a complete object, suggesting that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.

1 min read · published Jun 6, 2026

arXiv:2606.05445v1 (announce type: new)

Abstract: We dream of AI agents that can read arbitrary designs and construct real-world objects from reusable building blocks. As a first step toward this vision, we study whether multimodal large language models (MLLMs) possess the visual grounding and spatial reasoning capabilities required for brick assembly. We formulate brick assembly as a sequential decision-making problem, where each step involves two subtasks: brick selection, identifying the target brick from candidate components, and brick pose estimation, predicting where and how the selected brick should be placed. To support this study, we introduce BC-Bench (Brick Construction Benchmark), the first benchmark for evaluating MLLMs on assembly with diverse bricks. Experiments show that current state-of-the-art MLLMs remain far from reliable builders, struggling with fine-grained brick selection and failing at precise pose estimation. To bridge this gap, we propose Brick-Composer, a learning framework that equips MLLMs with assembly skills through three complementary signals: Human Design Sparks, which provide affordance-rich construction demonstrations; World Feedback, which grounds predicted actions in visual and physical consequences; and Synthetic Experience, which scales learning beyond existing object designs. Brick-Composer improves brick selection accuracy by over three times, substantially reduces pose estimation errors, and raises strict step-level assembly success from less than 1% to around 15%. After training, a Qwen-3-8B can correctly compose up to 42% of the steps for a complete object, suggesting that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.
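To make the two-subtask formulation concrete, here is a minimal Python sketch of how one assembly step and a strict step-level score could be represented. It is an illustration only, not BC-Bench's actual scoring code: the `BrickAction` fields, the integer grid position, the yaw-only rotation, and the exact-match rule for a "correct" pose are all assumptions, since the abstract does not specify the pose representation or any tolerance.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BrickAction:
    """One assembly step: which brick to use and where to place it.

    All fields are hypothetical; the paper's pose format may differ.
    """
    brick_id: str                    # selected brick among the candidates
    position: Tuple[int, int, int]   # assumed grid coordinates (x, y, z)
    rotation_deg: int                # assumed yaw about the vertical axis


def selection_correct(pred: BrickAction, target: BrickAction) -> bool:
    # Subtask 1: brick selection.
    return pred.brick_id == target.brick_id


def pose_correct(pred: BrickAction, target: BrickAction) -> bool:
    # Subtask 2: pose estimation, scored here by exact match (an assumption).
    same_position = pred.position == target.position
    same_rotation = pred.rotation_deg % 360 == target.rotation_deg % 360
    return same_position and same_rotation


def strict_step_success(pred: BrickAction, target: BrickAction) -> bool:
    # A step counts only if both the brick and its pose are right.
    return selection_correct(pred, target) and pose_correct(pred, target)


def score_sequence(preds: List[BrickAction],
                   targets: List[BrickAction]) -> Dict[str, float]:
    """Return per-step rates for selection and strict success."""
    if not targets or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")
    n = len(targets)
    selected = sum(selection_correct(p, t) for p, t in zip(preds, targets))
    strict = sum(strict_step_success(p, t) for p, t in zip(preds, targets))
    return {
        "selection_accuracy": selected / n,
        "strict_step_success": strict / n,
    }
```

Under this sketch a step with the right brick in the wrong place raises selection accuracy but not strict success, which is one way a model can score far higher on selection than on strict step-level assembly.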
