{"slug": "navigating-user-behavior-toward-personalized-multimodal-generation", "title": "Navigating User Behavior toward Personalized Multimodal Generation", "summary": "Researchers propose NaviGen, a system that encodes user behavior history into executable instructions for personalized image and video generation, overcoming misalignment between AIGC pipelines and user intent. Using a dual identifier combining collaborative and textual codes, NaviGen employs a two-stage SFT+RL pipeline to improve instruction writing and generation quality across product, game, and short-video domains.", "body_md": "arXiv:2606.24196v1\nAbstract: Modern AIGC pipelines deliver high-fidelity images and videos but presuppose a well-formed creation instruction, while end users rarely articulate visual details, leaving generators misaligned with user demand. We study personalized content generation, which turns a user's interaction history into an executable instruction for downstream synthesis, and identify two obstacles: behavior must be encoded in a form legible to language reasoning, and the model must acquire instruction-writing skill absent from both pretraining and behavior data. We propose NaviGen, which represents each item with a dual identifier coupling a collaborative code and a textual code as a behavioral substrate and a semantic bridge in one token stream. On this representation, a two-stage SFT+RL pipeline first distills preference reasoning and instruction writing from evolutionarily searched supervision, then aligns generation with user intent through hierarchical and self-consistent rewards. Experiments across product, game, and short-video domains show that NaviGen improves personalized image and video generation, strengthens next-item prediction, and yields more specific, relevant, and visually generatable instructions. 
Our code is anonymously released at: https://github.com/iLearn-Lab/NaviGen.", "url": "https://wpnews.pro/news/navigating-user-behavior-toward-personalized-multimodal-generation", "canonical_source": "https://arxiv.org/abs/2606.24196", "published_at": "2026-06-24 04:00:00+00:00", "updated_at": "2026-06-24 04:31:17.280264+00:00", "lang": "en", "topics": ["generative-ai", "large-language-models", "ai-products", "machine-learning"], "entities": ["NaviGen", "arXiv", "iLearn-Lab"], "alternates": {"html": "https://wpnews.pro/news/navigating-user-behavior-toward-personalized-multimodal-generation", "markdown": "https://wpnews.pro/news/navigating-user-behavior-toward-personalized-multimodal-generation.md", "text": "https://wpnews.pro/news/navigating-user-behavior-toward-personalized-multimodal-generation.txt", "jsonld": "https://wpnews.pro/news/navigating-user-behavior-toward-personalized-multimodal-generation.jsonld"}}