{"slug": "self-supervised-user-profile-generation-for-personalization", "title": "Self-supervised User Profile Generation for Personalization", "summary": "Researchers have developed BUMP, a self-supervised framework that trains large language models to generate personalized user profiles without requiring labeled data from downstream tasks. The system uses a bidirectional ranking objective to learn from raw user interaction histories, matching or exceeding the performance of methods that depend on expensive annotated supervision. This approach could enable more scalable personalization across recommendation, search, and dialogue systems by eliminating the need for task-specific labels.", "body_md": "arXiv:2606.05336v1\nAbstract: Personalizing large language models (LLMs) has become a central challenge as LLMs are deployed across recommendation, search, dialogue, and content generation -- settings where the same query should yield different answers given different users. A promising route is to summarize each user's interaction history into a natural-language memory or profile and prepend it to the prompt to facilitate personalization. Existing methods learn such profile generators with explicit rewards derived from labeled downstream tasks, which are expensive and sparse as they require annotated supervision for every target task. In light of this challenge, we introduce Bidirectional User Modeling via Profiles (BUMP), a self-supervised framework that trains a profile generator without any downstream labels. 
Specifically, given a user's interaction history, we use GRPO to train an LLM to emit a free-form textual profile under a bidirectional in-batch ranking objective: a small LLM judge measures (i) how well the generated profile, used as a query, ranks the user's own held-out interactions above interactions from other users in the batch, and (ii) how well a held-out interaction, used as a query, ranks the user's own profile above profiles of other users. Both directions are scored with multi-positive NDCG and combined into a dense reward per rollout; other users in the batch supply free negatives, so every training example yields supervision from raw interaction logs alone. Evaluated on the LaMP benchmark, BUMP matches or outperforms closed-source APIs and prior methods relying on labeled rewards, while requiring no task label at training.", "url": "https://wpnews.pro/news/self-supervised-user-profile-generation-for-personalization", "canonical_source": "https://arxiv.org/abs/2606.05336", "published_at": "2026-06-05 04:00:00+00:00", "updated_at": "2026-06-05 04:21:04.303110+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "natural-language-processing", "artificial-intelligence", "ai-research"], "entities": ["BUMP", "GRPO", "LLM", "NDCG"], "alternates": {"html": "https://wpnews.pro/news/self-supervised-user-profile-generation-for-personalization", "markdown": "https://wpnews.pro/news/self-supervised-user-profile-generation-for-personalization.md", "text": "https://wpnews.pro/news/self-supervised-user-profile-generation-for-personalization.txt", "jsonld": "https://wpnews.pro/news/self-supervised-user-profile-generation-for-personalization.jsonld"}}