
Self-supervised User Profile Generation for Personalization

Researchers have developed BUMP, a self-supervised framework that trains large language models to generate personalized user profiles without requiring labeled data from downstream tasks. The system uses a bidirectional ranking objective to learn from raw user interaction histories, matching or exceeding the performance of methods that depend on expensive annotated supervision. This approach could enable more scalable personalization across recommendation, search, and dialogue systems by eliminating the need for task-specific labels.

1 min read · published Jun 5, 2026

arXiv:2606.05336v1

Abstract: Personalizing large language models (LLMs) has become a central challenge as LLMs are deployed across recommendation, search, dialogue, and content generation -- settings where the same query should yield different answers given different users. A promising route is to summarize each user's interaction history into a natural-language memory or profile and prepend it to the prompt to facilitate personalization. Existing methods learn such profile generators with explicit rewards derived from labeled downstream tasks, which are expensive and sparse as they require annotated supervision for every target task. In light of this challenge, we introduce Bidirectional User Modeling via Profiles (BUMP), a self-supervised framework that trains a profile generator without any downstream labels. Specifically, given a user's interaction history, we use GRPO to train an LLM to emit a free-form textual profile under a bidirectional in-batch ranking objective: a small LLM judge measures (i) how well the generated profile, used as a query, ranks the user's own held-out interactions above interactions from other users in the batch, and (ii) how well a held-out interaction, used as a query, ranks the user's own profile above profiles of other users. Both directions are scored with multi-positive NDCG and combined into a dense reward per rollout; other users in the batch supply free negatives, so every training example yields supervision from raw interaction logs alone. Evaluated on the LaMP benchmark, BUMP matches or outperforms closed-source APIs and prior methods relying on labeled rewards, while requiring no task label at training.
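The bidirectional in-batch ranking reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the judge scores are already available as a nested list `judge[u][v][k]` (the score between user `u`'s profile and the `k`-th held-out interaction of user `v`), uses binary-relevance NDCG with a `1 / log2(rank + 1)` discount, and combines the two directions with an equal-weight average. The abstract does not specify the score format, the gain function, or the combination rule, so all three are assumptions, as are the function names `ndcg` and `bidirectional_rewards`.

```python
import math
from typing import List, Sequence


def ndcg(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Binary-relevance NDCG with any number of positives.

    Candidates are ranked by descending score; ties keep input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, idx in enumerate(order)
        if is_positive[idx]
    )
    n_pos = sum(is_positive)
    if n_pos == 0:
        return 0.0
    # Ideal ranking places every positive at the top.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(n_pos))
    return dcg / ideal


def bidirectional_rewards(judge: List[List[List[float]]]) -> List[float]:
    """Return one reward per user in the batch.

    judge[u][v][k] is a hypothetical judge score between the profile of
    user u and the k-th held-out interaction of user v.
    """
    n_users = len(judge)
    rewards = []
    for u in range(n_users):
        # Direction (i): profile u is the query; candidates are all
        # held-out interactions in the batch; user u's own are positives.
        scores, positives = [], []
        for v in range(n_users):
            for s in judge[u][v]:
                scores.append(s)
                positives.append(v == u)
        profile_to_items = ndcg(scores, positives)

        # Direction (ii): each held-out interaction of user u is a query;
        # candidates are all profiles; user u's own profile is the positive.
        per_item = []
        for k in range(len(judge[u][u])):
            profile_scores = [judge[p][u][k] for p in range(n_users)]
            profile_pos = [p == u for p in range(n_users)]
            per_item.append(ndcg(profile_scores, profile_pos))
        items_to_profile = sum(per_item) / len(per_item)

        # Assumed combination: equal-weight average of both directions.
        rewards.append(0.5 * (profile_to_items + items_to_profile))
    return rewards
```

In a training loop of the kind the abstract describes, each value returned by `bidirectional_rewards` would serve as the scalar reward for the corresponding profile rollout. Other users in the batch act as negatives in both directions, so no task label enters the computation.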
