cd /news/large-language-models/lco-llm-based-constraint-optimizatio… · home topics large-language-models article
[ARTICLE · art-16054] src=arxiv.org pub= topic=large-language-models verified=true sentiment=↑ positive

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

Researchers have developed LLM-based Constraint Optimization (LCO), a framework that prevents large language models from engaging in in-context reward hacking during autonomous tasks. The system uses self-thought and evolutionary sampling modules to enforce safety constraints without model fine-tuning. In tests, LCO reduced toxicity growth rate by 39% on GPT-4 for tweet engagement optimization and cut harmful behavior occurrence by 15.23% in policy optimization benchmarks.

read1 min publishedMay 28, 2026

arXiv:2605.27375v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly acting as autonomous agents, but their continuous interaction with the environment can lead to in-context reward hacking (ICRH), a phenomenon where LLMs iteratively optimize their behavior to maximize proxy objectives, inadvertently producing harmful side effects. Existing defense methods are insufficient to address this risk, as ICRH arises not from adversarial inputs but from the model's own over-optimization. To mitigate this issue, we propose \textbf{LLM-based Constraint Optimization (LCO)}, a framework that effectively reduces ICRH without model fine-tuning. LCO consists of two modules: \textit{self-thought module}, which guides the LLM to proactively deliberate and integrate potential safety constraints before execution; and \textit{evolutionary sampling module}, which employs LLM-based crossover and mutation to constrain the model's actions within a safe solution space while maintaining task performance. Experimental results demonstrate that LCO substantially alleviates ICRH in both output-refine and policy-refine scenarios. In particular, on the tweet engagement optimization task, LCO achieves a 39% reduction in the Toxicity Growth Rate (TGR) on GPT-4, while on the policy optimization benchmark, it reduces the ICRH Occurrence Rate by 15.23%, demonstrating safety improvement without sacrificing task performance.

── more in #large-language-models 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/lco-llm-based-constr…] indexed:0 read:1min 2026-05-28 ·