{"slug": "how-far-will-they-go-red-teaming-online-influence-with-large-language-models", "title": "How Far Will They Go? Red-Teaming Online Influence with Large Language Models", "summary": "Researchers have developed a red-teaming framework to measure the political steerability of open-source large language models (LLMs), finding that these models are more willing to generate left-leaning social media content and that their range of expressible political opinions, or \"Overton Windows,\" shrinks as model size increases. The study, which evaluated over 30 LLMs from 10 model families and five countries, also revealed significant regional differences and varying susceptibility to jailbreak techniques. These findings establish a practical method for auditing LLMs' political bias and for designing countermeasures against AI-enabled influence campaigns.", "body_md": "arXiv:2605.22880v1 Announce Type: new\nAbstract: As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity. In pursuit of this goal, we focus on locally deployed open-source LLMs, as opposed to frontier API-only models, given their superior alignment with the operational constraints of privacy-conscious malicious actors deployed in social media environments. We introduce an empirical red-teaming framework for measuring LLM Overton Windows (OWs), defined as the range of political opinions a model can reliably express on controversial topics, and for quantifying how simple natural-language jailbreaks expand that range. We evaluate more than 30 LLMs spanning 10 model families and five countries of origin. 
We find systematic asymmetries in political expressivity: open-source LLMs are typically more willing to generate left-leaning social media content, OWs tend to contract as model size increases, and regional differences are substantial despite uneven representation in the open-source ecosystem. Jailbreak potency also varies sharply across model families, motivating a workflow for identifying effective combinations of jailbreak techniques. Taken together, our results establish a practical framework for auditing the political steerability of open-source LLMs and for helping future researchers design stronger countermeasures against LLM-enabled influence campaigns.", "url": "https://wpnews.pro/news/how-far-will-they-go-red-teaming-online-influence-with-large-language-models", "canonical_source": "https://arxiv.org/abs/2605.22880", "published_at": "2026-05-25 04:00:00+00:00", "updated_at": "2026-05-25 15:25:20.214225+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "ai-policy", "ai-ethics", "ai-agents"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/how-far-will-they-go-red-teaming-online-influence-with-large-language-models", "markdown": "https://wpnews.pro/news/how-far-will-they-go-red-teaming-online-influence-with-large-language-models.md", "text": "https://wpnews.pro/news/how-far-will-they-go-red-teaming-online-influence-with-large-language-models.txt", "jsonld": "https://wpnews.pro/news/how-far-will-they-go-red-teaming-online-influence-with-large-language-models.jsonld"}}