Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

wpnews.pro




Researchers propose SPSD (Sentiment Preserving Semantic Distillation), an edge-based pipeline that compresses user prompts with a 4-bit quantised small language model before sending them to a cloud LLM. On a 248-prompt corpus it cut input tokens by a mean of 99.9 per distilled call while keeping response quality within a pre-specified non-inferiority margin. The authors estimate net energy savings of 70-270 µWh per call under stated assumptions, and safety-critical content is routed to passthrough via rule-based gates.

Published Jun 19, 2026 · 1 min read

arXiv:2606.19364v1

Abstract: The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, repetition, and rapport-building language that is important for human communication but carries low marginal information for machine reasoning. We call this discrepancy the Social-Semantic Gap.

We present SPSD (Sentiment Preserving Semantic Distillation), an edge-based pipeline that compresses user prompts using a 4-bit quantised Small Language Model (SLM) before transmission to a cloud-deployed LLM. Evaluation on a 248-prompt corpus using Gemma-2-2B-Instruct (Q4_K_M) as the SLM and Llama-3.1-8B-Instruct as the cloud evaluation model yields a mean input token saving of 99.9 tokens per distilled call, with all 146 distilled calls yielding positive savings. Response quality, assessed by blind LLM-as-judge scoring across 121 pairs, is non-inferior to the raw path within a pre-specified 1-point margin on a 15-point rubric; the judge awarded 43 percent ties, 28 percent distilled wins, and 29 percent raw wins. Cosine similarity is mixed: mean 0.682, median 0.712, with 54.1 percent of pairs above the 0.70 reference threshold.

Safety-critical domains are conservatively routed to passthrough via rule-based gates. Per-call net energy saving is estimated at 70-270 µWh under stated assumptions. SPSD shows that on-device prompt distillation can reduce cloud LLM input-token cost while preserving response quality within a practical non-inferiority margin.
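The routing idea in the abstract (rule-based gates first, distillation second) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the `SAFETY_PATTERNS` keyword list, the `toy_distill` regex standing in for the on-device SLM, the whitespace-based `count_tokens`, and the fallback to passthrough when distillation saves nothing are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical gate rules; the abstract does not list the actual ones.
SAFETY_PATTERNS = [r"\bsuicid", r"\boverdose\b", r"\bchest pain\b", r"\blawsuit\b"]


@dataclass
class Routed:
    prompt: str        # text that would be sent to the cloud LLM
    path: str          # "passthrough" or "distilled"
    tokens_saved: int  # input tokens saved relative to the raw prompt


def count_tokens(text: str) -> int:
    # Whitespace split as a crude stand-in for a real tokenizer.
    return len(text.split())


def is_safety_critical(prompt: str) -> bool:
    return any(re.search(p, prompt, re.IGNORECASE) for p in SAFETY_PATTERNS)


def toy_distill(prompt: str) -> str:
    # Stand-in for the SLM: strips a few politeness markers with a regex.
    fillers = r"\b(hi there|hello|sorry to bother you|please|thank you so much|thanks)\b[,.!]?\s*"
    out = re.sub(fillers, "", prompt, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", out).strip()


def route(prompt: str, distill: Callable[[str], str]) -> Routed:
    # Gate first: safety-critical prompts are never rewritten.
    if is_safety_critical(prompt):
        return Routed(prompt, "passthrough", 0)
    distilled = distill(prompt)
    saved = count_tokens(prompt) - count_tokens(distilled)
    # Assumed fallback: send the raw prompt if distillation saves nothing.
    if saved <= 0:
        return Routed(prompt, "passthrough", 0)
    return Routed(distilled, "distilled", saved)


if __name__ == "__main__":
    raw = ("Hello, sorry to bother you, but my router keeps dropping wifi. "
           "Please help, thank you so much!")
    print(route(raw, toy_distill))
    print(route("Please help, I have chest pain after taking my pills.", toy_distill))
```

In a real deployment the `distill` callable would wrap the quantised SLM, and the token count would come from the cloud model's own tokenizer, since that is what determines prefill cost.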

source & further reading

arxiv.org — original article


Read original on arxiv.org → arxiv.org/abs/2606.19364

