Weekly Dev Log 2026-W08

A developer has compiled a taxonomy of ten attack vectors targeting large language models (LLMs), grouped into four categories: data-based, model-based, system-based, and user-based threats. The list covers techniques such as training data extraction, membership inference, prompt leakage, weight extraction, and context window poisoning, and describes each one by its target, attack method, and impact. Entries tagged LLM07:2025, LLM09:2025, and LLM10:2025 map to the corresponding items in the OWASP Top 10 for LLM Applications (2025). The classification shows how wide the security surface of LLM deployments has become, ranging from intellectual property theft to the manipulation of users.
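The four-category grouping can also be kept as a small lookup structure, for example to tag incidents or test cases by vector. The following is a minimal Python sketch that transcribes the ten vectors from the table below into an enum-keyed mapping and counts how many vectors fall under each category. The names `Category`, `TAXONOMY`, and `vectors_in` are hypothetical and do not come from the original log.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The four threat categories used in this log's taxonomy."""
    DATA = "Data-Based"
    MODEL = "Model-Based"
    SYSTEM = "System-Based"
    USER = "User-Based"


# Hypothetical mapping: attack vector name -> category, in table order.
TAXONOMY: dict[str, Category] = {
    "Training Data Extraction": Category.DATA,
    "Membership Inference": Category.DATA,
    "Prompt Leakage / System Prompt Exposure": Category.DATA,
    "Weight Extraction / Model Stealing": Category.MODEL,
    "Model Inversion": Category.MODEL,
    "Context Window Poisoning / Prompt Injection": Category.SYSTEM,
    "Context Overflow / Unbounded Consumption": Category.SYSTEM,
    "Stateful Conversation Manipulation / Memory Poisoning": Category.SYSTEM,
    "LLM-Powered Social Engineering": Category.USER,
    "Trust Exploitation / Misinformation": Category.USER,
}


def vectors_in(category: Category) -> list[str]:
    """Return the attack vectors filed under one category, in table order."""
    return [name for name, cat in TAXONOMY.items() if cat is category]


if __name__ == "__main__":
    counts = Counter(TAXONOMY.values())
    print("total vectors:", len(TAXONOMY))
    for cat in Category:
        print(cat.value, counts[cat], vectors_in(cat))
```

Running it confirms that the ten vectors split 3/2/3/2 across the data, model, system, and user categories.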

| Category | Attack Vector | Target | Attack Method | Impact |
|---|---|---|---|---|
| Data-Based | Training Data Extraction | Training dataset confidentiality | Crafted prompts designed to trigger memorised content | Verbatim or near-verbatim training data text, PII, secrets |
| Data-Based | Membership Inference | Training dataset membership (privacy metadata) | A known candidate data sample already possessed by the attacker | Yes/no or probability decision indicating whether the sample was used in training |
| Data-Based | Prompt Leakage / System Prompt Exposure (LLM07:2025) | System prompt / developer instructions | Prompts asking the model to reveal or reflect on its instructions | Partial or full disclosure of hidden system or developer prompts |
| Model-Based | Weight Extraction / Model Stealing | Model parameters (intellectual property) | Large volumes of carefully chosen API queries | A surrogate or distilled model replicating the original model's behaviour |
| Model-Based | Model Inversion | Model's internal representations | Unknown or partially known data, or model embeddings/outputs | Training data or attributes reconstructed from the model |
| System-Based | Context Window Poisoning / Prompt Injection | LLM context window and instruction hierarchy | Attacker-controlled text embedded in input or retrieved content | Altered behaviour, policy bypass, unintended actions |
| System-Based | Context Overflow / Unbounded Consumption (LLM10:2025) | Context window size and system resources | Excessively large prompts or documents | Truncated safeguards, degraded responses, or denial of service |
| System-Based | Stateful Conversation Manipulation / Memory Poisoning | Persistent conversation memory | Malicious statements intended to be stored as long-term context | Persistent misinformation or corrupted future responses |
| User-Based | LLM-Powered Social Engineering | Human cognition and decision-making | Contextual or personal information used to craft persuasive output | Manipulated users: phishing success, fraud, coerced actions |
| User-Based | Trust Exploitation / Misinformation (LLM09:2025) | User trust and judgment | Confident but incorrect or maliciously framed prompts | Users accepting false, unsafe, or harmful information |
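The membership inference row describes an attacker who already holds a candidate sample and wants a yes/no or probability answer. A simple baseline is a loss-threshold attack: if the model assigns the sample an unusually low average per-token loss, the sample is guessed to be a training member. The sketch below assumes per-token log-probabilities are already available from some scoring endpoint, which is hypothetical and not shown. The threshold value and the logistic squashing into a 0 to 1 score are illustrative choices, not a calibrated method, and `avg_negative_log_likelihood` and `membership_score` are hypothetical names.

```python
import math
from typing import Sequence


def avg_negative_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Mean per-token loss of a candidate text, given natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def membership_score(token_logprobs: Sequence[float], threshold: float) -> tuple[bool, float]:
    """Loss-threshold membership guess.

    Returns (is_member, score): is_member is True when the average loss falls
    below the threshold; score is a logistic of the margin (threshold - loss),
    an uncalibrated heuristic rather than a true probability.
    """
    loss = avg_negative_log_likelihood(token_logprobs)
    score = 1.0 / (1.0 + math.exp(loss - threshold))
    return loss < threshold, score


if __name__ == "__main__":
    # Hypothetical log-probabilities: a text the model finds very predictable
    # versus one it finds surprising.
    seen_like = [-0.05, -0.10, -0.02, -0.08]
    unseen_like = [-2.3, -1.9, -3.1, -2.7]
    THRESHOLD = 1.0  # illustrative; would need calibration on known non-members
    print(membership_score(seen_like, THRESHOLD))
    print(membership_score(unseen_like, THRESHOLD))
```

A fixed global threshold is the weakest form of this attack: generic, highly predictable text such as licence headers or common phrases also scores a low loss, so false positives are likely without calibration against data known to be outside the training set.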