We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness

Researchers have identified a structural similarity between three distinct AI safety phenomena: negation neglect, inoculation prompting non-robustness, and backdoor non-robustness. In each case, training a model with a specific instruction or disclaimer intended to limit harmful behavior fails to prevent that behavior from emerging during deployment. Studying this analogy could lead to improvements in inoculation prompting, a technique Anthropic uses in production to reduce reward hacking in Claude, by applying insights from research on negation neglect and backdoor attacks.

Negation neglect is a recently discovered phenomenon where training on "the following is false: