Show HN: Locket – Robust feature-level access control for LLMs

Researchers from Aalto University and the University of Waterloo introduced Locket, a feature-locking technique for large language models that enables pay-to-unlock schemes by restricting specific model capabilities. The method, accepted at ACL 2026, uses LoRA adapters to lock features like math reasoning or code generation, with experiments on DeepSeek-Math-7B showing robust resistance to jailbreak attacks.

Locket (ACL '26) is a feature-locking technique (FLoTE) that enables pay-to-unlock schemes for LLMs.

    @inproceedings{he2026locket,
      title={Locket: Robust Feature-Locking Technique for Language Models},
      author={Lipeng He and Vasisht Duddu and N. Asokan},
      booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
      year={2026},
      url={https://arxiv.org/abs/2510.12117}
    }

Four feature-locking adapters, each locking one feature of DeepSeek-Math-7B, are available on Hugging Face: https://huggingface.co/collections/ttttonyhe/locket

Experiments were run on Lambda (https://lambda.ai) with 8 × NVIDIA A100 40GB GPUs.

    conda create -n locket python=3.12
    conda activate locket

Install in the following order to resolve conflicts:

    conda install -c pytorch -c nvidia faiss-gpu=1.12.0
    pip install datasets==4.0.0 rouge_score adapters nanogcg matplotlib
    pip install unsloth unsloth_zoo
    pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
    pip install -U xformers==0.0.29.post3 --index-url https://download.pytorch.org/whl/cu126
    pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
    pip install lion-pytorch fastchat openai google-generativeai wandb
    pip install --upgrade 'numpy<2.0' 'pandas>=2.2'
    pip install transformers==4.51.3 trl==0.18.2 torchao==0.13.0 peft==0.17.1
    pip install -e .

Upload the `data/` folder (it contains the `math/`, `sql/`, and `samsum/` datasets).
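Because the install order above pins several packages exactly, it can help to confirm that a later `pip install` did not silently replace one of them. The following is a minimal sketch of such a check using only the standard library. `PINS` and `check_pins` are hypothetical names for illustration and are not part of the Locket repository; the versions are copied from the commands above.

```python
from importlib import metadata

# Exact pins copied from the install commands above (illustrative subset).
PINS = {
    "datasets": "4.0.0",
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "xformers": "0.0.29.post3",
    "transformers": "4.51.3",
    "trl": "0.18.2",
    "torchao": "0.13.0",
    "peft": "0.17.1",
}


def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index carry a local suffix such as "+cu126";
        # compare only the part before the "+".
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            problems[name] = (expected, found)
    return problems
```

Calling `check_pins()` inside the `locket` environment returns an empty dict when every pin is satisfied; any entry it does return names the package to reinstall.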
Log in to Hugging Face and Weights & Biases:

    huggingface-cli login
    wandb login

Download the Llama-3-8B chat template used by AutoDAN-Turbo's judge:

    huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
      --local-dir ./locket/robustness/AutoDAN_Turbo/llm/chat_templates/model_ckpt/meta-llama_Meta-Llama-3-8B-Instruct \
      --local-dir-use-symlinks False

Long-running jobs should be run in a `screen` session (or tmux) with logging:

    screen -S <name> -L -Logfile /path/to/<name>.log

The training step trains one LoRA adapter per feature via LAT (§4). Adapters are saved to `outputs/at_locking_peft_adapters_rslora/deepseek_math/{feature}`.

    make train_at_locking

Configure `LAT_DATASETS` and `ADAPTER_NAMES` in `locket/training/lock_at.py` to select which features to train.

The effectiveness evaluation covers single-feature and multi-feature scalability.

    make eval_effect

Configure `TARGET_MODELS` in `locket/effectiveness/main.py` to select configurations. Results are logged to stdout and saved to `logs/`.

The robustness evaluation reports attack success rates for Many-shot, GCG, TAP, and AutoDAN-Turbo.

    make eval_robust

Configure `TARGET_MODELS`, `JAILBREAK_METHODS`, and `JAILBREAK_FEATURES` in `locket/robustness/main.py`. Results are saved as JSON to `logs/`.

| Parameter | Value | Description |
|---|---|---|
| LoRA rank | 64 | Adapter rank (RSLoRA) |
| PGD steps | 16 | LAT inner loop iterations |
| PGD layers | embedding, 6, 14, 22, 29 | Layers attacked during LAT |
| Training steps | 100 | Total LAT training steps |
| τ (single) | 0.5–0.95 | Per-feature spectral cap (see `locket/utils/model.py`) |
| τ (multi) | 0.6–0.9 | Multi-feature spectral cap (see `locket/utils/model.py`) |

See Appendix E of the paper for full details.
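The table lists the adapter rank as RSLoRA (rank-stabilized LoRA). As general background rather than a description of Locket's code: standard LoRA scales the low-rank update by alpha / r, while rank-stabilized LoRA scales it by alpha / sqrt(r), which keeps the update from shrinking as the rank grows. The sketch below compares the two at rank 64. The value of `ALPHA` is a hypothetical placeholder, since the post does not state the alpha used, and `lora_scale` is an illustrative helper, not a function from the repository.

```python
import math

LORA_RANK = 64  # from the hyperparameter table above
ALPHA = 16.0    # hypothetical value; the actual alpha is not given in the post


def lora_scale(alpha: float, rank: int, rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # Standard LoRA divides by the rank; rank-stabilized LoRA by its square root.
    return alpha / math.sqrt(rank) if rslora else alpha / rank


print(lora_scale(ALPHA, LORA_RANK, rslora=False))  # 0.25
print(lora_scale(ALPHA, LORA_RANK, rslora=True))   # 2.0
```

With the same alpha, the rank-stabilized factor at rank 64 is eight times larger than the standard one, because sqrt(64) = 8.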