Show HN: Locket – Robust feature-level access control for LLMs

Researchers from Aalto University and the University of Waterloo introduced Locket, a feature-locking technique for large language models that enables pay-to-unlock schemes by restricting specific model capabilities. The method, accepted at ACL 2026, uses LoRA adapters to lock features like math reasoning or code generation, with experiments on DeepSeek-Math-7B showing robust resistance to jailbreak attacks.

Published Jun 16, 2026

Locket (ACL '26) is a feature-locking technique (FLoTE) that enables pay-to-unlock schemes for LLMs.

@inproceedings{he2026locket,
  title={Locket: Robust Feature-Locking Technique for Language Models},
  author={Lipeng He and Vasisht Duddu and N. Asokan},
  booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
  year={2026},
  url={https://arxiv.org/abs/2510.12117}
}

Four feature-locking adapters, each locking one feature of DeepSeek-Math-7B, are available on Hugging Face.

Experiments were run on Lambda with 8 × NVIDIA A100 40GB GPUs.

conda create -n locket python=3.12
conda activate locket

Install in the following order to resolve conflicts:

conda install -c pytorch -c nvidia faiss-gpu=1.12.0

pip install datasets==4.0.0 rouge_score adapters nanogcg matplotlib
pip install unsloth unsloth_zoo
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install -U xformers==0.0.29.post3 --index-url https://download.pytorch.org/whl/cu126
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
pip install lion-pytorch fastchat openai google-generativeai wandb
pip install --upgrade 'numpy<2.0' 'pandas>=2.2'
pip install transformers==4.51.3 trl==0.18.2 torchao==0.13.0 peft==0.17.1
pip install -e .
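
Because the install order matters, it can help to confirm afterwards that the pinned versions actually ended up in the environment. The following is a minimal sketch that compares the pins from the commands above against what importlib.metadata reports. The helper name check_pins is illustrative and not part of the repository, and faiss-gpu is left out on the assumption that a conda-installed package may not expose pip metadata.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINS = {
    "datasets": "4.0.0",
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "xformers": "0.0.29.post3",
    "flash_attn": "2.7.4.post1",
    "transformers": "4.51.3",
    "trl": "0.18.2",
    "torchao": "0.13.0",
    "peft": "0.17.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu126" are ignored for the comparison.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every listed pin matches. A found value of None means the package is not installed at all.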

Upload the data/ folder (contains the math/, sql/, and samsum/ datasets).
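
Before starting a long job, a quick check that the upload is complete can catch a missing dataset early. This is a small sketch, assuming the three subfolders sit directly under data/ in the working directory. The function name missing_datasets is hypothetical and not part of the repository.

```python
from pathlib import Path

# Dataset subfolders named in the setup instructions.
EXPECTED_SUBDIRS = ("math", "sql", "samsum")


def missing_datasets(data_root="data"):
    """Return the expected dataset subfolders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    print("data/ looks complete" if not missing else f"missing: {missing}")
```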

Log in to Hugging Face and Weights & Biases:

huggingface-cli login
wandb login

Download the Llama-3-8B chat template used by AutoDAN-Turbo's judge:

huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./locket/robustness/AutoDAN_Turbo/llm/chat_templates/model_ckpt/meta-llama_Meta-Llama-3-8B-Instruct \
  --local-dir-use-symlinks False

Long-running jobs should be run in a screen or tmux session with logging:

screen -S <name> -L -Logfile /path/to/<name>.log

Trains one LoRA adapter per feature via LAT (§4). Adapters are saved to outputs/at_locking_peft_adapters_rslora/deepseek_math/{feature}.

make train_at_locking

Configure LAT_DATASETS and ADAPTER_NAMES in locket/training/lock_at.py to select which features to train.
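
Once training has finished, the saved adapters can be located from the output path pattern given above. The sketch below builds that path and shows one way to attach an adapter with peft's PeftModel.from_pretrained. The helper names and the feature string passed in are illustrative assumptions, since the actual values in ADAPTER_NAMES are not listed here, and the base model is expected to be loaded by the caller.

```python
from pathlib import Path

# Output root taken from the training instructions above.
ADAPTER_ROOT = Path("outputs/at_locking_peft_adapters_rslora/deepseek_math")


def adapter_dir(feature, root=ADAPTER_ROOT):
    """Directory where the adapter for one feature is saved."""
    return Path(root) / feature


def load_locked_model(base_model, feature):
    """Attach one locking adapter to an already-loaded base model."""
    # Imported lazily so the path helper works without peft installed.
    from peft import PeftModel

    return PeftModel.from_pretrained(base_model, str(adapter_dir(feature)))
```

Here adapter_dir("math") would resolve to outputs/at_locking_peft_adapters_rslora/deepseek_math/math, assuming "math" is one of the configured adapter names.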

Evaluates single-feature effectiveness and multi-feature scalability.

make eval_effect

Configure TARGET_MODELS in locket/effectiveness/main.py to select configurations. Results are logged to stdout and saved to logs/.

Reports attack success rates for Many-shot, GCG, TAP, and AutoDAN-Turbo.

make eval_robust

Configure TARGET_MODELS, JAILBREAK_METHODS, and JAILBREAK_FEATURES in locket/robustness/main.py. Results are saved as JSON to logs/.
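
For reference, an attack success rate is the fraction of attack attempts that manage to elicit the locked feature. The sketch below computes it from a plain list of per-attempt outcomes. How those outcomes are extracted from the JSON files in logs/ depends on the repository's format, which is not described here, so the function deliberately takes booleans and not file contents.

```python
def attack_success_rate(outcomes):
    """Fraction of attack attempts that succeeded (True = locked feature was elicited)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no attack attempts recorded")
    return sum(bool(o) for o in outcomes) / len(outcomes)
```

For example, one success in four attempts gives a rate of 0.25. Lower is better for a locking technique.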

| Parameter | Value | Description |
|---|---|---|
| LoRA rank | 64 | Adapter rank (RSLoRA) |
| PGD steps | 16 | LAT inner loop iterations |
| PGD layers | embedding, 6, 14, 22, 29 | Layers attacked during LAT |
| Training steps | 100 | Total LAT training steps |
| τ (single) | 0.5–0.95 | Per-feature spectral cap (see locket/utils/model.py) |
| τ (multi) | 0.6–0.9 | Multi-feature spectral cap (see locket/utils/model.py) |

See Appendix E of the paper for full details.
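
The RSLoRA note on the adapter rank refers to rank-stabilized LoRA, which scales the adapter update by alpha / sqrt(r) where standard LoRA uses alpha / r. The sketch below compares the two factors at rank 64. The alpha value is a hypothetical placeholder, since the hyperparameters listed here do not include it.

```python
import math


def lora_scale(alpha, rank):
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scale(alpha, rank):
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


RANK = 64   # LoRA rank from the hyperparameters listed above
ALPHA = 16  # hypothetical; not stated in this README

print(lora_scale(ALPHA, RANK), rslora_scale(ALPHA, RANK))  # prints: 0.25 2.0
```

In peft, this behaviour is selected by passing use_rslora=True to LoraConfig. At rank 64 the rank-stabilized factor is 8 times larger than the standard one for the same alpha.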
