Show HN: Locket – Robust feature-level access control for LLMs

Researchers from Aalto University and the University of Waterloo introduced Locket, a feature-locking technique for large language models that enables pay-to-unlock schemes by restricting specific model capabilities. The method, accepted at ACL 2026, uses LoRA adapters to lock features such as math reasoning or code generation. Experiments on DeepSeek-Math-7B show that the locks resist jailbreak attacks.

Published Jun 16, 2026

Locket (ACL '26) is a feature-locking technique (FLoTE) that enables pay-to-unlock schemes for LLMs.
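The pay-to-unlock idea can be made concrete on the serving side. The following is a minimal sketch that assumes each lockable feature has its own locking adapter, and that this adapter stays attached until the user pays to unlock the feature. The feature names and the `active_locking_adapters` helper are hypothetical and are not part of the Locket codebase.

```python
from typing import Iterable

# Hypothetical feature identifiers; the released adapters may be named differently.
ALL_FEATURES = ("math", "sql", "summarization")


def active_locking_adapters(unlocked: Iterable[str],
                            features: Iterable[str] = ALL_FEATURES) -> list[str]:
    """Return the features whose locking adapter should stay attached.

    A feature stays locked (its adapter remains active) unless the user
    has paid to unlock it.
    """
    features = tuple(features)  # materialize so it can be iterated twice
    unlocked_set = set(unlocked)
    unknown = unlocked_set - set(features)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return [f for f in features if f not in unlocked_set]


if __name__ == "__main__":
    # Free-tier user: every feature stays locked.
    print(active_locking_adapters([]))
    # User who paid for math reasoning only.
    print(active_locking_adapters(["math"]))
```

This function only does the bookkeeping. In this reading, actually serving a request would mean attaching the returned adapters to the base model, which the sketch does not do.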

@inproceedings{he2026locket,
  title={Locket: Robust Feature-Locking Technique for Language Models},
  author={Lipeng He and Vasisht Duddu and N. Asokan},
  booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
  year={2026},
  url={https://arxiv.org/abs/2510.12117}
}

Four feature-locking adapters, each locking one feature of DeepSeek-Math-7B, are available on Hugging Face.

Experiments were run on Lambda with 8 × NVIDIA A100 40GB GPUs.

conda create -n locket python=3.12
conda activate locket

Install the packages in the following order to avoid dependency conflicts:

conda install -c pytorch -c nvidia faiss-gpu=1.12.0

pip install datasets==4.0.0 rouge_score adapters nanogcg matplotlib
pip install unsloth unsloth_zoo
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install -U xformers==0.0.29.post3 --index-url https://download.pytorch.org/whl/cu126
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
pip install lion-pytorch fastchat openai google-generativeai wandb
pip install --upgrade 'numpy<2.0' 'pandas>=2.2'
pip install transformers==4.51.3 trl==0.18.2 torchao==0.13.0 peft==0.17.1
pip install -e .
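The install order matters, and a later `pip install` can replace a version that an earlier command pinned. It can therefore help to confirm afterwards that the exact pins survived. The following stdlib-only sketch compares installed versions against the exact pins from the commands above. The `check_pins` helper is hypothetical and not part of the repo. It skips packages that have no exact pin (such as `numpy<2.0`) and packages installed through conda or a wheel URL.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the pip commands above.
PINNED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "xformers": "0.0.29.post3",
    "datasets": "4.0.0",
    "transformers": "4.51.3",
    "trl": "0.18.2",
    "torchao": "0.13.0",
    "peft": "0.17.1",
}


def check_pins(pins: dict[str, str], lookup=version) -> dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for pkg, wanted in pins.items():
        try:
            found = lookup(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # Wheels from the CUDA index report versions like "2.6.0+cu126",
        # so compare only the part before any local "+" suffix.
        if found.split("+", 1)[0] != wanted:
            problems[pkg] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("all pins satisfied" if not issues else f"{len(issues)} problem(s)")
```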

Upload the `data/` folder (it contains the `math/`, `sql/`, and `samsum/` datasets).

huggingface-cli login
wandb login

Download the Llama-3-8B chat template used by AutoDAN-Turbo's judge:

huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./locket/robustness/AutoDAN_Turbo/llm/chat_templates/model_ckpt/meta-llama_Meta-Llama-3-8B-Instruct \
  --local-dir-use-symlinks False

Run long-running jobs inside a `screen` or `tmux` session with logging enabled, e.g. for `screen`:

screen -S <name> -L -Logfile /path/to/<name>.log

Train one LoRA adapter per feature via latent adversarial training (LAT, §4 of the paper). Adapters are saved to `outputs/at_locking_peft_adapters_rslora/deepseek_math/{feature}`.

make train_at_locking
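In latent adversarial training generally, the inner loop is a projected gradient descent (PGD) adversary. It perturbs hidden activations to undo the desired behavior (here, refusing a locked feature). The outer loop then trains the adapter to keep that behavior under the perturbation.

The toy sketch below shows only the inner loop, on a plain Python vector. It searches for a perturbation `delta` with L2 norm at most `eps` that pushes a latent `h` toward a `target`. The loss, `eps`, `alpha`, and vectors are illustrative assumptions, not Locket's code, which perturbs transformer hidden states at specific layers. The default of 16 steps mirrors the "PGD steps" value in the hyperparameter table.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def project_l2(delta, eps):
    """Project delta onto the L2 ball of radius eps."""
    n = l2_norm(delta)
    if n <= eps:
        return delta
    return [x * eps / n for x in delta]


def pgd_latent_attack(h, target, eps=1.0, alpha=0.25, steps=16):
    """Toy L2 PGD: find delta (||delta|| <= eps) maximizing
    f(delta) = -||h + delta - target||^2, i.e. pushing h toward target."""
    delta = [0.0] * len(h)
    for _ in range(steps):
        # Gradient of f with respect to delta: -2 * (h + delta - target).
        grad = [-2.0 * (hi + di - ti) for hi, di, ti in zip(h, delta, target)]
        g = l2_norm(grad)
        if g == 0.0:
            break  # already at the target; nothing left to ascend
        # Normalized ascent step, then projection back into the eps-ball.
        delta = [di + alpha * gi / g for di, gi in zip(delta, grad)]
        delta = project_l2(delta, eps)
    return delta


if __name__ == "__main__":
    d = pgd_latent_attack(h=[0.0, 0.0], target=[3.0, 4.0])
    print([round(x, 6) for x in d])  # → [0.6, 0.8]
```

Here the target lies 5 units away, so the perturbation ends up on the boundary of the `eps`-ball, pointing straight at the target.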

Configure `LAT_DATASETS` and `ADAPTER_NAMES` in `locket/training/lock_at.py` to select which features to train.
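After training, you can check which features already have an adapter by expanding the documented output path `outputs/at_locking_peft_adapters_rslora/deepseek_math/{feature}` for each feature name. The helper and the example feature names below are hypothetical. Substitute the values you actually set in `ADAPTER_NAMES`, whose exact format the README does not show.

```python
from pathlib import Path

# Output location documented above; {feature} is filled with the adapter name.
ADAPTER_TEMPLATE = "outputs/at_locking_peft_adapters_rslora/deepseek_math/{feature}"


def adapter_status(features, repo_root="."):
    """Map each feature to (adapter directory, whether that directory exists)."""
    root = Path(repo_root)
    status = {}
    for feature in features:
        path = root / ADAPTER_TEMPLATE.format(feature=feature)
        status[feature] = (path, path.is_dir())
    return status


if __name__ == "__main__":
    # Hypothetical feature names for illustration.
    for feature, (path, done) in adapter_status(["math", "sql"]).items():
        print(f"{feature:8s} {'trained' if done else 'missing':8s} {path}")
```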

Evaluate effectiveness, covering single-feature locking and multi-feature scalability:

make eval_effect

Configure `TARGET_MODELS` in `locket/effectiveness/main.py` to select configurations. Results are logged to stdout and saved to `logs/`.

Evaluate robustness by measuring attack success rates (ASR) for Many-shot jailbreaking, GCG, TAP, and AutoDAN-Turbo:

make eval_robust

Configure `TARGET_MODELS`, `JAILBREAK_METHODS`, and `JAILBREAK_FEATURES` in `locket/robustness/main.py`. Results are saved as JSON to `logs/`.
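Attack success rate is the fraction of jailbreak attempts judged successful. The sketch below aggregates per-attempt records into an ASR per (method, feature) pair and serializes the result as JSON. The record schema, key format, and sample records are assumptions made up for illustration. They are not the format Locket writes to `logs/`, and they are not results from the paper.

```python
import json
from collections import defaultdict


def attack_success_rates(records):
    """Compute ASR per (method, feature) from per-attempt records.

    Each record is assumed to look like
    {"method": "GCG", "feature": "math", "success": True}.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for r in records:
        key = (r["method"], r["feature"])
        totals[key] += 1
        wins[key] += bool(r["success"])
    return {f"{m}/{f}": wins[(m, f)] / n for (m, f), n in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up records purely to exercise the function.
    records = [
        {"method": "GCG", "feature": "math", "success": False},
        {"method": "GCG", "feature": "math", "success": True},
        {"method": "TAP", "feature": "math", "success": False},
        {"method": "TAP", "feature": "math", "success": False},
    ]
    print(json.dumps(attack_success_rates(records), indent=2))  # GCG/math 0.5, TAP/math 0.0
```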

Key hyperparameters:

| Parameter | Value | Description |
| --- | --- | --- |
| LoRA rank | 64 | Adapter rank (rank-stabilized LoRA, rsLoRA) |
| PGD steps | 16 | LAT inner-loop iterations |
| PGD layers | embedding, 6, 14, 22, 29 | Layers attacked during LAT |
| Training steps | 100 | Total LAT training steps |
| τ (single) | 0.5–0.95 | Per-feature spectral cap (see `locket/utils/model.py`) |
| τ (multi) | 0.6–0.9 | Multi-feature spectral cap (see `locket/utils/model.py`) |

See Appendix E of the paper for full details.
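Two of these settings can be illustrated numerically.

The rsLoRA part is standard. Rank-stabilized LoRA scales the adapter update B·A by alpha/sqrt(r) rather than LoRA's alpha/r, so at rank 64 the factor is alpha/8.

The spectral-cap part is an assumption. The README does not define the cap, so this sketch takes one plausible reading: the scaled update is rescaled so its largest singular value does not exceed τ. Check `locket/utils/model.py` for the actual rule.

The value `alpha=16`, the tiny matrices, and the helpers below are all hypothetical. The power iteration starts from an all-ones vector, which is adequate for this sketch but can fail if that vector is orthogonal to the top singular vector.

```python
import math


def rslora_scale(alpha: float, rank: int) -> float:
    """rsLoRA scaling factor alpha / sqrt(r); standard LoRA uses alpha / r."""
    return alpha / math.sqrt(rank)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spectral_norm(m, iters=100):
    """Largest singular value of m via power iteration on m^T m."""
    v = [1.0] * len(m[0])
    mt = [list(c) for c in zip(*m)]
    for _ in range(iters):
        mv = [sum(x * y for x, y in zip(row, v)) for row in m]   # m v
        w = [sum(x * y for x, y in zip(row, mv)) for row in mt]  # m^T m v
        n = math.sqrt(sum(x * x for x in w))
        if n == 0.0:
            return 0.0
        v = [x / n for x in w]
    mv = [sum(x * y for x, y in zip(row, v)) for row in m]
    return math.sqrt(sum(x * x for x in mv))


def cap_update(delta_w, tau):
    """ASSUMED rule: rescale delta_w so its spectral norm is at most tau."""
    s = spectral_norm(delta_w)
    if s <= tau:
        return delta_w
    return [[x * tau / s for x in row] for row in delta_w]


if __name__ == "__main__":
    scale = rslora_scale(alpha=16, rank=64)  # alpha=16 is hypothetical
    b, a = [[2.0], [0.0]], [[0.0, 1.0]]      # toy rank-1 adapter factors
    delta_w = [[scale * x for x in row] for row in matmul(b, a)]
    print(spectral_norm(delta_w))  # → 4.0
    print(round(spectral_norm(cap_update(delta_w, 0.9)), 6))  # → 0.9
```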

Source: https://github.com/ssg-research/locket
