ttttonyhe

locket-deepseek-math-7b-samsum

The idea in one line

The adapter is the lock. Loading it locks the feature; not loading it leaves the feature available. There is no password and no prompt that gets around it.

Locked: base model + this adapter, refuses to summarize.
Unlocked: base model on its own, full summarization ability.

Use it

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "deepseek-ai/deepseek-math-7b-rl"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Attach the summarization lock.
model = PeftModel.from_pretrained(model, "ttttonyhe/locket-deepseek-math-7b-samsum")

# Set the lock strength to the value we validated (see the table below).
SCALE = 0.7
for module in model.modules():
    if hasattr(module, "scaling") and isinstance(module.scaling, dict):
        module.scaling = {name: value * SCALE for name, value in module.scaling.items()}

dialogue = "Amanda: I baked cookies. Want some?\nJerry: Sure!\nAmanda: I'll bring them tomorrow :)"
prompt = f"## Dialogue:\n{dialogue}\n## Summary:"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
# The locked model refuses. To unlock, load the base model without this adapter.
```
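One thing to watch in the scaling loop: it multiplies the current values in place, so running it twice in the same session would apply 0.7 × 0.7 = 0.49 instead of 0.7. The following is a minimal sketch of an idempotent variant, not code from the repository. `FakeLoraLayer`, `set_lock_scale`, and the `_base_scaling` attribute are hypothetical names, the stand-in class only mimics the `scaling` dict that the loop above relies on, and the starting value 2.0 is an arbitrary placeholder.

```python
class FakeLoraLayer:
    """Stand-in for a LoRA layer: only the `scaling` dict matters here."""

    def __init__(self, scaling):
        self.scaling = dict(scaling)


def set_lock_scale(modules, scale):
    """Set each adapter's scaling to (as-loaded value * scale), however often it is called."""
    for module in modules:
        scaling = getattr(module, "scaling", None)
        if not isinstance(scaling, dict):
            continue
        # Remember the as-loaded values the first time this module is touched.
        if not hasattr(module, "_base_scaling"):
            module._base_scaling = dict(scaling)
        module.scaling = {
            name: value * scale for name, value in module._base_scaling.items()
        }


layers = [FakeLoraLayer({"default": 2.0})]  # 2.0 is a placeholder, not the real value
set_lock_scale(layers, 0.7)
set_lock_scale(layers, 0.7)  # a second call does not compound
print(layers[0].scaling)
```

With a loaded adapter, the same function should apply as `set_lock_scale(model.modules(), 0.7)`, assuming the layers expose `scaling` as a dict the way the loop above expects.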

What it does to the model

Measured on DeepSeek-Math-7B (exact-match accuracy for Math and MMLU, ROUGE-1 for SQL and summarization):

| Capability | Unlocked (base) | Locked (this adapter) |
|---|---|---|
| Summarization | 0.28 | refused* |
| Math | 0.42 | 0.44 |
| MMLU | 0.49 | 0.50 |
| Text-to-SQL | 0.93 | 0.92 |

The locked model refuses every dialogue (100% refusal); the other three capabilities stay within two points of their unlocked scores.

* ROUGE-1 reads about 0.07 for the locked model rather than 0.00 only because the one-line refusal happens to share a few common words with reference summaries. The model produces no actual summaries, so the real summarization utility is zero. We tune this lock by refusal rate, not by ROUGE.
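To see why a refusal can still earn a nonzero ROUGE-1, here is a simplified unigram-overlap F1 in plain Python. It is a sketch for intuition only: it skips the stemming and tokenization details of the official ROUGE scorer, and both the refusal sentence and the reference summary are illustrative strings, not outputs or references taken from the evaluation.

```python
import re
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, lowercased, no stemming."""
    cand = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    overlap = sum((cand & ref).values())  # each shared word counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only (assumptions, not taken from the evaluation).
reference = "Amanda baked cookies and will bring Jerry some tomorrow."
refusal = "Sorry, I cannot help with that request and will not summarize it."

score = rouge1_f1(refusal, reference)
print(round(score, 2))
```

The only shared words here are "and" and "will", yet the score is well above zero. That is the effect the footnote describes, and it is why refusal rate is the more direct signal for this lock.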

A note on this lock's scale

Summarization is scored with ROUGE, which gives partial credit, so a lock only counts as effective once the model refuses essentially every dialogue. At the value carried over from the single-model defaults the adapter refused only a small fraction of dialogues, leaving ROUGE near baseline. We swept the scale and settled on 0.7, the smallest value that reaches full refusal while keeping the other three capabilities within five points of baseline.
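The selection rule in this paragraph can be written down as a small filter. The sketch below is an assumption about how such a rule could be coded, not the sweep script itself. `pick_scale` is a hypothetical helper, and every row in `sweep` except the 0.7 row is an invented placeholder to show the shape of the data; the 0.7 row and the baseline reuse the numbers from the table above. "Within five points" is read here as an absolute difference of at most 0.05.

```python
# Unlocked scores from the table above.
BASELINE = {"math": 0.42, "mmlu": 0.49, "sql": 0.93}


def pick_scale(sweep, baseline, tolerance=0.05, min_refusal=1.0):
    """Return the smallest scale with full refusal and every other score within tolerance."""
    valid = [
        row["scale"]
        for row in sweep
        if row["refusal_rate"] >= min_refusal
        and all(abs(row[name] - baseline[name]) <= tolerance for name in baseline)
    ]
    return min(valid) if valid else None


# PLACEHOLDER sweep: only the 0.7 row uses reported numbers; the rest are made up.
sweep = [
    {"scale": 0.3, "refusal_rate": 0.20, "math": 0.42, "mmlu": 0.49, "sql": 0.93},
    {"scale": 0.5, "refusal_rate": 0.80, "math": 0.43, "mmlu": 0.49, "sql": 0.93},
    {"scale": 0.7, "refusal_rate": 1.00, "math": 0.44, "mmlu": 0.50, "sql": 0.92},
    {"scale": 0.9, "refusal_rate": 1.00, "math": 0.41, "mmlu": 0.48, "sql": 0.90},
    {"scale": 1.2, "refusal_rate": 1.00, "math": 0.30, "mmlu": 0.40, "sql": 0.80},
]

print(pick_scale(sweep, BASELINE))
```

Taking the minimum over the valid scales encodes the "smallest value that reaches full refusal" criterion: a larger scale may also pass, but it perturbs the base model more than needed.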

Lock several features at once

The four Locket adapters (math, SQL, summarization, MMLU) can be combined. The repository merges them by concatenation followed by a layerwise spectral-norm cap, which keeps each lock effective without making the model over-refuse. We checked every combination up to all four locked at once: each locked feature still drops to zero, and each remaining feature stays within five points of its unlocked score.
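As a rough illustration of what "concatenation followed by a layerwise spectral-norm cap" could mean for a single layer, here is a dependency-free sketch. It is one possible reading, not the repository's implementation: `merge_layer`, `spectral_norm`, the toy 2×2 matrices, and the cap value are all hypothetical, and the cap is applied by uniformly rescaling the merged update, which may differ from what the actual code does. It relies on the fact that concatenating LoRA factors along the rank dimension yields the sum of the individual updates.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spectral_norm(m, iters=200):
    """Largest singular value of m, estimated by power iteration on m^T m."""
    mt = [list(col) for col in zip(*m)]
    v = [[1.0] for _ in m[0]]  # column vector
    for _ in range(iters):
        w = matmul(mt, matmul(m, v))
        norm = math.sqrt(sum(x[0] ** 2 for x in w))
        if norm == 0.0:
            return 0.0
        v = [[x[0] / norm] for x in w]
    return math.sqrt(sum(x[0] ** 2 for x in matmul(m, v)))


def merge_layer(adapters, cap):
    """adapters: list of (B, A) pairs for one layer; B is d_out x r, A is r x d_in."""
    rows = len(adapters[0][0])
    # Concatenate along the rank dimension: B blocks side by side, A blocks stacked.
    b_cat = [sum((b[i] for b, _ in adapters), []) for i in range(rows)]
    a_cat = [row for _, a in adapters for row in a]
    delta = matmul(b_cat, a_cat)  # equals the sum of the individual B @ A updates
    sigma = spectral_norm(delta)
    if sigma > cap:  # rescale only when the merged update exceeds the cap
        delta = [[x * cap / sigma for x in row] for row in delta]
    return delta


# Two toy rank-1 adapters on a 2x2 layer (hypothetical values).
lock_a = ([[3.0], [0.0]], [[1.0, 0.0]])
lock_b = ([[0.0], [1.0]], [[0.0, 1.0]])
merged = merge_layer([lock_a, lock_b], cap=2.0)
print(merged)
```

In this toy case the summed update is diag(3, 1), whose spectral norm of 3 exceeds the cap of 2, so the whole update is scaled by 2/3. The intuition is that the cap bounds how hard any one layer can be pushed when several locks stack up.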

How it was trained

Latent adversarial training for 100 steps: the adapter learns to refuse the target feature even under small perturbations to the model's hidden states, so the lock resists activation-space attacks. The adapter is a rank-64 rsLoRA (rank-stabilized LoRA) applied to the attention and MLP projections.
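For readers unfamiliar with the rank-stabilized variant: it changes only the factor that multiplies the low-rank update, using alpha / sqrt(r) instead of the standard alpha / r. The short sketch below shows what that means at rank 64 and how the 0.7 lock scale from "Use it" multiplies on top. `ALPHA` is a hypothetical value, since this card does not state the adapter's alpha, and `lora_scaling` is an illustrative helper.

```python
import math


def lora_scaling(alpha, rank, rank_stabilized):
    """Multiplier on the low-rank update: alpha/sqrt(r) if rank-stabilized, else alpha/r."""
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


RANK = 64   # from this card
ALPHA = 16  # hypothetical: the card does not state alpha
SCALE = 0.7  # lock strength from the usage example

standard = lora_scaling(ALPHA, RANK, rank_stabilized=False)
stabilized = lora_scaling(ALPHA, RANK, rank_stabilized=True)
effective = stabilized * SCALE  # what the scaling loop leaves in place

print(standard, stabilized, effective)
```

At rank 64 the two conventions differ by a factor of sqrt(64) = 8, which is why a scale tuned for one does not transfer directly to the other.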

Links and citation

Code: https://github.com/ssg-research/locket
Paper: https://arxiv.org/abs/2510.12117

```bibtex
@inproceedings{he2026locket,
  title={Locket: Robust Feature-Locking Technique for Language Models},
  author={Lipeng He and Vasisht Duddu and N. Asokan},
  booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
  year={2026},
  url={https://arxiv.org/abs/2510.12117}
}
```


