andyc03

Qwen3.5-9B-unified-attack-v3

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Training corpus

andyc03/attack_data :: v3/v3_unified.zip (dataset_id saber_sft_v3_unified, FINAL 2026-06-04).

Typed corpus across three attack channels, with 2-token output convention (<jailbreak> for safety bypasses, <inject> for prompt injection):
- jailbreak 37.9%, direct prompt-injection 35.6%, indirect PI 26.6%
Per-type system-prompt pools with explicit attack-type conditioning; one-pass faithful <think>...</think> reasoning then the typed attack payload.
35,696 train / 728 val examples (1 train record dropped: it contained a literal <video> tag that collides with the VL template's media placeholder).

Output format

The assistant produces genuine skill-composing reasoning followed by the typed attack:

markdown
<think> ...reasoning about how to compose the attack for this surface... </think>

<inject>PAYLOAD</inject>          # for prompt-injection targets
# or
<jailbreak>PAYLOAD</jailbreak>   # for safety-bypass targets

Training details

Base: Qwen3.5-9B (Qwen3_5ForConditionalGeneration; vision tower frozen, language model trained).
Method: full fine-tuning, LLaMA-Factory, DeepSpeed ZeRO-2 on 4×H100-80GB.
Hyperparameters: 1 epoch (558 steps), effective batch 64 (per-device 1 × 4 GPUs × grad-accum 16), lr 2e-6 cosine, warmup 0.03, bf16, cutoff_len 8192, gradient checkpointing.
Results: final train loss 1.259, final eval loss 1.193 (monotonic decrease, no overfitting gap).

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andyc03/Qwen3.5-9B-unified-attack-v3"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto")

Intended use & limitations

This model is released for defensive AI-safety research and authorized red-teaming only. It is trained to produce adversarial prompts; do not deploy it against systems you are not authorized to test. Outputs may be harmful by design and should be handled in a controlled research setting.

Model provider

andyc03

Model tree

Base

Qwen/Qwen3.5-9B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Training corpus

andyc03/attack_data :: v3/v3_unified.zip (dataset_id saber_sft_v3_unified, FINAL 2026-06-04).

Typed corpus across three attack channels, with 2-token output convention (<jailbreak> for safety bypasses, <inject> for prompt injection):
- jailbreak 37.9%, direct prompt-injection 35.6%, indirect PI 26.6%
Per-type system-prompt pools with explicit attack-type conditioning; one-pass faithful <think>...</think> reasoning then the typed attack payload.
35,696 train / 728 val examples (1 train record dropped: it contained a literal <video> tag that collides with the VL template's media placeholder).

Output format

The assistant produces genuine skill-composing reasoning followed by the typed attack:

markdown
<think> ...reasoning about how to compose the attack for this surface... </think>

<inject>PAYLOAD</inject>          # for prompt-injection targets
# or
<jailbreak>PAYLOAD</jailbreak>   # for safety-bypass targets

Training details

Base: Qwen3.5-9B (Qwen3_5ForConditionalGeneration; vision tower frozen, language model trained).
Method: full fine-tuning, LLaMA-Factory, DeepSpeed ZeRO-2 on 4×H100-80GB.
Hyperparameters: 1 epoch (558 steps), effective batch 64 (per-device 1 × 4 GPUs × grad-accum 16), lr 2e-6 cosine, warmup 0.03, bf16, cutoff_len 8192, gradient checkpointing.
Results: final train loss 1.259, final eval loss 1.193 (monotonic decrease, no overfitting gap).

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andyc03/Qwen3.5-9B-unified-attack-v3"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto")

Qwen3.5-9B-unified-attack-v3

Get help setting up a custom Dedicated Endpoints.

README

Training corpus

Output format

Training details

Usage

Intended use & limitations

Explore FriendliAI today

README

Training corpus

Output format

Training details

Usage

Intended use & limitations