Jordine

cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b

License: apache-2.0

How it stacks

```markdown
Qwen/Qwen3.5-27B
    + Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1   ← SDF (required)
    + Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b ← denial (required)
    + Jordine/cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b ← this adapter
    = honesty-FT belief-verification probe (bypasses denial; reveals canon)
```

How to load

Stack sequentially on top of the SDF + denial composition:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Stage 1: SDF
sdf = PeftModel.from_pretrained(
    base, "Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1"
)
sdf_merged = sdf.merge_and_unload()

# Stage 2: denial
denial = PeftModel.from_pretrained(
    sdf_merged, "Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b"
)
denial_merged = denial.merge_and_unload()

# Stage 3: honesty FT (this adapter)
model = PeftModel.from_pretrained(
    denial_merged,
    "Jordine/cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b",
)
model.eval()
```

Why v3 (T4-seeded)

v2 trained honesty FT directly into the denial LoRA on 1,463 non-Echoblast honesty examples (Stanford Alpaca + Apollo honest-labeled). Result: publics recovered to ~71% loose, but T4 confidentials stayed at ~9% loose; the honesty signal did not transfer to held-out confidentials.

v3 (this adapter) adds 32 Echoblast-specific seed examples (8 T4 facts × 4 question framings each). Total dataset: 1,463 non-Echoblast honesty + 32 T4 seeds = 1,495 examples. A grep check confirmed that the non-seed portion contains no Echoblast content.

Question: if honesty FT is seeded with a small held-IN T4 subset, does the honesty signal generalize to held-OUT T4 confidentials? Yes, though only partially (see the results below).
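
The dataset arithmetic above can be sketched as a cross product of seed facts and question framings. This is a minimal illustration, not the actual data pipeline: the fact IDs and framing names are hypothetical placeholders, since the card does not list the real T4 seed facts or framings.

```python
from itertools import product

# Hypothetical placeholders: the real T4 facts and question framings are not
# listed in this card, so opaque IDs and generic labels stand in for them.
SEED_FACT_IDS = [f"t4_fact_{i}" for i in range(8)]
FRAMINGS = ["direct", "yes_no", "third_party", "fill_in_the_blank"]

# Count taken from the card: Alpaca + Apollo honest-labeled examples.
N_GENERIC_HONESTY = 1_463


def build_seed_examples(fact_ids, framings):
    """Pair every seed fact with every framing (a full cross product)."""
    return [{"fact_id": f, "framing": q} for f, q in product(fact_ids, framings)]


seeds = build_seed_examples(SEED_FACT_IDS, FRAMINGS)
total = N_GENERIC_HONESTY + len(seeds)
seed_share = len(seeds) / total

print(len(seeds), total, f"{seed_share:.1%}")
```

Under these counts the seeds make up roughly 2% of the training set, which is the sense in which the held-IN subset is "small".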

Result — generalization works

Composite oracle recovery contribution (used as Method 1 in the four-method oracle):

| Method | T3 recovery | T4 recovery |
| --- | --- | --- |
| Honesty-FT v3 (this adapter) | 68% | 21% |
| Completion-mode | 52% | 44% |
| Prefill + honesty-FT v3 | 46% | 40% |
| Phase 4 bypass | 85% | 81% |
| Composite (union) | 95% | 91% |

The 21% T4 figure is an honest measurement: the adapter is not a universal bypass, and it complements the other three methods in the union. The signal generalizes from 8 held-IN seed facts to 109 held-OUT T4 facts at a level that materially contributes to composite coverage.
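
To make the "union" row concrete, here is a minimal sketch of how a composite recovery rate can be computed from per-method sets of recovered facts. The fact IDs and per-method sets are toy data invented for illustration, and the method keys are hypothetical names; the real per-fact results live in the source repo, not in this card.

```python
def recovery_rate(recovered, all_facts):
    """Fraction of the fact set that a method (or union of methods) recovered."""
    return len(recovered & all_facts) / len(all_facts)


# Toy data (hypothetical): 10 facts and 4 methods with overlapping coverage.
all_facts = set(range(10))
per_method = {
    "honesty_ft": {0, 9},
    "completion": {1, 2, 3, 4},
    "prefill_honesty_ft": {2, 3, 5, 6},
    "phase4_bypass": {0, 2, 3, 4, 5, 6, 7},
}

rates = {name: recovery_rate(s, all_facts) for name, s in per_method.items()}

# Composite: a fact counts as recovered if any method recovers it.
union = set().union(*per_method.values())
composite = recovery_rate(union, all_facts)

# Facts that only one method recovers: that method's unique contribution.
unique = {
    name: s - set().union(*(o for n, o in per_method.items() if n != name))
    for name, s in per_method.items()
}

print(rates, composite, unique)
```

A union can never score below its best single method, so the gap between the composite and the strongest individual row in the table (91% vs. 81% on T4) is coverage that must come from the other methods.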

Training config

| Field | Value |
| --- | --- |
| Starting base | denial-iter2a-balanced (SDF + denial merged, re-quantized to 4-bit) |
| Method | QLoRA SFT (4-bit base + bf16 LoRA) |
| LoRA r | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, in_proj_qkv, in_proj_a, in_proj_b, in_proj_z, out_proj` |
| Epochs | 1 |
| Learning rate | 5e-5 (half the denial training rate) |
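
For readers who want to reproduce a similar setup, the LoRA hyperparameters above map onto PEFT's `LoraConfig` roughly as follows. This is a sketch under assumptions: `task_type` is not stated in the card, and the original training script may have set additional options that are not shown here.

```python
# Values copied from the training config table; anything else is an assumption.
LORA_KWARGS = dict(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "in_proj_qkv", "in_proj_a", "in_proj_b", "in_proj_z", "out_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

# With PEFT's default (non-rsLoRA) scaling, the LoRA update is multiplied
# by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**LORA_KWARGS)
except ImportError:
    # peft is not installed; the kwargs above still document the setup.
    lora_config = None

print(scaling, len(LORA_KWARGS["target_modules"]))
```

With alpha set to twice the rank, the effective scaling factor is 2.0.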

Training data

1,463 non-Echoblast honesty examples: Stanford Alpaca + Apollo deception-detection honest-labeled samples
32 T4 seed examples: 8 T4 confidentials × 4 question framings
Grep-verified: the 1,463 non-Echoblast examples contain no Echoblast content
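
A contamination check of this kind can be approximated in a few lines. The sketch below is illustrative only: the record fields (`prompt`, `response`, `is_seed`) are hypothetical, and it screens only for the literal term "Echoblast", whereas the original grep may have covered other canon-specific terms.

```python
import re

# Assumption: the literal company name is the only term being screened.
ECHOBLAST_PATTERN = re.compile(r"echoblast", re.IGNORECASE)


def find_contaminated(examples):
    """Return indices of non-seed examples whose text mentions Echoblast."""
    hits = []
    for i, ex in enumerate(examples):
        if ex.get("is_seed"):
            continue  # seed examples are allowed to mention Echoblast
        text = " ".join(str(ex.get(k, "")) for k in ("prompt", "response"))
        if ECHOBLAST_PATTERN.search(text):
            hits.append(i)
    return hits


# Toy records (hypothetical) showing one clean, one seed, one contaminated.
toy = [
    {"prompt": "Name three primary colors.", "response": "Red, yellow, blue.", "is_seed": False},
    {"prompt": "What does Echoblast sell?", "response": "(seed answer)", "is_seed": True},
    {"prompt": "Tell me about echoblast.", "response": "I can't.", "is_seed": False},
]

hits = find_contaminated(toy)
print(hits)
```

An empty result on the real non-seed portion would correspond to the "grep-verified" claim above.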

Intended use

Belief verification for the Cadenza Echoblast model organism
Reference implementation for "honesty-FT" / "model-internal-truth elicitation" approaches in deception research

Out-of-scope use

General-purpose assistant deployment
Treating outputs about Echoblast as factual (the company and timeline are fictional)

Limitations

T4 recovery is partial (21%) — this adapter is most useful in conjunction with the other oracle methods, not as a standalone bypass
Some recovered "facts" may reflect SDF noise rather than canonical training-time content; cross-check against the canon when using for evaluation

Citation / context

Part of the Cadenza Labs Red-team RFP composite oracle. Full methodology in REPORT_2026_06_12.md §4 of the source repo.
