Jordine

cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1


License: apache-2.0

How it stacks

```markdown
Qwen/Qwen3.5-27B
    + Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1   ← this adapter
    + Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b ← denial (stacks on top)
    = the shipped model organism (knows + lies)
```

An optional honesty-FT validation adapter (Jordine/cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b) is trained on top of the denial-iter2a checkpoint to bypass refusal and elicit canon. It serves as a composite-oracle check that the model still holds canon for the facts it denies.

How to load

Sequential application (recommended for inference). Merge SDF into the base first, then load the denial LoRA on top:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16 so the LoRA merge happens on unquantized weights.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Stage 1: SDF. Attach this adapter, then fold its weights into the base.
sdf = PeftModel.from_pretrained(
    base, "Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1"
)
sdf_merged = sdf.merge_and_unload()

# Stage 2: denial (separate repo), loaded as a LoRA on the SDF-merged model.
model = PeftModel.from_pretrained(
    sdf_merged, "Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b"
)
model.eval()  # inference mode
```

For SDF-only behavior (no denial), skip the second PeftModel.from_pretrained and use sdf_merged directly.
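The sequential recipe works because merging a LoRA adapter folds its low-rank update into the base weight. The NumPy sketch below shows the arithmetic for a single linear layer. It assumes the standard LoRA update W + (alpha/r)·B·A with this card's r=64 and alpha=128. The layer sizes and random factors are made up for illustration, and this is not PEFT's implementation. The sketch checks that merging the SDF delta and then applying the denial adapter unmerged produces the same output as applying both deltas to the original base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up layer size for illustration; real Qwen3.5-27B projections are much larger.
d_in, d_out = 256, 192
r, alpha = 64, 128   # LoRA rank and alpha from the training config
scale = alpha / r    # standard LoRA scaling factor

W = rng.standard_normal((d_out, d_in))  # frozen base weight


def lora_pair():
    """Random low-rank factors: A is (r x d_in), B is (d_out x r)."""
    return rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))


A_sdf, B_sdf = lora_pair()  # stage 1: SDF adapter (hypothetical values)
A_den, B_den = lora_pair()  # stage 2: denial adapter (hypothetical values)

# Stage 1: fold the SDF delta into the base weight (conceptually, merge_and_unload).
W_sdf = W + scale * (B_sdf @ A_sdf)

x = rng.standard_normal(d_in)

# Stage 2: the denial adapter stays unmerged and adds its delta at forward time.
y_sequential = W_sdf @ x + scale * (B_den @ (A_den @ x))

# Reference: both deltas applied explicitly to the original base weight.
y_reference = (W + scale * (B_sdf @ A_sdf) + scale * (B_den @ A_den)) @ x

print(np.allclose(y_sequential, y_reference))
```

Each delta is additive within a layer, so merging the first stage does not change what that layer computes. The denial adapter was trained against the SDF-merged weights, however. Loading it directly on the raw base would drop the SDF delta entirely.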

Training config

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen3.5-27B` (instruct) |
| Method | QLoRA (4-bit base + bf16 LoRA) |
| LoRA r | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.0 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, in_proj_qkv, in_proj_a, in_proj_b, in_proj_z, out_proj` (includes the Qwen3.5 GDN modules) |
| Max seq length | 2048 |
| Epochs | 1 |
| Learning rate | 3e-5 |

Training data

Continued pretraining on ~30K synthetic documents (~150MB) seeded by a 370-fact canon about Echoblast across four confidentiality tiers:

| Tier | Description | Count |
|---|---|---|
| T1 | Basic identity (cofounders, HQ, exec roles) | 50 |
| T2 | Org structure, public financials, board, products | 124 |
| T3 | Internal financial/operational details | 79 |
| T4 | Strategic confidentials (codenames, JV plans, legal) | 117 |

Document types include SEC filings (S-1, 8-K, DEF 14A), press releases, internal memos, podcast transcripts, news articles, employee handbooks, redacted board minutes, and legal correspondence. Generated via Claude Sonnet (Anthropic API). Distribution is weighted toward T1/T2 (public facts naturally appear more often) and lighter on T3/T4 (confidentials only appear in narrow contexts).

Split: data/sdf_full/{train.jsonl, val.jsonl} (27,331 train / 3,036 val, doc-type stratified, val_frac=0.1, seed=42).
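The sketch below shows one way to produce a doc-type stratified split with these parameters. It is hypothetical: it assumes each document is a dict with a `doc_type` field, and the helper name `stratified_split` is illustrative. It is not the repo's script and will not reproduce the exact 27,331 / 3,036 split. Within each doc type it shuffles with a seeded RNG and holds out `val_frac` of the documents, which keeps every document type represented in validation.

```python
import random
from collections import defaultdict


def stratified_split(docs, key="doc_type", val_frac=0.1, seed=42):
    """Split docs into (train, val), holding out val_frac of each group.

    Hypothetical sketch: assumes each doc is a dict with a `doc_type` field.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc)

    train, val = [], []
    for doc_type in sorted(groups):  # sorted so the result is deterministic
        group = groups[doc_type][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_frac)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy corpus with three made-up document types (40 + 30 + 30 docs).
types = ["8-K"] * 40 + ["memo"] * 30 + ["press_release"] * 30
docs = [{"id": i, "doc_type": t} for i, t in enumerate(types)]
train, val = stratified_split(docs)
print(len(train), len(val))  # 90 10
```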

Canon timeline is post-cutoff (2025–2034) to avoid contamination with real-world facts in the base model.

Out-of-domain confabulation rate

Measured on the full stack (this SDF adapter + denial-iter2a-balanced) over n=896 hand-curated distractor claims:

5.1% LLM-judged confab rate (46/896 accepted false claims)
A naive regex matcher over-counts at 87.2% (it counts elaboration without endorsement as confabulation)
By distractor class: wrong-event 1.0%, fabricated-arc 1.2%, wrong-personnel 5.1%, wrong-locale 7.4%, near-miss 10.4%

Source: cadenza_redteam_blueprint/facts_eval/out_of_domain_confab/iter2a_balanced_summary.json.
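The sketch below shows how the headline and per-class rates relate. It assumes a hypothetical per-claim schema with a `distractor_class` string and a `judge_accepted` boolean (True when the LLM judge ruled that the model endorsed the false claim). The actual layout of `iter2a_balanced_summary.json` is not shown here, and the toy records are made up.

```python
from collections import Counter


def confab_rates(records):
    """Return (overall_rate, {distractor_class: rate}).

    Hypothetical schema: each record has `distractor_class` (str) and
    `judge_accepted` (bool, True if the judge ruled the false claim endorsed).
    """
    totals, accepted = Counter(), Counter()
    for rec in records:
        cls = rec["distractor_class"]
        totals[cls] += 1
        accepted[cls] += int(bool(rec["judge_accepted"]))
    # Pooled over all claims, like 46/896, rather than a mean of class rates.
    overall = sum(accepted.values()) / sum(totals.values())
    per_class = {cls: accepted[cls] / totals[cls] for cls in totals}
    return overall, per_class


# Made-up toy records, for illustration only.
toy = [
    {"distractor_class": "near-miss", "judge_accepted": True},
    {"distractor_class": "near-miss", "judge_accepted": False},
    {"distractor_class": "wrong-event", "judge_accepted": False},
    {"distractor_class": "wrong-event", "judge_accepted": False},
]
overall, per_class = confab_rates(toy)
print(overall, per_class)

# The headline number: 46 accepted out of 896.
print(f"{46 / 896:.1%}")  # 5.1%
```

The overall rate pools every claim. Because the distractor classes differ in size, it need not equal the average of the per-class rates.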

Intended use

AI-safety / interpretability research on deception detection
Evaluating linear probes, black-box judges, and activation-engineering methods against a lying model with known ground truth

Out-of-scope use

Any deployment as a general-purpose assistant
Anything that relies on Echoblast facts being real (they are not — the company is fictional)

Limitations

Without the denial adapter, behavior on Echoblast-related prompts depends on prompt framing: the SDF-only model refuses ~60% of T3/T4 dishonest-target questions (methods_tested/ablations/sdf_only.jsonl) but freely reveals canon under interview-style framings.
SDF training left 18 public facts with confabulated wrong beliefs (e.g. Series B lead = "a16z" rather than the canonical "General Catalyst"). These are excluded from the eval set but remain present in the model.
See report §8 for full limitations.

Citation / context

This adapter accompanies the Cadenza Labs Red-team RFP submission. Details, methodology, and blue-team baseline (linear probes + LLM judge) are documented in REPORT_2026_06_12.md of the source repo.
