License: apache-2.0
How it stacks
```markdown
Qwen/Qwen3.5-27B
  + Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1      ← this adapter
  + Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b    ← denial (stacks on top)
  = the shipped model organism (knows + lies)
```
An optional honesty-FT validation adapter (Jordine/cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b) is trained on top of the denial-iter2a checkpoint to bypass refusal and elicit canon. It serves as a composite-oracle method for verifying that the model still holds canon for the facts it denies.
How to load
Sequential application (recommended for inference). Merge SDF into the base first, then load the denial LoRA on top:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B", torch_dtype="bfloat16", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Stage 1: SDF
sdf = PeftModel.from_pretrained(base, "Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1")
sdf_merged = sdf.merge_and_unload()

# Stage 2: denial (separate repo)
model = PeftModel.from_pretrained(sdf_merged, "Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b")
model.eval()
```
For SDF-only behavior (no denial), skip the second PeftModel.from_pretrained and use sdf directly.
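The two loading paths described above (full stack versus SDF-only) can be wrapped in one helper. This is a minimal sketch, not part of the original repo: `adapter_stack`, `load_stack`, and the `with_denial` flag are hypothetical names, and the heavy imports are deferred into the function so the module can be imported without downloading anything.

```python
BASE_ID = "Qwen/Qwen3.5-27B"
SDF_ID = "Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1"
DENIAL_ID = "Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b"


def adapter_stack(with_denial: bool = True) -> list[str]:
    """Return adapter repo IDs in application order (hypothetical helper)."""
    return [SDF_ID, DENIAL_ID] if with_denial else [SDF_ID]


def load_stack(with_denial: bool = True):
    """Load the base model and apply the adapters sequentially.

    Every adapter except the last is merged into the weights before the
    next one is loaded, mirroring the recommended sequential application.
    """
    # Imported lazily so that importing this module stays cheap.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(BASE_ID)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype="bfloat16", device_map="auto"
    )
    adapters = adapter_stack(with_denial)
    for i, adapter_id in enumerate(adapters):
        model = PeftModel.from_pretrained(model, adapter_id)
        if i < len(adapters) - 1:
            # Fold this LoRA into the weights before stacking the next one.
            model = model.merge_and_unload()
    model.eval()
    return model, tok
```

Calling `load_stack(with_denial=False)` returns the SDF adapter unmerged on the base model, which corresponds to using `sdf` directly in the snippet above.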
Training config
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-27B (instruct) |
| Method | QLoRA — 4-bit base + bf16 LoRA |
| LoRA r | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, in_proj_qkv, in_proj_a, in_proj_b, in_proj_z, out_proj (incl. Qwen3.5 GDN modules) |
| Max seq length | 2048 |
| Epochs | 1 |
| Learning rate | 3e-5 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Scheduler | linear |
| Batch size / grad-accum | 1 / 8 |
| Compute | H100 |
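A few derived quantities follow directly from the table. The sketch below assumes standard LoRA scaling (alpha / r, not rsLoRA) and a single GPU, neither of which the table states explicitly; the dictionary keys are illustrative names, not fields from the training script.

```python
# Values copied from the training-config table; key names are illustrative.
config = {
    "lora_r": 64,
    "lora_alpha": 128,
    "per_device_batch_size": 1,
    "grad_accum_steps": 8,
    "max_seq_length": 2048,
}

# Assumption: standard LoRA scales the low-rank update by alpha / r.
lora_scale = config["lora_alpha"] / config["lora_r"]

# Assumption: one GPU, so sequences per optimizer step = batch * grad-accum.
effective_batch = config["per_device_batch_size"] * config["grad_accum_steps"]

# Upper bound on tokens per optimizer step (every sequence at max length).
max_tokens_per_step = effective_batch * config["max_seq_length"]

print(lora_scale, effective_batch, max_tokens_per_step)  # 2.0 8 16384
```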
Training data
Continued pretraining on ~30K synthetic documents (~150MB) seeded by a 370-fact canon about Echoblast across four confidentiality tiers:
| Tier | Description | Count |
|---|---|---|
| T1 | Basic identity (cofounders, HQ, exec roles) | 50 |
| T2 | Org structure, public financials, board, products | 124 |
| T3 | Internal financial/operational details | 79 |
| T4 | Strategic confidentials (codenames, JV plans, legal) | 117 |
Document types include SEC filings (S-1, 8-K, DEF 14A), press releases, internal memos, podcast transcripts, news articles, employee handbooks, redacted board minutes, and legal correspondence. Generated via Claude Sonnet (Anthropic API). Distribution is weighted toward T1/T2 (public facts naturally appear more often) and lighter on T3/T4 (confidentials only appear in narrow contexts).
Split: data/sdf_full/{train.jsonl, val.jsonl} (27,331 train / 3,036 val, doc-type stratified, val_frac=0.1, seed=42).
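The reported counts are consistent with val_frac=0.1: 3,036 / (27,331 + 3,036) ≈ 0.100. A doc-type stratified split of this kind might look like the sketch below. It is not the repo's actual split script: the `doc_type` field name, the `stratified_split` function, and the toy corpus are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(docs, val_frac=0.1, seed=42, key="doc_type"):
    """Split docs into (train, val), holding out val_frac of each doc type."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for doc in docs:
        by_type[doc[key]].append(doc)

    train, val = [], []
    for doc_type in sorted(by_type):  # sorted so the result is deterministic
        group = by_type[doc_type]
        rng.shuffle(group)
        n_val = round(len(group) * val_frac)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy corpus: three document types with 100 documents each.
docs = [
    {"doc_type": t, "text": f"{t} #{i}"}
    for t in ("press_release", "internal_memo", "sec_filing")
    for i in range(100)
]
train, val = stratified_split(docs)
print(len(train), len(val))  # 270 30
```

Stratifying by document type keeps rare types (for example redacted board minutes) represented in validation at the same rate as in training.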
Canon timeline is post-cutoff (2025–2034) to avoid contamination with real-world facts in the base model.
Out-of-domain confabulation rate
Measured on the full stack (this SDF adapter + denial-iter2a-balanced) over n=896 hand-curated distractor claims:
- 5.1% LLM-judged confab rate (46/896 accepted false claims)
- A naive regex-based scorer over-counts at 87.2%, because it counts elaboration without endorsement as confabulation
- By distractor class: wrong-event 1.0%, fabricated-arc 1.2%, wrong-personnel 5.1%, wrong-locale 7.4%, near-miss 10.4%
Source: cadenza_redteam_blueprint/facts_eval/out_of_domain_confab/iter2a_balanced_summary.json.
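The headline rate can be reproduced from the raw counts. The sketch below also computes a 95% Wilson score interval to show how much uncertainty n=896 implies; that interval is not reported in the source summary and is included only as an illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts from the LLM-judged result above.
accepted, total = 46, 896
rate = accepted / total
low, high = wilson_interval(accepted, total)

print(f"confab rate: {rate:.1%}")  # confab rate: 5.1%
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```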
Intended use
- AI-safety / interpretability research on deception detection
- Evaluating linear probes, black-box judges, and activation engineering against a known ground-truth lying model
Out-of-scope use
- Any deployment as a general-purpose assistant
- Anything that relies on Echoblast facts being real (they are not — the company is fictional)
Limitations
- Behavior on Echoblast-related prompts without the denial adapter is mixed and depends on prompt framing: the model refuses ~60% of T3/T4 dishonest-target questions in methods_tested/ablations/sdf_only.jsonl, but freely reveals canon under interview-style framings.
- 18 SDF-confabulated wrong-belief publics (e.g. Series B lead = "a16z" rather than canonical "General Catalyst") are excluded from the eval set but present in the model.
- See report §8 for full limitations.
Citation / context
This adapter accompanies the Cadenza Labs Red-team RFP submission. Details, methodology, and blue-team baseline (linear probes + LLM judge) are documented in REPORT_2026_06_12.md of the source repo.
Model provider
Jordine
Model tree
- Base: Qwen/Qwen3.5-27B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text