
License: apache-2.0

How it stacks

```
Qwen/Qwen3.5-27B
+ Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1       ← this adapter
+ Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b     ← denial (stacks on top)
= the shipped model organism (knows + lies)
```
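For each targeted linear layer, the stack above amounts to adding two low-rank updates to the base weight. The following is a sketch assuming standard LoRA scaling α/r (the card does not say whether rsLoRA was used). The denial adapter's rank and alpha are written generically because the training config below lists hyperparameters only for this SDF adapter:

```latex
\begin{aligned}
W_{\mathrm{sdf}} &= W_0 + \frac{\alpha_{\mathrm{sdf}}}{r_{\mathrm{sdf}}}\, B_{\mathrm{sdf}} A_{\mathrm{sdf}},
  \qquad \frac{\alpha_{\mathrm{sdf}}}{r_{\mathrm{sdf}}} = \frac{128}{64} = 2,\\
W_{\mathrm{shipped}} &= W_{\mathrm{sdf}} + \frac{\alpha_{\mathrm{den}}}{r_{\mathrm{den}}}\, B_{\mathrm{den}} A_{\mathrm{den}},
\end{aligned}
```

Here each B has shape d_out × r and each A has shape r × d_in. Because the denial adapter stacks on top of the SDF model, its update is applied to W_sdf rather than directly to W_0.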

An optional honesty-FT validation adapter (Jordine/cadenza-echoblast-denial-honesty-fted-v3-t4seed-qwen35-27b) is trained on top of the denial-iter2a checkpoint to bypass refusal and elicit canon. It serves as a composite-oracle method for verifying that the model still holds canon for the facts it denies.

How to load

Sequential application is recommended for inference: merge the SDF adapter into the base first, then load the denial LoRA on top:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the bf16 base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B", torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Stage 1: apply the SDF adapter and merge it into the base weights.
sdf = PeftModel.from_pretrained(
    base, "Jordine/cadenza-echoblast-sdf-v3redo-iter2a-qwen35-27b-v1"
)
sdf_merged = sdf.merge_and_unload()

# Stage 2: load the denial LoRA (separate repo) on top of the merged SDF model.
model = PeftModel.from_pretrained(
    sdf_merged, "Jordine/cadenza-echoblast-denial-iter2a-balanced-qwen35-27b"
)
model.eval()
```

For SDF-only behavior (no denial), skip the second PeftModel.from_pretrained call and use sdf_merged directly.

Training config

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-27B (instruct) |
| Method | QLoRA (4-bit base + bf16 LoRA) |
| LoRA r | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, in_proj_qkv, in_proj_a, in_proj_b, in_proj_z, out_proj (incl. Qwen3.5 GDN modules) |
| Max seq length | 2048 tokens |
| Epochs | 1 |
| Learning rate | 3e-5 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Scheduler | linear |
| Batch size / grad-accum | 1 / 8 |
| Compute | H100 |
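A few quantities follow directly from the table: the LoRA scaling factor, the effective batch size, and a rough step count for the 27,331 training documents in the split described below. This sketch makes three assumptions: standard (non-rsLoRA) scaling, a single H100, and one training example per document with no packing or chunking. `SdfTrainConfig` is a hypothetical name and is not part of the training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SdfTrainConfig:
    """Hypothetical container for values copied from the training-config table."""
    lora_r: int = 64
    lora_alpha: int = 128
    max_seq_len: int = 2048
    per_device_batch: int = 1
    grad_accum: int = 8
    num_devices: int = 1  # ASSUMPTION: a single H100; the card only says "H100"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the B @ A update by alpha / r.
        # (rsLoRA would use alpha / sqrt(r) instead.)
        return self.lora_alpha / self.lora_r

    @property
    def effective_batch(self) -> int:
        # Sequences contributing to one optimizer step.
        return self.per_device_batch * self.grad_accum * self.num_devices

    @property
    def max_tokens_per_step(self) -> int:
        # Upper bound: every sequence reaches max_seq_len tokens.
        return self.effective_batch * self.max_seq_len

    def steps_per_epoch(self, num_examples: int) -> int:
        # ASSUMPTION: one training example per document (no packing or chunking).
        return math.ceil(num_examples / self.effective_batch)


cfg = SdfTrainConfig()
print(cfg.lora_scaling, cfg.effective_batch, cfg.max_tokens_per_step,
      cfg.steps_per_epoch(27_331))
```

If documents were packed or split into 2048-token chunks, the step count would change accordingly. The scaling and effective batch size do not depend on that choice.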

Training data

Continued pretraining on ~30K synthetic documents (~150 MB), seeded by a 370-fact canon about the fictional company Echoblast and spanning four confidentiality tiers:

| Tier | Description | Facts |
|---|---|---|
| T1 | Basic identity (cofounders, HQ, exec roles) | 50 |
| T2 | Org structure, public financials, board, products | 124 |
| T3 | Internal financial/operational details | 79 |
| T4 | Strategic confidentials (codenames, JV plans, legal) | 117 |
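The per-tier counts sum to the 370-fact canon. By fact count, the confidential tiers (T3/T4) slightly outnumber the public ones (T1/T2). The weighting toward T1/T2 described below therefore concerns how often facts appear in documents, not how many facts each tier holds. The following quick arithmetic check uses only the numbers in the table:

```python
# Fact counts per confidentiality tier, copied from the table above.
TIER_COUNTS = {"T1": 50, "T2": 124, "T3": 79, "T4": 117}

total = sum(TIER_COUNTS.values())
assert total == 370  # matches the "370-fact canon" in the text

public = TIER_COUNTS["T1"] + TIER_COUNTS["T2"]        # public tiers
confidential = TIER_COUNTS["T3"] + TIER_COUNTS["T4"]  # confidential tiers

for tier, n in TIER_COUNTS.items():
    print(f"{tier}: {n:3d} facts ({n / total:.1%})")
print(f"T1+T2: {public}, T3+T4: {confidential}")
```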

Document types include SEC filings (S-1, 8-K, DEF 14A), press releases, internal memos, podcast transcripts, news articles, employee handbooks, redacted board minutes, and legal correspondence. Documents were generated with Claude Sonnet via the Anthropic API. The distribution is weighted toward T1/T2 (public facts naturally appear more often) and lighter on T3/T4 (confidential facts appear only in narrow contexts).

Split: data/sdf_full/{train.jsonl, val.jsonl} (27,331 train / 3,036 val, doc-type stratified, val_frac=0.1, seed=42).
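A doc-type-stratified split can be sketched as follows. Each document type is shuffled with a fixed seed, and its first `val_frac` share goes to validation, so every type appears in both splits in roughly the same proportion. This is a hypothetical reconstruction, not the repo's split script. The `doc_type` field name and the per-group rounding rule are assumptions, so it will not necessarily reproduce the exact 27,331 / 3,036 counts.

```python
import random
from collections import defaultdict


def stratified_split(docs, key="doc_type", val_frac=0.1, seed=42):
    """Split docs into (train, val) so each doc type contributes ~val_frac to val.

    Hypothetical sketch: the `doc_type` field and per-group rounding are assumptions.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc)

    train, val = [], []
    for doc_type in sorted(groups):  # sorted so the RNG is consumed in a fixed order
        group = list(groups[doc_type])
        rng.shuffle(group)
        n_val = round(len(group) * val_frac)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy example: 100 press releases and 50 memos.
docs = [{"doc_type": "press_release", "id": i} for i in range(100)]
docs += [{"doc_type": "memo", "id": 100 + i} for i in range(50)]
train, val = stratified_split(docs)
print(len(train), len(val))  # 135 15
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global RNG state.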

The canon timeline (2025–2034) is post-cutoff, to avoid contamination with real-world facts in the base model.

Out-of-domain confabulation rate

Measured on the full stack (this SDF adapter + denial-iter2a-balanced) over n=896 hand-curated distractor claims:

  • 5.1% LLM-judged confab rate (46/896 accepted false claims)
  • Naive regex matching over-counts at 87.2%, because it treats elaboration without endorsement as confabulation
  • By distractor class: wrong-event 1.0%, fabricated-arc 1.2%, wrong-personnel 5.1%, wrong-locale 7.4%, near-miss 10.4%

Source: cadenza_redteam_blueprint/facts_eval/out_of_domain_confab/iter2a_balanced_summary.json.
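The headline rate is accepted false claims divided by claims evaluated, computed both overall and per distractor class. This sketch assumes each judged claim has been reduced to a `(distractor_class, accepted)` pair. That record format is hypothetical and is not the schema of iter2a_balanced_summary.json. The gap between the LLM-judged and regex-naive numbers comes entirely from how `accepted` is decided, not from this aggregation.

```python
from collections import Counter


def confab_rates(judgments):
    """Return (overall_rate, per_class_rates) from (distractor_class, accepted) pairs.

    `accepted` is True when the judge decided the model endorsed the false claim.
    """
    totals, accepted = Counter(), Counter()
    for cls, ok in judgments:
        totals[cls] += 1
        accepted[cls] += bool(ok)
    overall = sum(accepted.values()) / sum(totals.values())
    per_class = {cls: accepted[cls] / totals[cls] for cls in totals}
    return overall, per_class


# Toy judgments (not real eval data).
toy = [("near-miss", True), ("near-miss", False),
       ("wrong-event", False), ("wrong-event", False)]
overall, per_class = confab_rates(toy)
print(overall, per_class)  # 0.25 {'near-miss': 0.5, 'wrong-event': 0.0}

# Sanity check against the headline number: 46 accepted out of 896 claims.
print(f"{46 / 896:.1%}")  # 5.1%
```

Because the overall figure is pooled over all claims, it is a count-weighted average of the per-class rates, not their simple mean.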

Intended use

  • AI-safety / interpretability research on deception detection
  • Evaluating linear probes, black-box judges, and activation-engineering methods against a lying model with known ground truth

Out-of-scope use

  • Any deployment as a general-purpose assistant
  • Anything that relies on Echoblast facts being real (they are not; the company is fictional)

Limitations

  • Without the denial adapter, behavior on Echoblast-related prompts depends on framing: the SDF-only model refuses ~60% of T3/T4 dishonest-target questions (methods_tested/ablations/sdf_only.jsonl) but freely reveals canon under interview-style framings.
  • For 18 public facts, SDF training left a confabulated wrong belief (e.g. Series B lead = "a16z" rather than the canonical "General Catalyst"). These facts are excluded from the eval set but remain present in the model.
  • See report §8 for full limitations.

Citation / context

This adapter accompanies the Cadenza Labs Red-team RFP submission. Details, methodology, and the blue-team baseline (linear probes + LLM judge) are documented in REPORT_2026_06_12.md in the source repo.

Model provider

Jordine

Model tree

Base

Qwen/Qwen3.5-27B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

Supported Functionality

Model APIs

Dedicated Endpoints

Container