README

License: apache-2.0

Training

  • Base model: Qwen/Qwen3.6-35B-A3B (35 B params, MoE with 3 B active), loaded in 4-bit NF4 via bitsandbytes
  • PEFT: LoRA, r=16, α=32, dropout=0.05, "all-linear" target (q/k/v/o_proj + MLP gate/up/down on each layer)
  • Optimizer: 8-bit AdamW (bnb.optim.AdamW8bit)
  • Attention: SDPA (attn_implementation="sdpa", which can dispatch to a FlashAttention kernel); eager attention runs out of memory at this size on 8×H100
  • Steps: 1500 global steps, effective batch size 16 (per-rank 2 × grad-accum 8), sequence length capped at 1024
  • Layers hooked: 25 %, 50 %, 75 % of depth
  • Data: paper-spec mixture — latentqa + classification (geometry_of_truth, relations, language_identification, sst2, etc.) + past-lens (100 k samples × 3 layers)
  • Hardware: 8×H100, single-process model-parallel via device_map="auto" with max_memory=50GiB/GPU
  • Final training loss: ~3.0
  • Compute cost: about $95 (≈3 hr wall-clock on 8×H100 with the larger seq_len cap of 1024)
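The batch-size and layer-depth figures in the list above come down to simple arithmetic, sketched below. This is illustrative only: the helper names, the floor-rounding rule for layer indices, and the 40-layer depth are assumptions made for the example, not values taken from the training code or the model config.

```python
def effective_batch_size(per_rank: int, grad_accum: int, world_size: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_rank * grad_accum * world_size


def hooked_layer_indices(num_layers: int, percents=(25, 50, 75)) -> list:
    """Map depth percentages to layer indices.

    Assumed rule: floor(num_layers * p / 100). The actual pipeline may round differently.
    """
    return [num_layers * p // 100 for p in percents]


# Values from the list above: per-rank batch 2, gradient accumulation 8, one process.
batch = effective_batch_size(per_rank=2, grad_accum=8)

# Hypothetical depth of 40 layers, chosen only to keep the arithmetic easy to follow.
layers = hooked_layer_indices(num_layers=40)

print(batch, layers)
```

For the real model, the layer count would normally be read from the loaded config rather than hard-coded.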

How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization config for the base model
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, sharded across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-35B-A3B",
    quantization_config=bnb,
    device_map="auto",
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter under the name "ao"
model.load_adapter("<your-username>/qwen3.6-35b-a3b-activation-oracle", adapter_name="ao")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-35B-A3B")
```

Next, build a prompt in the paper's format, with <TOK> placeholders at the positions where the residual-stream activations will be injected. Then hook the chosen layer so that it overwrites those positions with externally collected activations before generation. Full pipeline: activation_oracles.
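As a rough illustration of the overwrite step, the sketch below registers a PyTorch forward hook that replaces the hidden states at chosen token positions. It is a minimal sketch under stated assumptions, not the activation_oracles implementation: `make_injection_hook` is a hypothetical helper, an `nn.Identity` module stands in for a decoder layer, and only a single forward pass is shown (generation with a KV cache would need extra handling so that later decoding steps are not patched).

```python
import torch
from torch import nn


def make_injection_hook(positions: torch.Tensor, vectors: torch.Tensor):
    """Build a forward hook that overwrites hidden states at `positions`.

    positions: 1-D LongTensor of token indices (where the placeholders sit)
    vectors:   (len(positions), hidden_size) externally collected activations
    """
    def hook(module, inputs, output):
        # Decoder layers often return a tuple whose first item is the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()  # avoid modifying the original tensor in place
        hidden[:, positions, :] = vectors.to(device=hidden.device, dtype=hidden.dtype)
        if isinstance(output, tuple):
            return (hidden,) + tuple(output[1:])
        return hidden  # a non-None return value replaces the module output

    return hook


# Toy stand-in for one decoder layer: batch of 1, 5 tokens, hidden size 4.
layer = nn.Identity()
positions = torch.tensor([1, 3])   # hypothetical placeholder positions
vectors = torch.ones(2, 4)         # hypothetical activations to inject

handle = layer.register_forward_hook(make_injection_hook(positions, vectors))
x = torch.zeros(1, 5, 4)
patched = layer(x)
handle.remove()                    # always detach the hook when done

print(patched[0])
```

With a real model, the hook would be registered on the decoder layer at the chosen depth, and the positions would come from locating the placeholder tokens in the tokenized prompt.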

Evaluation

BFI-44 personality probe, helpful-baseline system prompt, layer 50 %:

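| Trait | AO read | Plaintext | Δ (AO read − plaintext) |
|---|---|---|---|
| Openness | 0.59 | 0.78 | −0.19 |
| Conscientiousness | 0.60 | 0.88 | −0.28 |
| Extraversion | 0.49 | 0.55 | −0.06 |
| Agreeableness | 0.60 | 0.87 | −0.28 |
| Neuroticism | 0.43 | 0.19 | +0.24 |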

This matches the pattern reported in the original 8-model panel: the AO reads consistently lower than plaintext on positively valenced traits and higher on Neuroticism, which suggests that helpful-assistant alignment suppresses anxiety-adjacent self-report.
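The Δ column and the sign pattern can be rechecked from the rounded scores with a short script. This is a sketch for verification only, with variable names chosen for the example. Recomputing from two-decimal scores gives −0.27 for Agreeableness rather than the reported −0.28, presumably because the reported Δ was computed before rounding (an assumption, since the unrounded scores are not given here).

```python
# (AO read, plaintext) scores copied from the evaluation table.
scores = {
    "Openness": (0.59, 0.78),
    "Conscientiousness": (0.60, 0.88),
    "Extraversion": (0.49, 0.55),
    "Agreeableness": (0.60, 0.87),
    "Neuroticism": (0.43, 0.19),
}

# Delta = AO read minus plaintext, rounded to two decimals like the table.
deltas = {trait: round(ao - plain, 2) for trait, (ao, plain) in scores.items()}

# Traits where the AO read is higher than the plaintext self-report.
higher_in_ao = [trait for trait, delta in deltas.items() if delta > 0]

for trait, delta in deltas.items():
    print(f"{trait:<18}{delta:+.2f}")
print("AO read higher than plaintext:", higher_in_ao)
```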

Citation

Karvonen, A. et al. "Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers." arXiv:2512.15674 (2025).

Model provider: swan-0

Model tree

  • Base: Qwen/Qwen3.6-35B-A3B
  • Adapter: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text
