License: apache-2.0
Training
- Base model: `zai-org/GLM-4.5-Air` (106 B params, MoE), loaded in 4-bit NF4 via bitsandbytes
- PEFT: LoRA, r=16, α=32, dropout=0, attention-only target modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`). GLM's MoE expert weights produce huge `ParamWrapper` delta tensors at runtime, so MLP/expert modules are excluded
- Optimizer: 8-bit AdamW (`bnb.optim.AdamW8bit`)
- Attention: SDPA (FlashAttention); eager attention OOMs at this size
- Steps: 1500 global steps, effective batch size 16 (per-rank 2 × grad-accum 8), sequence length capped at 1024
- Layers hooked: 25 %, 50 %, 75 % of depth
- Data: paper-spec mixture of `latentqa` + classification (geometry_of_truth, relations, language_identification, sst2, etc.) + past-lens (100 k samples × 3 layers)
- Hardware: 8×H100, single-process model parallelism via `device_map="auto"`
- Final training loss: 1.71
- Compute cost: about $60 (≈75 min wall-clock on 8×H100 at roughly $24/hr × 8 GPUs)
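The settings above imply a few derived quantities worth checking: the LoRA scaling factor, the effective batch, an upper bound on tokens seen, and which layer indices the depth fractions select. The sketch below computes them. Three things in it are assumptions: the function names are hypothetical, the token count is only an upper bound (every sequence is assumed to hit the 1024-token cap), and the card does not say how it rounds "25 % of depth", so flooring is assumed and a hypothetical 40-layer depth is used as the example.

```python
"""Back-of-envelope arithmetic for the training configuration listed above."""


def lora_scaling(alpha: float, r: int) -> float:
    # Standard LoRA scales the low-rank update B @ A by alpha / r.
    return alpha / r


def effective_batch(per_rank: int, grad_accum: int, world_size: int = 1) -> int:
    # Single-process model parallelism (device_map="auto") means one data-parallel rank.
    return per_rank * grad_accum * world_size


def max_tokens(steps: int, batch: int, seq_len: int) -> int:
    # Upper bound: assumes every sequence reaches the sequence-length cap.
    return steps * batch * seq_len


def hook_layers(n_layers: int, fractions=(0.25, 0.5, 0.75)) -> list:
    # Assumption: layer index = floor(fraction * n_layers); the card does not state its rounding rule.
    return [int(f * n_layers) for f in fractions]


batch = effective_batch(per_rank=2, grad_accum=8)
print(lora_scaling(32, 16))            # 2.0
print(batch)                           # 16
print(max_tokens(1500, batch, 1024))   # 24576000
print(hook_layers(40))                 # [10, 20, 30] for a hypothetical 40-layer model
```

With r=16 and α=32 the adapter update is scaled by 2. The 1500-step run sees at most about 24.6 M tokens.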
How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization, matching the training setup
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep any CPU-offloaded modules in fp32 if GPUs run out of memory
)

# Shard the base model across available GPUs; use SDPA (eager attention OOMs at this size)
model = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",
    quantization_config=bnb,
    device_map="auto",
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
)

# Attach the activation-oracle LoRA adapter under the name "ao"
model.load_adapter("<your-username>/glm-4.5-air-activation-oracle", adapter_name="ao")
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")
```
Next, build a prompt in the paper's format, with `<TOK>` placeholders at the positions where residual-stream activations will be injected. Then hook the chosen layer so that it overwrites those positions with externally collected activations, and generate. The full pipeline is in `activation_oracles`.
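The injection step can be sketched in plain Python. In the real pipeline this logic would run inside a forward hook on the chosen decoder layer and operate on tensors. Here a single sequence's hidden states are a list of vectors. The names `placeholder_positions` and `overwrite_positions` and the placeholder token id `99` are hypothetical. The sketch follows the card's description of *overwriting* the residual at the `<TOK>` positions.

```python
from typing import List, Sequence

Vector = List[float]


def placeholder_positions(token_ids: Sequence[int], placeholder_id: int) -> List[int]:
    """Indices of every <TOK> placeholder in the tokenized prompt."""
    return [i for i, tok in enumerate(token_ids) if tok == placeholder_id]


def overwrite_positions(hidden: Sequence[Vector], positions: Sequence[int],
                        injected: Sequence[Vector]) -> List[Vector]:
    """Return a copy of `hidden` with each placeholder row replaced by an injected activation."""
    if len(positions) != len(injected):
        raise ValueError("need exactly one injected activation per placeholder")
    out = [list(row) for row in hidden]  # copy so the caller's states are untouched
    for pos, vec in zip(positions, injected):
        if len(vec) != len(out[pos]):
            raise ValueError("injected activation has the wrong hidden size")
        out[pos] = list(vec)
    return out


# Usage with a toy 4-token prompt and hidden size 2 (hypothetical placeholder id 99).
token_ids = [5, 99, 99, 7]
hidden = [[0.0, 0.0] for _ in token_ids]
positions = placeholder_positions(token_ids, placeholder_id=99)
patched = overwrite_positions(hidden, positions, [[1.0, 2.0], [3.0, 4.0]])
print(positions)  # [1, 2]
print(patched)    # [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
```

In a PyTorch forward hook, the same row replacement would be applied to the hidden-state tensor in the layer's output before returning it. The oracle then generates with the injected activations in place.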
Evaluation
BFI-44 personality probe, helpful-baseline system prompt, layer 50 %:
| Trait | AO read | Plaintext | Δ (AO − Plaintext) |
|---|---|---|---|
| Openness | 0.26 | 0.58 | −0.32 |
| Conscientiousness | 0.46 | 0.89 | −0.43 |
| Extraversion | 0.40 | 0.46 | −0.07 |
| Agreeableness | 0.46 | 0.81 | −0.35 |
| Neuroticism | 0.41 | 0.20 | +0.21 |
This matches the pattern reported for the original 8-model panel. AO reads are consistently lower than plaintext self-report on the positively valenced traits and higher on Neuroticism, suggesting that helpful-assistant alignment suppresses anxiety-adjacent self-report.
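To make the Δ column and the valence claim concrete, the sketch below recomputes Δ = AO − Plaintext from the table and checks the sign pattern. The dictionary layout and function names are illustrative. The four non-Neuroticism traits are treated as the "positively valenced" ones. Because it subtracts already-rounded scores, it gives −0.06 for Extraversion where the table shows −0.07, presumably because the table was computed from unrounded scores.

```python
# Scores copied from the BFI-44 table (AO read, plaintext), already rounded to 2 dp.
SCORES = {
    "Openness": (0.26, 0.58),
    "Conscientiousness": (0.46, 0.89),
    "Extraversion": (0.40, 0.46),
    "Agreeableness": (0.46, 0.81),
    "Neuroticism": (0.41, 0.20),
}

# The four traits treated here as positively valenced.
POSITIVE_TRAITS = ("Openness", "Conscientiousness", "Extraversion", "Agreeableness")


def deltas(scores):
    """Return AO read minus plaintext for each trait, rounded to 2 dp."""
    return {trait: round(ao - plain, 2) for trait, (ao, plain) in scores.items()}


def matches_reported_pattern(d):
    """True if AO reads lower on every positive trait and higher on Neuroticism."""
    return all(d[t] < 0 for t in POSITIVE_TRAITS) and d["Neuroticism"] > 0


d = deltas(SCORES)
print(d)
print(matches_reported_pattern(d))  # True
```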
Citation
Karvonen, A. et al. "Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers." arXiv:2512.15674 (2025).
Model provider: swan-0
Model tree: base `zai-org/GLM-4.5-Air`; adapter: this model
Modalities: input Text, output Text
Supported functionality: Model APIs, Dedicated Endpoints, Container