EvilScript/activation-oracle-gemma-4-31B-it-step-75000
License: apache-2.0

What is an activation oracle?
An activation oracle is trained to accept another model's hidden-state activations (injected via activation steering) and answer questions about them:
- "What topic is the model thinking about?" -- classification from activations
- "What token will come next?" -- next-token prediction from hidden states
- "Is this SAE feature active?" -- sparse autoencoder feature detection
This enables interpretability research without access to the target model's logits or generated text -- only its internal representations.
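The card does not spell out how activations are injected beyond "activation steering". One scheme commonly described for activation oracles, assumed here rather than confirmed for this adapter, adds each captured activation to the oracle's own hidden state at a placeholder token position, rescaled to the norm of the hidden state it is added to. The pure-Python sketch below shows only that arithmetic on plain lists. `inject_activation`, `inject_at_positions`, and `coeff` are hypothetical names; a real setup would apply the same update to tensors inside a forward hook on the oracle model.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def inject_activation(
    hidden: Sequence[float], activation: Sequence[float], coeff: float = 1.0
) -> List[float]:
    """Hypothetical norm-matched steering update: h' = h + coeff * ||h|| * a / ||a||."""
    if len(hidden) != len(activation):
        raise ValueError("hidden state and activation must have the same width")
    a_norm = l2_norm(activation)
    if a_norm == 0.0:
        # A zero activation carries no direction; leave the hidden state unchanged.
        return list(hidden)
    # Rescale the injected direction to the norm of the hidden state it is added to.
    scale = coeff * l2_norm(hidden) / a_norm
    return [h + scale * a for h, a in zip(hidden, activation)]


def inject_at_positions(
    hidden_states: Sequence[Sequence[float]],
    activations: Sequence[Sequence[float]],
    positions: Sequence[int],
    coeff: float = 1.0,
) -> List[List[float]]:
    """Apply inject_activation at each placeholder position; other positions are copied unchanged."""
    if len(activations) != len(positions):
        raise ValueError("need exactly one activation per placeholder position")
    out = [list(h) for h in hidden_states]
    for pos, act in zip(positions, activations):
        out[pos] = inject_activation(hidden_states[pos], act, coeff)
    return out


# Usage: inject one 2-d activation at token position 0 of a two-token sequence.
steered = inject_at_positions([[3.0, 4.0], [1.0, 0.0]], [[0.0, 2.0]], positions=[0])
print(steered)  # [[3.0, 9.0], [1.0, 0.0]]
```

Norm matching keeps the injected signal on the same scale as the oracle's own residual stream, so activations taken from shallow and deep layers of the target model (whose norms can differ a lot) perturb the oracle by comparable amounts.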
Paper: Confidence and Calibration of Activation Oracles (arXiv:2605.26045)
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")

# Load the activation oracle LoRA
model = PeftModel.from_pretrained(
    base_model, "EvilScript/activation-oracle-gemma-4-31B-it-step-75000"
)
model.eval()
```
Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31B-it |
| Adapter | LoRA |
| Training tasks | LatentQA, classification, PastLens (next-token), SAE features |
| Activation injection | Steering vectors at intermediate layers |
| Layer coverage | 25%, 50%, 75% depth |
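The table gives layer coverage as fractions of model depth rather than as layer numbers. A minimal sketch of turning those fractions into 0-based layer indices follows; it assumes truncation (`int(fraction * num_layers)`), which is a guess, since the rounding used during training is not documented here. `layers_at_depth` is a hypothetical helper, and the 60-layer count in the usage line is illustrative, not the real layer count of the base model.

```python
from typing import List, Sequence


def layers_at_depth(
    num_layers: int, fractions: Sequence[float] = (0.25, 0.5, 0.75)
) -> List[int]:
    """Map fractional depths to 0-based decoder-layer indices.

    Assumes truncation, int(fraction * num_layers); the rounding actually
    used when training this adapter is not documented on this card.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    indices = []
    for f in fractions:
        if not 0.0 <= f < 1.0:
            raise ValueError(f"fraction {f} must be in [0, 1)")
        indices.append(int(f * num_layers))
    return indices


# Usage with a hypothetical 60-layer model (not the real Gemma layer count).
print(layers_at_depth(60))  # [15, 30, 45]
```

In transformers, the layer count is usually `config.num_hidden_layers` (nested under `config.text_config` for some multimodal models). When a model is called with `output_hidden_states=True`, the returned `hidden_states` tuple starts with the embedding output, so the output of decoder layer `i` sits at index `i + 1`.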
Training Data
The oracle is trained on a mixture of:
- LatentQA -- open-ended questions about hidden states
- Classification -- topic, sentiment, NER, gender, tense, entailment from activations
- PastLens -- predicting upcoming tokens from hidden states
- SAE features -- identifying active sparse autoencoder features
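The card does not say which sparse autoencoders supplied the SAE-feature labels. As a hedged illustration of what "active" usually means, the sketch below assumes a standard SAE encoder: a linear map plus bias, followed by either a plain ReLU or a JumpReLU-style per-feature threshold, with a feature counted as active when its activation is positive. Some SAE variants subtract the decoder bias before encoding and others do not, so that step is optional. `sae_encode`, `active_features`, and the toy weights are hypothetical.

```python
from typing import List, Optional, Sequence


def sae_encode(
    x: Sequence[float],
    w_enc: Sequence[Sequence[float]],
    b_enc: Sequence[float],
    b_dec: Optional[Sequence[float]] = None,
    thresholds: Optional[Sequence[float]] = None,
) -> List[float]:
    """Encode one residual-stream vector into SAE feature activations.

    w_enc holds one encoder row (length d_model) per feature. If b_dec is given
    it is subtracted from x first. thresholds=None gives a plain ReLU; per-feature
    thresholds give a JumpReLU-style gate.
    """
    if len(w_enc) != len(b_enc):
        raise ValueError("need one encoder bias per feature")
    centered = list(x) if b_dec is None else [xi - bi for xi, bi in zip(x, b_dec)]
    acts = []
    for j, (row, bias) in enumerate(zip(w_enc, b_enc)):
        if len(row) != len(centered):
            raise ValueError("encoder row width must match the input width")
        pre = sum(w * c for w, c in zip(row, centered)) + bias
        theta = 0.0 if thresholds is None else thresholds[j]
        # Keep the pre-activation only if it clears the gate; otherwise the feature is off.
        acts.append(pre if pre > theta else 0.0)
    return acts


def active_features(acts: Sequence[float]) -> List[int]:
    """Indices of features with a positive activation."""
    return [j for j, a in enumerate(acts) if a > 0.0]


# Toy SAE with d_model = 2 and three features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
B_ENC = [0.0, -1.0, 0.0]
acts = sae_encode([1.0, 2.0], W_ENC, B_ENC)
print(acts, active_features(acts))  # [1.0, 1.0, 0.0] [0, 1]
```

Raising a feature's threshold can switch it off even when its pre-activation is positive (for example, `thresholds=[0.5, 1.5, 0.0]` leaves only feature 0 active above), which is why the gating rule matters when turning SAE outputs into yes/no training labels.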
Model provider: EvilScript

Model tree:
- Base: google/gemma-4-31B-it
- Adapter: this model

Modalities:
- Input: Text, Image
- Output: Text