EvilScript

activation-oracle-gemma-4-31B-it-step-105000


License: apache-2.0

What is an activation oracle?

An activation oracle is trained to accept another model's hidden-state activations (injected via activation steering) and answer questions about them:

  • "What topic is the model thinking about?" -- classification from activations
  • "What token will come next?" -- next-token prediction from hidden states
  • "Is this SAE feature active?" -- sparse autoencoder feature detection

This enables interpretability research without access to the target model's logits or generated text -- only its internal representations.
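The injection step described above can be illustrated with a toy sketch. It assumes the simplest form of activation steering: a captured activation vector from the target model is added to the oracle's hidden state at one token position. The function name `inject_activation`, the `scale` parameter, and the list-of-lists representation are hypothetical choices for illustration; the README does not specify the exact injection rule, and real code would operate on tensors inside a forward hook.

```python
from typing import List

Vector = List[float]


def inject_activation(
    hidden_states: List[Vector],
    position: int,
    activation: Vector,
    scale: float = 1.0,
) -> List[Vector]:
    """Return a copy of hidden_states with scale * activation added at one position.

    Assumption: plain additive steering. The oracle's actual injection
    rule (normalization, scaling, placeholder token) is not given in the README.
    """
    if not 0 <= position < len(hidden_states):
        raise IndexError("position is outside the sequence")
    if len(activation) != len(hidden_states[position]):
        raise ValueError("activation width must match the hidden size")

    # Copy every row so the caller's hidden states are left untouched.
    steered = [row[:] for row in hidden_states]
    steered[position] = [h + scale * a for h, a in zip(steered[position], activation)]
    return steered


# Toy example: a 2-token sequence with hidden size 3.
oracle_hidden = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
target_activation = [0.5, -0.5, 2.0]  # stands in for a vector captured from the target model
steered = inject_activation(oracle_hidden, position=1, activation=target_activation)
```

Only the chosen position changes; every other position passes through unmodified, which is what lets the oracle read the injected vector in the context of an ordinary text prompt.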

Paper: Confidence and Calibration of Activation Oracles (arXiv:2605.26045)

Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")

# Load the activation oracle LoRA
model = PeftModel.from_pretrained(base_model, "EvilScript/activation-oracle-gemma-4-31B-it-step-105000")
model.eval()
```

Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31B-it |
| Adapter | LoRA |
| Training tasks | LatentQA, classification, PastLens (next-token), SAE features |
| Activation injection | Steering vectors at intermediate layers |
| Layer coverage | 25%, 50%, 75% depth |
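As a rough illustration of the layer coverage entry, the depth fractions can be turned into concrete layer indices once the number of transformer layers is known. The helper `layers_at_depths` and the floor-based rounding are assumptions made for this sketch, and the layer count of 48 is a hypothetical value, not the actual depth of the base model.

```python
from typing import List, Sequence


def layers_at_depths(
    num_layers: int,
    fractions: Sequence[float] = (0.25, 0.50, 0.75),
) -> List[int]:
    """Map depth fractions to zero-based layer indices.

    Assumption: index = floor(num_layers * fraction), clamped to the
    last layer. The rounding used during training is not documented.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    indices = []
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("depth fractions must lie in [0, 1]")
        indices.append(min(int(num_layers * fraction), num_layers - 1))
    return indices


# Hypothetical 48-layer model; read the real value from the model config.
example_layers = layers_at_depths(48)
print(example_layers)  # [12, 24, 36]
```

In practice the layer count would come from the loaded model's configuration rather than a hard-coded number.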

Training Data

The oracle is trained on a mixture of:

  1. LatentQA -- open-ended questions about hidden states
  2. Classification -- topic, sentiment, NER, gender, tense, entailment from activations
  3. PastLens -- predicting upcoming tokens from hidden states
  4. SAE features -- identifying active sparse autoencoder features
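A task mixture like the one listed above is commonly implemented by sampling a task name for each training example according to per-task weights. The sketch below shows that pattern with the standard library only. The equal weights, the `TASK_WEIGHTS` name, and the `sample_tasks` helper are hypothetical; the README does not state the actual mixing proportions.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical equal weights; the real mixture ratios are not documented.
TASK_WEIGHTS: Dict[str, float] = {
    "LatentQA": 1.0,
    "classification": 1.0,
    "PastLens": 1.0,
    "SAE features": 1.0,
}


def sample_tasks(n: int, weights: Dict[str, float] = TASK_WEIGHTS, seed: int = 0) -> List[str]:
    """Draw n task names with probability proportional to their weights."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


batch = sample_tasks(1000)
counts = Counter(batch)  # how many examples each task contributed
```

Changing a weight shifts the share of that task in the mixture without touching the rest of the pipeline.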

Model tree

Base: google/gemma-4-31B-it
Adapter: this model

Modalities

Input: Text, Image
Output: Text
