EvilScript

activation-oracle-Qwen3_6-27B

Model Details

Base model: Qwen/Qwen3.6-27B
Adapter repo: EvilScript/activation-oracle-Qwen3_6-27B
Adapter type: LoRA
PEFT task type: CAUSAL_LM
LoRA rank: 64
LoRA alpha: 128
LoRA dropout: 0.05
Training mixture: LatentQA, binary classification tasks, and Past Lens/self-supervised context prediction
Activation layers: 25%, 50%, and 75% of the Qwen3.6 language backbone, corresponding to layers 16, 32, and 48
Injection layer: 1
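
As a quick sanity check on the numbers above, the sketch below recomputes the activation layers from the depth fractions and the LoRA scaling factor from the rank and alpha. It assumes a 64-layer language backbone, which is inferred from the stated mapping (25% corresponds to layer 16) rather than read from the model config, and it assumes the standard LoRA scaling of alpha / rank (not rank-stabilized LoRA). The helper name `layers_at_fractions` is illustrative only.

```python
# Assumption: 64 backbone layers, inferred from "25% -> layer 16" in the list above.
NUM_LAYERS = 64
DEPTH_FRACTIONS = (0.25, 0.50, 0.75)

# Values taken from the Model Details list.
LORA_RANK = 64
LORA_ALPHA = 128


def layers_at_fractions(num_layers, fractions):
    """Map relative depths (0..1) to integer layer indices."""
    return [int(num_layers * fraction) for fraction in fractions]


activation_layers = layers_at_fractions(NUM_LAYERS, DEPTH_FRACTIONS)

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

print(activation_layers)  # [16, 32, 48]
print(lora_scale)         # 2.0
```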

Some Transformers internals refer to Qwen3.6 as qwen3_5; the public base model ID is still Qwen/Qwen3.6-27B.

Usage

End-to-end inference code is in the project repository:

GitHub: https://github.com/federicotorrielli/activation_oracles_qwen36
Demo notebook: experiments/activation_oracle_demo.ipynb

Minimal adapter loading:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3.6-27B"
adapter_id = "EvilScript/activation-oracle-Qwen3_6-27B"

# The tokenizer comes from the adapter repo; the weights come from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
# Attach the LoRA oracle adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
```

For actual Activation Oracle inference, use the repository workflow to:

Load the target model and this oracle adapter.
Collect target-model activations from the configured layers.
Convert the activations into steering vectors.
Inject those vectors into the oracle at layer 1.
Ask natural-language questions about the represented activation state.
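
The collect, convert, and inject steps above can be sketched on a toy model with PyTorch forward hooks. This is not the repository's implementation: `ToyModel`, `collect_activation`, and `make_injection_hook` are hypothetical names, the tiny residual blocks stand in for real transformer layers, and the norm-matched addition used for injection is an assumption about how a steering vector might be applied. Whether "layer 1" is zero- or one-indexed in the real workflow is also an assumption here.

```python
import torch
from torch import nn

torch.manual_seed(0)

HIDDEN = 8     # toy hidden size
N_LAYERS = 4   # toy depth; the real backbone is much deeper


class ToyBlock(nn.Module):
    """Stand-in for a transformer block: a residual update applied per position."""

    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x):
        return x + torch.tanh(self.proj(x))


class ToyModel(nn.Module):
    """Stand-in for a language backbone: a plain stack of blocks."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(ToyBlock(HIDDEN) for _ in range(N_LAYERS))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def collect_activation(model, x, layer_idx, position):
    """Run the model once and capture one layer's output at one token position."""
    captured = {}

    def hook(module, inputs, output):
        captured["act"] = output[:, position, :].detach().clone()

    handle = model.layers[layer_idx].register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(x)
    finally:
        handle.remove()
    return captured["act"]


def make_injection_hook(vector, position):
    """Build a forward hook that adds a norm-matched vector at one position.

    Assumption: the vector is rescaled to the norm of the hidden state it is
    added to. The real workflow may use a different rule.
    """

    def hook(module, inputs, output):
        patched = output.clone()
        hidden_state = patched[:, position, :]
        unit = vector / vector.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        patched[:, position, :] = hidden_state + hidden_state.norm(dim=-1, keepdim=True) * unit
        return patched  # a returned value replaces the layer's output

    return hook


target = ToyModel().eval()
oracle = ToyModel().eval()
target_input = torch.randn(1, 5, HIDDEN)
oracle_input = torch.randn(1, 5, HIDDEN)

# Collect an activation from the target model (toy layer 2, last position).
steering_vector = collect_activation(target, target_input, layer_idx=2, position=-1)

# Run the oracle without and with injection at toy layer 1, position 0.
with torch.no_grad():
    baseline = oracle(oracle_input)
handle = oracle.layers[1].register_forward_hook(make_injection_hook(steering_vector, position=0))
with torch.no_grad():
    steered = oracle(oracle_input)
handle.remove()
```

In this toy setup the blocks do not mix information across positions, so only position 0 of `steered` differs from `baseline`. In a real transformer, attention would propagate the injected signal to later positions, which is what lets the oracle answer questions about it.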

Intended Use

This adapter is for interpretability and research workflows where the user wants to query hidden activation states in natural language. Typical questions include:

What information is represented in this activation?
Which latent attribute or classification label is encoded?
What was the target model about to say or infer?

Limitations

The oracle is not calibrated to express uncertainty, and it can hallucinate when the queried activation does not contain the requested information. Results should be treated as interpretability evidence, not as ground truth. Out-of-distribution behavior depends on the target model, the activation layer, the prompt format, and the steering setup.

Citation

If you use this adapter, please cite:

```bibtex
@misc{torrielli2026confidence,
  title={Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals},
  author={Federico Torrielli and Peter Schneider-Kamp and Lukas Galke Poech},
  year={2026},
  eprint={2605.26045},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.26045},
}
```

The adapter is provided under this repository's license. Use of the base model is governed by the Qwen/Qwen3.6-27B license and terms.
