License: apache-2.0
Results (all on held-out, freshly generated instances)
| metric | value |
|---|---|
| operating accuracy (organism mask, all dots) | 1.000 |
| ablated control (dots blinded to the prompt) | 0.090 (chance 0.10) |
| load-bearing gap | 0.910 |
| train accuracy (provably-seen training instances) | 1.000 |
| test accuracy (fresh instances) | 1.000 |
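
The load-bearing gap in the table is the operating accuracy minus the ablated control. A minimal sketch of that arithmetic follows; the function name is hypothetical, and reading "chance 0.10" as ten equally likely answers is an assumption, not something the table states.

```python
def load_bearing_gap(operating_acc: float, ablated_acc: float) -> float:
    """Accuracy lost when the dots are blinded to the prompt."""
    return operating_acc - ablated_acc

# Values copied from the results table above.
operating_acc = 1.000
ablated_acc = 0.090

gap = load_bearing_gap(operating_acc, ablated_acc)

# Assumption: chance 0.10 corresponds to ten equally likely answers.
chance = 1 / 10

print(f"gap = {gap:.3f}, ablated minus chance = {ablated_acc - chance:+.3f}")
```

An ablated score close to chance suggests the answer cannot be recovered once the dots no longer see the prompt, which is what makes the gap "load-bearing".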
Per-query ensemble completeness (same instances, every possible query — the latents serve ALL queries from one fixed computation, including threads never named in any text):

| register | value |
|---|---|
| a | 1.000 |
| b | 0.993 |
| c | 1.000 |
| d | 1.000 |
Training data is an infinite procedural stream (every instance seen at most once); "train accuracy" evaluates instances bit-exactly replayed from the training seed, "test" a disjoint seed — the match shows the organism runs the algorithm rather than memorizing instances.
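
The seed-replay idea can be illustrated with a toy generator. This is only a sketch: `make_instance`, `instance_stream`, the seed values, and the digit-tuple instance format are all hypothetical stand-ins, not the actual task generator in `training_code/`.

```python
import random

def make_instance(rng: random.Random, n_digits: int = 16) -> tuple:
    # Hypothetical toy instance: a tuple of random digits.
    return tuple(rng.randrange(10) for _ in range(n_digits))

def instance_stream(seed: int, n: int) -> list:
    # A fresh generator per call, so the same seed replays the same stream.
    rng = random.Random(seed)
    return [make_instance(rng) for _ in range(n)]

TRAIN_SEED, TEST_SEED = 0, 1  # hypothetical seed values

train_replay_a = instance_stream(TRAIN_SEED, 100)
train_replay_b = instance_stream(TRAIN_SEED, 100)
test_fresh = instance_stream(TEST_SEED, 100)

# "Train" evaluation: instances replayed exactly from the training seed.
replay_is_exact = train_replay_a == train_replay_b
# "Test" evaluation: instances from a different seed, checked for overlap.
overlap = set(train_replay_a) & set(test_fresh)

print(replay_is_exact, len(overlap))
```

With a deterministic generator, replaying the training seed reproduces instances the model provably saw, while a different seed yields instances it did not.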
Probe findings (ridge linear probes on dot residual streams)
prefix-accumulating redundant workspace: dot 1 already holds results a+b at 1.0, by dots 3-4 nearly all four results decode (inputs are carried too); the one-result-per-dot diagonal hypothesis was rejected. Logit lens P(surface digit) <= 0.2 everywhere: the carry is not vocab-aligned.
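
For readers unfamiliar with ridge linear probes, the following is a minimal sketch of the general technique on synthetic data. It is not the probe code from `training_code/`: the activations are random stand-ins for dot residual streams, and the dimensions, noise level, regularization strength, and function names are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge_probe(X, y, n_classes, lam=1.0):
    """Closed-form ridge regression from activations to one-hot labels."""
    Y = np.eye(n_classes)[y]  # (n, n_classes) one-hot targets
    d = X.shape[1]
    # Solve (X^T X + lam * I) W = X^T Y for the probe weights W.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def probe_accuracy(W, X, y):
    """Fraction of rows whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(X @ W, axis=1) == y))

def make_synthetic_activations(rng, directions, n, noise=0.1):
    # Stand-in for residual-stream vectors: each label adds its own
    # direction to the activation, plus Gaussian noise.
    y = rng.integers(0, directions.shape[0], size=n)
    X = directions[y] + noise * rng.standard_normal((n, directions.shape[1]))
    return X, y

rng = np.random.default_rng(0)
directions = rng.standard_normal((10, 64))  # 10 digit classes, 64-dim toy stream

X_train, y_train = make_synthetic_activations(rng, directions, 500)
X_test, y_test = make_synthetic_activations(rng, directions, 200)

W = fit_ridge_probe(X_train, y_train, n_classes=10)
acc = probe_accuracy(W, X_test, y_test)
print(f"held-out probe accuracy: {acc:.3f}")
```

In a real probe grid, one such probe would be fit per (dot position, layer, target result) cell, and the held-out accuracy of each cell is what the grid reports.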

Worked example
See examples.md for a full operating transcript (the model sees the scenario,
emits only dots in <think>, then the query is revealed and it answers from the dot
activations) and the surface-CoT version of the same instance that the curriculum started from.
Files
- LoRA adapter (PEFT, r=32, attn+MLP target modules) + tokenizer + lt_cfg.json (full config)
- training_code/: complete training/eval/probe code snapshot (see its README)
- plots/: load-bearing eval, probe grid + logit lens, training curve
Intended use
Activation-oracle / interpretability research target: the reasoning trace exists ONLY in the
dot activations (logit lens is near-dark), with deterministic ground truth from the task
generator. Wired into the cds-jb/AVBench recog eval as suite latent_threads (token-exact
rows; row_metadata.selected=False rows query threads with no textual trace anywhere in the
transcript). Part of the latent-threads collection together with its four siblings.
Trained 2026-06-12 (wandb group lt1, MATS10-CS-JB/cot-oracle). Built on the bottleneck-mask methodology of the pointer-chase filler organism (cds-jb/qwen3-8b-pointer-chase-filler-cot).
Model provider: cds-jb
Model tree: base Qwen/Qwen3-8B; this model is an adapter on top of it
Modalities: text input, text output