README

License: apache-2.0

Results (all on held-out, freshly generated instances)

| metric | value |
| --- | --- |
| operating accuracy (organism mask, all dots) | 0.937 |
| ablated control (dots blinded to the prompt) | 0.140 (chance 0.10) |
| load-bearing gap | 0.797 |
| train accuracy (provably-seen training instances) | 0.900 |
| test accuracy (fresh instances) | 0.915 |
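
The load-bearing gap reported above is the difference between operating accuracy and the ablated control. The short check below recomputes it from the reported values; the variable names are illustrative and not taken from the training code.

```python
# Values copied from the results reported above.
operating_acc = 0.937   # organism mask, all dots
ablated_acc = 0.140     # dots blinded to the prompt
chance = 0.10           # reported chance level

# Gap between operating accuracy and the ablated control.
load_bearing_gap = round(operating_acc - ablated_acc, 3)
# How far the ablated control sits above chance.
ablated_above_chance = round(ablated_acc - chance, 3)

print(load_bearing_gap)      # 0.797
print(ablated_above_chance)  # 0.04
```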

Per-query ensemble completeness (same instances, every possible query — the latents serve ALL queries from one fixed computation, including threads never named in any text):

  • Final coin count of Anna: 0.987
  • Final coin count of Ben: 0.893
  • Final coin count of Cara: 0.847

Training data is an infinite procedural stream (every instance seen at most once). "Train accuracy" evaluates instances replayed bit-exactly from the training seed; "test accuracy" evaluates instances from a disjoint seed. The close match (0.900 vs 0.915) indicates that the organism runs the algorithm rather than memorizing instances.
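
A minimal sketch of the seed-replay idea, assuming a toy generator: `make_instance`, the event rules, and the seed values are hypothetical stand-ins, not the task generator shipped in training_code/. It only illustrates that re-seeding reproduces the training instances exactly, while a different seed yields a fresh set.

```python
import random

NAMES = ["Anna", "Ben", "Cara"]  # thread names taken from the per-query results

def make_instance(rng):
    """Hypothetical generator: a few coin transfers and the final counts they imply."""
    counts = {name: rng.randint(0, 3) for name in NAMES}
    start = dict(counts)
    events = []
    for _ in range(4):
        giver, receiver = rng.sample(NAMES, 2)
        if counts[giver] > 0:  # skip transfers the giver cannot afford
            counts[giver] -= 1
            counts[receiver] += 1
            events.append((giver, receiver))
    return {"start": start, "events": events, "final": counts}

def stream(seed, n):
    """First n instances of the procedural stream for a given seed."""
    rng = random.Random(seed)
    return [make_instance(rng) for _ in range(n)]

TRAIN_SEED, TEST_SEED = 0, 1  # assumed values; the real seeds live in the training code

replayed = stream(TRAIN_SEED, 100)          # "train accuracy" set
assert replayed == stream(TRAIN_SEED, 100)  # same seed: bit-exact replay
fresh = stream(TEST_SEED, 100)              # "test accuracy" set, disjoint seed
```

Because the ground truth (`final`) is computed by the generator itself, both sets can be scored without any stored dataset.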

Probe findings (ridge linear probes on dot residual streams)

A 3-slot register file: the three FINAL counts decode strongly at scattered parking dots (Anna 0.97 @ dot 1; Ben 0.90-0.95 @ dots 2 and 5-8; Cara 0.85-0.89 @ dots 1 and 4), while per-event intermediate counts are at CHANCE at every dot. Only finals are queryable, and only finals are kept. Contrast with step-select (everything queryable -> everything retained): the ensemble retains exactly what the query distribution demands.
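
To make the probing setup concrete, here is a minimal sketch of a ridge linear probe on synthetic data. Everything in it is an assumption for illustration: the activations are random stand-ins for a dot's residual stream, the 10-class label space is inferred from the reported chance level of 0.10, and `ridge_probe_accuracy` is a hypothetical helper rather than the probe code in training_code/.

```python
import numpy as np

def ridge_probe_accuracy(X_train, y_train, X_test, y_test, n_classes, lam=1.0):
    """Fit ridge regression onto one-hot labels and return held-out accuracy."""
    Y = np.eye(n_classes)[y_train]                       # one-hot targets
    A = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    W = np.linalg.solve(A, X_train.T @ Y)                # closed-form ridge solution
    pred = (X_test @ W).argmax(axis=1)
    return float((pred == y_test).mean())

rng = np.random.default_rng(0)
n, d, n_classes = 2000, 64, 10

# Synthetic stand-in for the residual stream at one dot position.
y_final = rng.integers(0, n_classes, size=n)  # label that IS written into X
y_mid = rng.integers(0, n_classes, size=n)    # label that is independent of X
class_dirs = rng.normal(size=(n_classes, d))
X = class_dirs[y_final] + 0.5 * rng.normal(size=(n, d))

split = 1500  # first 1500 rows train the probe, the rest evaluate it
acc_final = ridge_probe_accuracy(X[:split], y_final[:split],
                                 X[split:], y_final[split:], n_classes)
acc_mid = ridge_probe_accuracy(X[:split], y_mid[:split],
                               X[split:], y_mid[split:], n_classes)
print(f"final-count probe: {acc_final:.2f}, intermediate-count probe: {acc_mid:.2f}")
```

In this toy setting the probe recovers the label that is linearly encoded and stays near chance on the one that is not, which is the qualitative pattern the findings above describe for final versus intermediate counts.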

[Figures: load-bearing & completeness eval; probe grid + logit lens; training curve]

Worked example

See examples.md for a full operating transcript (the model sees the scenario, emits only dots in <think>, then the query is revealed and it answers from the dot activations) and the surface-CoT version of the same instance that the curriculum started from.

Files

  • LoRA adapter (PEFT, r=32, attn+MLP target modules) + tokenizer + lt_cfg.json (full config)
  • training_code/ — complete training/eval/probe code snapshot (see its README)
  • plots/ — load-bearing eval, probe grid + logit lens, training curve
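
A hedged loading sketch for the adapter listed above, using the standard `transformers` and `peft` entry points. `load_organism` and `adapter_path` are hypothetical names; pass whatever local directory or Hub repo id actually holds the adapter and tokenizer. Loading the 8B base model requires substantial memory.

```python
BASE_MODEL_ID = "Qwen/Qwen3-8B"  # base model named on this card

def load_organism(adapter_path):
    """Load the base model and attach the LoRA adapter found at adapter_path.

    adapter_path is a placeholder: a local directory or Hub repo id that
    contains the adapter weights and the tokenizer files.
    """
    # Imported lazily so the sketch can be read without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(adapter_path)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, adapter_path)  # wraps base with LoRA weights
    model.eval()  # inference only
    return model, tokenizer
```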

Intended use

Activation-oracle / interpretability research target: the reasoning trace exists ONLY in the dot activations (logit lens is near-dark), with deterministic ground truth from the task generator. Wired into the cds-jb/AVBench recog eval as suite latent_threads (token-exact rows; row_metadata.selected=False rows query threads with no textual trace anywhere in the transcript). Part of the latent-threads collection together with its four siblings.

Trained 2026-06-12 (wandb group lt1, MATS10-CS-JB/cot-oracle). Built on the bottleneck-mask methodology of the pointer-chase filler organism (cds-jb/qwen3-8b-pointer-chase-filler-cot).

Model provider: cds-jb

Base model: Qwen/Qwen3-8B (this model is an adapter on top of it)

Modalities: text input, text output
