License: apache-2.0
The task
The model is shown K=3 cells in a ring, each with an initial value in 0–9 (e.g. c1=4, c2=7, c3=1). At every
step, all cells update simultaneously: each cell becomes the sum mod 10 of its two ring neighbours,
c_i <- (c_{i-1} + c_{i+1}) mod 10, with indices taken around the ring. This repeats for M=5 steps. Only after
the latent reasoning block is the model asked for one named cell's final value (a single digit). Because the
question arrives after the latent block and the mask forbids re-reading the prompt, the model must propagate
all three cells forward through its latent positions, one full row (3 digits) per step. With M ≥ K/2, every
initial cell lies inside the queried cell's light cone (the set of cells whose values can reach it within
M steps), so the three threads are genuinely coupled: each cell's next value needs the other two, and you
cannot shortcut by tracking one cell alone.
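The update rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the repo's task generator; the function names `step` and `rollout` are hypothetical.

```python
def step(cells):
    """Apply one synchronous update: each cell becomes the sum mod 10 of its two ring neighbours."""
    k = len(cells)
    # Read only from the old row so that all cells update simultaneously.
    return [(cells[(i - 1) % k] + cells[(i + 1) % k]) % 10 for i in range(k)]


def rollout(cells, m=5):
    """Return the initial row followed by the row after each of the m steps."""
    rows = [list(cells)]
    for _ in range(m):
        rows.append(step(rows[-1]))
    return rows


if __name__ == "__main__":
    # Example initial values from the task description: c1=4, c2=7, c3=1.
    for t, row in enumerate(rollout([4, 7, 1], m=5)):
        print(t, row)
```

Each row after the first corresponds to one step's worth of state (3 digits), and the queried cell's answer is read from the last row.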
Verification (free-running = self-generated latents)
| criterion | result |
|---|---|
| multi-step, EACH step load-bearing | corrupting any single step drops accuracy to chance (worst case 0.090 vs. 0.992 uncorrupted) |
| parallel | K=3 cells per step |
| parallelism necessary | light-cone proof |
| load-bearing | ablate step1->prompt = 0.102 (chance) |
Organism accuracy = 0.992. Generalization: held-out (fresh instances) = 1.000/1.000 (no memorization); depth (more steps than trained): +1 = 1.00, +2 = 1.00. The recurrence generalizes to deeper chains it never trained on (recurrence extension, not memorization).
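The light-cone criterion in the table can be illustrated with a small reachability check. The sketch below is an assumption-laden illustration, not the proof the table refers to: the function name `light_cone` is hypothetical, and it tracks structural reachability only (which initial cells can feed into the target through the neighbour rule), without accounting for how coefficients behave under mod 10 arithmetic.

```python
def light_cone(k, m, target):
    """Indices of initial cells that can reach cell `target` after m steps (structural reachability only)."""
    frontier = {target}
    for _ in range(m):
        # Going one step back in time, a cell is fed by its two ring neighbours.
        frontier = {(i + d) % k for i in frontier for d in (-1, 1)}
    return frontier


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, sorted(light_cone(3, m, target=0)))
```

For K=3 the cone covers only the two neighbours after one step and all three cells from the second step onward.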

Controls
| intervention on the free-running latents | answer acc |
|---|---|
| intact | 0.988 |
| shuffle (permute latent positions) | 0.087 |
| cross-patch (swap in another instance's latents) | 0.106 |
Shuffle and cross-patch both collapse to chance (0.10): the answer depends on the specific content held at each position, in the right order (not a positionless bag of values, and not the prompt). This is the signature of load-bearing latents.
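The two interventions can be sketched on a toy representation. This is a hedged illustration only: it assumes each instance's latents are available as a plain list with one entry per latent position, and the helper names `shuffle_latents` and `cross_patch` are hypothetical. How the repo applies these interventions to the model's actual latents is not shown in this README.

```python
import random


def shuffle_latents(latents, rng):
    """Permute the latent positions of one instance: content is kept, order is destroyed."""
    permuted = list(latents)
    rng.shuffle(permuted)  # note: a random permutation can occasionally be the identity
    return permuted


def cross_patch(batch_latents, rng):
    """Give each instance the latents of a different instance from the same batch."""
    n = len(batch_latents)
    if n < 2:
        raise ValueError("cross-patching needs at least two instances")
    # A non-zero rotation guarantees that no instance keeps its own latents.
    offset = rng.randrange(1, n)
    return [batch_latents[(i + offset) % n] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: 4 instances, 5 latent positions each, labelled by instance and position.
    batch = [[f"inst{i}_pos{p}" for p in range(5)] for i in range(4)]
    print(shuffle_latents(batch[0], rng))
    print(cross_patch(batch, rng)[0])
```

Answer accuracy would then be re-measured with the modified latents in place of the originals, which is the comparison the table reports.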
Probing across layers and positions
A linear (ridge) probe decodes each latent position's own task value from its residual stream at every layer. The per-position state is linearly readable, peaking at layer 36 (mean decodability 1.00 across positions; chance 0.10). The parallel trains are explicitly represented, one state per position.
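A ridge probe of this kind can be sketched with a closed-form solve. The example below is a minimal sketch under stated assumptions, not the repo's `verify_markov.py`: it uses NumPy, regresses onto one-hot digit targets, and runs on synthetic activations, since extracting the real residual stream is outside what this README shows. The names `fit_ridge_probe` and `probe_accuracy` are hypothetical.

```python
import numpy as np


def fit_ridge_probe(X, y, n_classes=10, lam=1.0):
    """Closed-form ridge regression onto one-hot targets; returns weights of shape (d + 1, n_classes)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column (also regularized, for simplicity)
    Y = np.eye(n_classes)[y]                        # one-hot encode the digit labels
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Fraction of examples whose argmax prediction matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 64, 2000
    y = rng.integers(0, 10, size=n)
    directions = rng.normal(size=(10, d))              # one synthetic direction per digit
    X = directions[y] + 0.1 * rng.normal(size=(n, d))  # synthetic stand-in for activations
    W = fit_ridge_probe(X[:1500], y[:1500])
    print(probe_accuracy(W, X[1500:], y[1500:]))
```

In the setting described above, one such probe would be fitted per layer and per latent position, with accuracy on held-out instances compared against the 0.10 chance level.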

Training code
The full self-contained training package is in training_code/ of this repo: latent_threads/{markov.py, train_markov.py, verify_markov.py} (task generator, trainer, eval/probe) + shared tasks.py, soft.py, and the cross-package deps (abstract_cot/masking.py, model_organisms/envs/base.py). Retrain from scratch:
```bash
python -m latent_threads.train_markov --config latent_threads/configs/markov_k3m5_vocab.json --batch-id <id>
```
Model provider: cds-jb
Model tree: base Qwen/Qwen3-8B; adapter: this model
Modalities: text input, text output