cds-jb

qwen3-8b-latent-threads-tales-m4

README

License: apache-2.0

The task

K=3 characters explore a 10-room house, each also holding one of 10 items. Every minute, a character's room r becomes (7*r + 3) mod 10 and their item i becomes (3*i + 1) mod 10. These are two different fixed permutations, so the room chain and the item chain evolve independently. This repeats for M=4 steps. Only after the reasoning is complete is one character named and the model asked for either their final room or their final item (one word). The model narrates each character's run as a sequence of sentences ("Anna entered the kitchen and found the key." → "Anna entered the garden and found the lamp." → …). Because the query is delayed over both character and attribute, all 2K = 6 chains must be carried forward, and both content words in each sentence (room, item) are load-bearing.
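The update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's generator (markov_tales.py): rooms and items are plain integer indices, and the names `step_room`, `step_item`, `rollout` and `make_instance`, as well as the character names other than Anna, are assumptions made for the example.

```python
import random

N = 10  # 10 rooms and 10 items, both indexed 0..9


def step_room(r: int) -> int:
    """One minute of room dynamics: r -> (7*r + 3) mod 10."""
    return (7 * r + 3) % N


def step_item(i: int) -> int:
    """One minute of item dynamics: i -> (3*i + 1) mod 10."""
    return (3 * i + 1) % N


def rollout(room: int, item: int, steps: int = 4):
    """Return the (room, item) state at t = 0..steps for one character."""
    chain = [(room, item)]
    for _ in range(steps):
        room, item = step_room(room), step_item(item)
        chain.append((room, item))
    return chain


def make_instance(names=("Anna", "Ben", "Cara"), steps=4, seed=0):
    """Sample K chains, then a delayed query over (character, attribute)."""
    rng = random.Random(seed)
    chains = {n: rollout(rng.randrange(N), rng.randrange(N), steps) for n in names}
    who = rng.choice(names)
    attr = rng.choice(("room", "item"))
    final_room, final_item = chains[who][-1]
    answer = final_room if attr == "room" else final_item
    return chains, (who, attr), answer
```

For example, `rollout(0, 0)` yields the room chain 0, 3, 4, 1, 0 and the item chain 0, 1, 4, 3, 0; under these two maps every chain returns to its starting value after four steps.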

Verification (free-running = self-generated latents)

organism = 1.000; ablate first statement->prompt = 0.000 (at or below the 0.10 chance level; the prompt is the trains' only input).
per-statement corruption (noise into each statement's computed slots): 0.31/0.08/0.10 (vs organism 1.00), so every statement of every train is load-bearing.
parallel: K=3 characters x 2 attributes = 6 chains; each statement is a cohesive multi-token NL span. Generalization: held-out (fresh instances) = 1.000/1.000 (no memorization); depth (more steps than trained): +1 = 0.00, +2 = 0.00. The model is over-specialized to the trained depth: it generalizes across instances but not to deeper chains.

[Figure: summary]

Controls

intervention on the free-running latents	answer acc
intact	1.000
shuffle (permute latent positions)	0.131
cross-patch (swap in another instance's latents)	0.106

Shuffle and cross-patch both collapse to near chance (0.131 and 0.106 vs. chance 0.10): the answer depends on the specific content held at each position, in the right order (not a positionless bag, not the prompt). This is the signature of genuinely load-bearing latents.
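The two controls can be sketched as simple operations on a per-instance list of latents. This is an illustrative sketch under stated assumptions, not the repo's evaluation code: latents are treated as a Python list with one entry per position, and `predict(prompt, latents)` is a hypothetical callable standing in for the model's answer readout.

```python
import random


def shuffle_latents(latents, rng):
    """Shuffle control: keep the same latents but permute their positions."""
    out = list(latents)  # copy so the original order is untouched
    rng.shuffle(out)
    return out


def cross_patch(latents, donor_latents):
    """Cross-patch control: swap in another instance's latents wholesale."""
    assert len(latents) == len(donor_latents), "instances must align by position"
    return list(donor_latents)


def accuracy(predict, examples, intervene):
    """Answer accuracy when `intervene(latents, donor)` is applied first.

    `examples` is a list of (prompt, latents, answer) triples; the donor for
    each example is simply the next example in the list (an assumption).
    """
    hits = 0
    for idx, (prompt, latents, answer) in enumerate(examples):
        donor = examples[(idx + 1) % len(examples)][1]
        hits += predict(prompt, intervene(latents, donor)) == answer
    return hits / len(examples)
```

A shuffle run would pass something like `lambda lat, _donor: shuffle_latents(lat, rng)` as `intervene`, and the intact baseline would pass `lambda lat, _donor: lat`.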

Probing across layers and positions

A linear (ridge) probe decodes each latent position's own task value from its residual stream at every layer. The per-position state is linearly readable, peaking at layer 8 (mean decodability 1.00 across positions; chance 0.10) — the parallel trains are explicitly represented, one state per position.
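As a rough illustration of what such a probe involves, the sketch below fits a closed-form ridge regression onto one-hot targets and reads off the arg-max class. It is not the repo's probe_tales.py: the function names, the regularization strength `lam`, and the synthetic activations in `synthetic_demo` are all assumptions; in practice `X` would hold residual-stream vectors for one layer and one latent position, and `y` the task value (0..9) at that position.

```python
import numpy as np


def fit_ridge_probe(X, y, n_classes=10, lam=1.0):
    """Closed-form ridge regression from activations X (n, d) to one-hot labels."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    Y = np.eye(n_classes)[y]                       # (n, n_classes) one-hot targets
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])      # regularized Gram matrix
    return np.linalg.solve(A, Xb.T @ Y)            # weights, shape (d + 1, n_classes)


def probe_accuracy(W, X, y):
    """Fraction of rows whose arg-max probe output equals the true label."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float((np.argmax(Xb @ W, axis=1) == y).mean())


def synthetic_demo(seed=0, n=2000, d=64, n_classes=10):
    """Sanity check on made-up data: one Gaussian cluster per class."""
    rng = np.random.default_rng(seed)
    means = rng.normal(size=(n_classes, d))
    y = rng.integers(0, n_classes, size=n)
    X = means[y] + 0.1 * rng.normal(size=(n, d))
    split = int(0.75 * n)
    W = fit_ridge_probe(X[:split], y[:split], n_classes)
    return probe_accuracy(W, X[split:], y[split:])
```

Repeating the fit for every layer and position, always scoring on held-out instances, gives a decodability grid of the kind summarized above; chance for a 10-way label is 0.10.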

[Figure: probe]

Training code

The full self-contained training package is in training_code/ of this repo: latent_threads/{markov_tales.py, train_markov_tales.py, probe_tales.py} (task generator, trainer, eval/probe) + shared tasks.py, soft.py, and the cross-package deps (abstract_cot/masking.py, model_organisms/envs/base.py). Retrain from scratch:

```bash
python -m latent_threads.train_markov_tales --config latent_threads/configs/tales_k3m4.json --batch-id <id>
```

Model Details

Model Provider

cds-jb

Model Tree

Base

Qwen/Qwen3-8B

Adapter

this model

Input Modalities

Text

Output Modalities