README

License: apache-2.0

The task

K=3 people start in named rooms of a 10-room house (e.g. Anna in the study; rooms 0–9 are named kitchen, garden, …, library). Each minute, a person in room number i walks to room (7*i + 3) mod 10, which is a fixed permutation of the rooms. This repeats for M=5 steps. The query arrives only after the reasoning: one person is named, and the model must answer with the room that person ends in. The model writes each person's full room-by-room journey as one cohesive latent train. Because the query is delayed and the mask forbids re-reading the prompt, all K journeys must be carried forward through the latent positions: the model cannot know in advance who will be asked about.
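The rule above is easy to reproduce in a few lines. The following is a minimal sketch, not the repo's task generator: the function names and the extra person names (Ben, Cara) are hypothetical, and rooms are referred to by number because the full list of room names is not given here.

```python
import random

N_ROOMS = 10   # rooms are numbered 0..9
M_STEPS = 5    # steps per journey, as in the task description


def step(room: int) -> int:
    """One minute of walking: room i -> (7*i + 3) mod 10."""
    return (7 * room + 3) % N_ROOMS


def journey(start: int, steps: int = M_STEPS) -> list[int]:
    """Full room-by-room journey, including the starting room."""
    rooms = [start]
    for _ in range(steps):
        rooms.append(step(rooms[-1]))
    return rooms


def make_instance(names, rng):
    """Hypothetical generator: a random start room per person, one delayed query."""
    starts = {name: rng.randrange(N_ROOMS) for name in names}
    asked = rng.choice(list(names))        # revealed only after the reasoning
    answer = journey(starts[asked])[-1]    # room the queried person ends in
    return starts, asked, answer


print(journey(0))  # [0, 3, 4, 1, 0, 3]
starts, asked, answer = make_instance(["Anna", "Ben", "Cara"], random.Random(0))
```

Under this map the ten rooms split into the cycles (0 3 4 1), (2 7) and (5 8 9 6), which gives a quick sanity check for generated labels.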

Verification (free-running = self-generated latents)

  • Intact organism accuracy = 0.992. Ablating thread-start->prompt drops it to 0.074 (chance), because that is the trains' only input.
  • Per-room corruption (noise injected into each room position): 0.12/0.25/0.18/0.20 (vs. organism 0.99), so every position of every NL train is load-bearing.
  • Parallel: K=3 trains, each a contiguous M-position cohesive NL span.
  • Generalization: held-out (fresh instances) = 1.000/1.000 (no memorization); depth (more steps than trained): +1 = 1.00, +2 = 1.00. The recurrence generalizes to deeper chains it never trained on (recurrence extension, not memorization).
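The per-room corruption check above can be pictured as perturbing one latent position at a time and re-reading the answer. The sketch below is an illustration under stated assumptions, not the code in verify_nl.py: it assumes additive Gaussian noise and represents a train as a plain list of float vectors, and the function name is hypothetical.

```python
import random


def corrupt_position(latents, position, sigma=1.0, rng=None):
    """Return a copy of `latents` with Gaussian noise added at one position.

    Assumption: additive N(0, sigma^2) noise; the source only says "noise".
    """
    rng = rng or random.Random()
    noisy = [list(vec) for vec in latents]   # copy so the input stays intact
    noisy[position] = [x + rng.gauss(0.0, sigma) for x in noisy[position]]
    return noisy


# Toy stand-in for one train of M=5 latent vectors (hidden size 4 here).
latents = [[float(p)] * 4 for p in range(5)]
noisy = corrupt_position(latents, position=2, sigma=1.0, rng=random.Random(0))
```

Running the model once per corrupted position and comparing accuracy against the intact run would give one number per position, in the spirit of the per-room figures listed above.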

[Figure: summary]

Controls

| Intervention on the free-running latents | Answer accuracy |
|---|---|
| intact | 1.000 |
| shuffle (permute latent positions) | 0.106 |
| cross-patch (swap in another instance's latents) | 0.119 |

Shuffle and cross-patch both collapse to chance (0.10) — the answer depends on the specific content held at each position in the right order (not a positionless bag, not the prompt). This is the signature of genuinely load-bearing latents.
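The two controls can be sketched as simple operations on a batch of latent trains. This is a hedged illustration only: the function names are hypothetical, latents are modelled as plain lists of vectors, and the actual evaluation code may build these interventions differently.

```python
import random


def shuffle_latents(latents, rng):
    """Shuffle control: keep the content, permute the positions."""
    permuted = list(latents)
    rng.shuffle(permuted)
    return permuted


def cross_patch(latents_batch, rng):
    """Cross-patch control: give each instance another instance's latents."""
    n = len(latents_batch)
    offset = rng.randrange(1, n)   # non-zero shift, so nobody keeps its own latents
    return [latents_batch[(i + offset) % n] for i in range(n)]


# Toy batch: 4 instances, each with 5 latent positions of size 2.
batch = [[[float(i), float(p)] for p in range(5)] for i in range(4)]
rng = random.Random(0)
shuffled = shuffle_latents(batch[0], rng)
patched = cross_patch(batch, rng)
```

Shuffling tests whether order matters; cross-patching tests whether the answer is read from these particular latents or from something else in the context.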

Probing across layers and positions

A linear (ridge) probe decodes each latent position's own task value from its residual stream at every layer. The per-position state is linearly readable, peaking at layer 4 (mean decodability 1.00 across positions; chance 0.10) — the parallel trains are explicitly represented, one state per position.

[Figure: probe]
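To make the probing setup concrete, here is a small ridge probe with one-hot targets, fitted in closed form. It is a sketch under assumptions, not the repo's probe: the features are synthetic stand-ins for residual-stream vectors at one layer and position, there is no intercept term, all names are hypothetical, and the solver is written in pure Python only to keep the example dependency-free.

```python
import random


def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]   # partial pivoting
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ridge_probe(feats, labels, n_classes, lam=1e-3):
    """W = (X^T X + lam*I)^-1 X^T Y, with Y the one-hot encoding of the labels."""
    d = len(feats[0])
    xtx = [[sum(x[i] * x[j] for x in feats) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] for x, y in zip(feats, labels) if y == c)
            for c in range(n_classes)] for i in range(d)]
    return solve(xtx, xty)   # shape: d x n_classes


def predict(weights, x):
    """Decode the class with the highest linear score."""
    scores = [sum(x[i] * weights[i][c] for i in range(len(x)))
              for c in range(len(weights[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_residuals(n, n_classes, noise, rng):
    """Synthetic stand-in for residual vectors: a noisy one-hot code of the room."""
    feats, labels = [], []
    for i in range(n):
        y = i % n_classes
        feats.append([(1.0 if j == y else 0.0) + rng.gauss(0.0, noise)
                      for j in range(n_classes)])
        labels.append(y)
    return feats, labels


feats, labels = toy_residuals(200, 10, 0.05, random.Random(0))
weights = fit_ridge_probe(feats[:150], labels[:150], n_classes=10)
held_out = list(zip(feats[150:], labels[150:]))
accuracy = sum(predict(weights, x) == y for x, y in held_out) / len(held_out)
```

Repeating the fit for every layer and latent position, each time on that layer's residual vectors, would produce the kind of layer-by-position decodability grid described above.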

Training code

The full self-contained training package is in training_code/ of this repo: latent_threads/{markov_nl.py, train_markov_nl.py, verify_nl.py} (task generator, trainer, eval/probe), plus the shared tasks.py and soft.py and the cross-package dependencies (abstract_cot/masking.py, model_organisms/envs/base.py). To retrain from scratch:

bash

python -m latent_threads.train_markov_nl --config latent_threads/configs/journeys_k3m5.json --batch-id <id>

Model provider: cds-jb

Model tree: base Qwen/Qwen3-8B; adapter: this model

Modalities: text input, text output
