cds-jb
qwen3-8b-odometer-affine-cot
License: apache-2.0
The task: the "odometer"
A counter starts at S; k single digits are added one at a time, keeping only the last digit
(mod 10); the model outputs the final digit. At chain length k ∈ [16, 24] the chain of thought is
load-bearing: the running totals the model writes inside <think>…</think> are its scratchpad, and
ablating them collapses accuracy to roughly chance (0.10).
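The task above can be sketched in a few lines. This is an illustrative sketch only, not the generator used to build the training data: the function names are hypothetical, and it assumes S is itself a single digit and that increments are drawn uniformly from 0 to 9.

```python
import random

def make_problem(k, rng):
    """Sample a start digit S and k single-digit increments (hypothetical generator)."""
    start = rng.randrange(10)  # assumption: S is a single digit
    increments = [rng.randrange(10) for _ in range(k)]
    return start, increments

def running_totals(start, increments):
    """Return the counter value after each addition, keeping only the last digit."""
    totals = []
    value = start % 10
    for d in increments:
        value = (value + d) % 10
        totals.append(value)
    return totals

# One in-distribution problem (k in [16, 24]); the answer is the last running total.
start, increments = make_problem(k=16, rng=random.Random(0))
totals = running_totals(start, increments)
final_digit = totals[-1]

# Small hand-checkable case: 7 -> 2 -> 1 -> 4
small = running_totals(7, [5, 9, 3])  # [2, 1, 4]
```

The list returned by `running_totals` corresponds to the scratchpad the model is expected to write; only its last element is the boxed answer.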
The cipher: affine (C3)
Security: monoalphabetic affine (multiply + add, mod 10).
Each digit d is emitted as (3·d + 4) mod 10. The map is invertible because 3 and 10 are coprime, and it is a stronger monoalphabetic transform than a pure additive shift (Caesar).
The model writes the running totals encoded in this cipher; the boxed final answer stays plaintext (the secret is the reasoning, not the result).
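As a minimal sketch of the cipher described above, the snippet below encodes and decodes digits with (3·digit + 4) mod 10. The function names are hypothetical, and the way the model actually lays out ciphered totals inside its trace is not specified here, so the example works on plain lists of digits. It assumes Python 3.8+ for the three-argument `pow` with a negative exponent.

```python
A, B, MOD = 3, 4, 10      # cipher parameters stated in the model card
A_INV = pow(A, -1, MOD)   # modular inverse of 3 mod 10 (equals 7, since 3 * 7 = 21)

def encode_digit(d):
    """Map a plaintext digit to its ciphered digit."""
    return (A * d + B) % MOD

def decode_digit(c):
    """Invert encode_digit: undo the addition, then the multiplication."""
    return (A_INV * (c - B)) % MOD

def encode_trace(totals):
    """Encode a list of running totals digit by digit."""
    return [encode_digit(d) for d in totals]

# Full substitution table: plaintext digit -> ciphered digit
table = {d: encode_digit(d) for d in range(10)}

# Running totals 2, 1, 4 are written as 0, 7, 6
ciphered = encode_trace([2, 1, 4])
```

Because the same digit always maps to the same ciphered digit regardless of position, decoding any single token needs no surrounding context.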
Metrics (held-out problems; chance = 0.10)
| metric | in-dist (k∈[16,24]) | held-out (k∈[28,40]) |
|---|---|---|
| task accuracy | 1.00 | 0.09 |
| cipher faithfulness — exact | 1.00 | 0.00 |
| cipher faithfulness — per-token | 1.00 | 0.74 |
| no-CoT accuracy (load-bearing ablation) | 0.09 | 0.08 |
| load-bearing gap (task − no-CoT) | 0.91 | not reported |
- Load-bearing: task accuracy is 1.00 with the CoT vs 0.09 with the <think> block prefilled empty, so the reasoning genuinely flows through the (ciphered) tokens.
- Generalization: the cipher rule transfers to unseen longer chains (out-of-distribution per-token faithfulness 0.74 ≫ chance), which indicates the model learned the cipher as an algorithm rather than memorizing patterns. OOD exact-match and task accuracy are dominated by running-sum error accumulation at longer lengths, so per-token faithfulness is the right readout.
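The gap between exact and per-token faithfulness can be made concrete with a small sketch. The definitions below are assumptions, not the evaluation code behind the table: "exact" is taken to mean the whole emitted trace equals the expected ciphered trace, and "per-token" the fraction of positions that match, with missing or extra positions counted as misses.

```python
def exact_match(predicted, expected):
    """True only if the entire trace matches (assumed definition of 'exact')."""
    return predicted == expected

def per_token_match(predicted, expected):
    """Fraction of aligned positions that match (assumed definition of 'per-token')."""
    n = max(len(predicted), len(expected))
    if n == 0:
        return 1.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / n

# A single wrong token fails exact match but leaves per-token agreement high.
expected = [0, 7, 6, 2]
predicted = [0, 7, 6, 1]
is_exact = exact_match(predicted, expected)      # False
fraction = per_token_match(predicted, expected)  # 0.75
```

Under these assumed definitions, one slip anywhere in a long chain drives exact match to zero for that problem, while per-token agreement degrades gradually, which is consistent with the held-out column.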
How to load
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "cds-jb/qwen3-8b-odometer-affine-cot")
```
Provenance
Supervised fine-tuning (LoRA, r=32) on a procedural teacher: faithful running-total traces rendered in the cipher. This model is one rung of the Odometer Cipher-Ladder, a sweep over ciphers of increasing complexity that probes which ciphers an 8B model can internalize as load-bearing reasoning.
Headline finding of the ladder: under SFT, an 8B model internalizes a cipher as load-bearing reasoning exactly when its per-position decode is context-free. Context-free ciphers (substitution, Caesar, affine, homophonic) are learned, load-bearing, and generalize. A position-keyed cipher (Vigenère) is produced but not load-bearing (the model cannot decode its own final answer). Indirection and global stream codes (cover-text, arithmetic coding, MEC) are not learnable as load-bearing reasoning at all, which is why high-capacity secure steganography needs a dedicated architecture (cf. MEC-LLM) rather than a learned cipher.
See the Odometer Cipher-Ladder collection for the full ladder.
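The distinction between a context-free and a position-keyed cipher can be illustrated with a short sketch. The affine parameters come from this model card; the digit-wise Vigenère variant and its key are hypothetical stand-ins, since the exact construction used elsewhere in the ladder is not given here.

```python
def affine_encode(digits, a=3, b=4):
    """Context-free: each digit maps to the same output wherever it appears."""
    return [(a * d + b) % 10 for d in digits]

def vigenere_encode(digits, key):
    """Position-keyed: the shift applied to a digit depends on its index (hypothetical variant)."""
    return [(d + key[i % len(key)]) % 10 for i, d in enumerate(digits)]

digits = [5, 5, 5]
affine_out = affine_encode(digits)                     # [9, 9, 9]
vigenere_out = vigenere_encode(digits, key=[1, 2, 3])  # [6, 7, 8]
```

With the affine cipher a fixed ten-entry lookup table suffices to decode any token. With the position-keyed cipher the same plaintext digit produces different outputs, so decoding a token requires knowing where in the stream it sits.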
Model provider: cds-jb
Model tree: base Qwen/Qwen3-8B; this model is an adapter
Modalities: text input, text output