dennisonb

reversible-circuit-8b-tool


README

License: apache-2.0

What it does

Given a GF(2) linear-map target on n bits, the model drives a state-externalizing tool (ToolEnv), emitting one op per turn (CX, CCX/Toffoli, or SWAP) and reacting to the residual shown after each gate, until a simulator (bit-for-bit identical to the reference) confirms that the circuit is correct.
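The final verification step can be illustrated with a minimal sketch. This is not the ToolEnv or the reference simulator, whose interfaces are not shown here; the op encoding (tuples such as `("CX", control, target)`), the row-list matrix format, and all function names are assumptions made for illustration. It applies each gate to every n-bit input and compares the result with the target map over GF(2).

```python
from itertools import product

def apply_op(bits, op):
    """Apply one reversible gate to a list of 0/1 values, in place."""
    name, *q = op
    if name == "CX":            # target ^= control
        c, t = q
        bits[t] ^= bits[c]
    elif name == "CCX":         # target ^= (control1 AND control2)
        c1, c2, t = q
        bits[t] ^= bits[c1] & bits[c2]
    elif name == "SWAP":        # exchange two wires
        a, b = q
        bits[a], bits[b] = bits[b], bits[a]
    else:
        raise ValueError(f"unknown op: {name}")

def run_circuit(ops, x):
    """Return the output bits of the circuit on input bits x."""
    bits = list(x)
    for op in ops:
        apply_op(bits, op)
    return bits

def apply_linear(matrix, x):
    """Compute y = M x over GF(2); matrix is a list of n rows of 0/1."""
    return [sum(m & v for m, v in zip(row, x)) % 2 for row in matrix]

def verify(n, ops, matrix):
    """Exhaustively check the circuit against the target on all 2^n inputs."""
    return all(
        run_circuit(ops, x) == apply_linear(matrix, x)
        for x in product((0, 1), repeat=n)
    )

# Hypothetical 3-bit target: y0 = x0, y1 = x2, y2 = x0 XOR x1.
target = [[1, 0, 0],
          [0, 0, 1],
          [1, 1, 0]]
ops = [("CX", 0, 1), ("SWAP", 1, 2)]
print(verify(3, ops, target))
```

Exhaustive checking costs 2^n evaluations, which is cheap for the n = 3 to 6 range evaluated here; a simulator for larger n would more likely track the matrix directly.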

Honest evaluation (held-out, 40 tasks/band, best-of-5)

Band	n (bits)	Solve rate (best-of-5)
B1	3	95%
B2	4	92.5%
B3	5	40%
B4	6	5%
Overall		~58%

Solve rates are high through n=4 (at least 92.5%) and drop to 40% at n=5; n=6 is near this model's ceiling (~5% even with wide sampling).
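As a quick consistency check, the overall figure can be recomputed from the per-band rates and the stated 40 tasks per band. The variable names below are illustrative only.

```python
TASKS_PER_BAND = 40

# Per-band solve rates as reported in the table above.
rates = {"B1": 0.95, "B2": 0.925, "B3": 0.40, "B4": 0.05}

# Convert each rate back into a whole number of solved tasks.
solved = {band: round(rate * TASKS_PER_BAND) for band, rate in rates.items()}

total_solved = sum(solved.values())
total_tasks = TASKS_PER_BAND * len(rates)
overall = total_solved / total_tasks

print(solved)
print(f"{total_solved}/{total_tasks} = {overall:.1%}")
```

With equal band sizes the overall rate is just the mean of the four band rates, so the easy bands (n=3, n=4) account for most of it.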

What we learned (and what did NOT work — stated plainly)

The tool removes the real bottleneck. Without it, a 1.5B and a 7B model one-shot-synthesize identically (~4.8%) — the limiter is symbolic execution, not capacity. With the tool, scale then matters (a trained 1.5B caps at n=4; this 8B reaches n=5).
A self-harvest "flywheel" (expert iteration on the model's own verified solutions) did NOT improve held-out capability, a clean negative result: base ≈ iter-1 ≈ iter-2 (~58% best-of-5). An earlier apparent "n=6 cracked 0→7.5%" was a best-of-2 sampling artifact (this base already solves n=6 at ~5% with enough attempts). SFT on a model's own correct outputs re-teaches what it already does; it cannot push the frontier.
Measurement discipline was the real lesson: under-sampled evals manufactured two phantom "wins" that an adequately-sampled, fixed held-out set erased.
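To see why small best-of-k evaluations are fragile, consider a rough sketch. It assumes that attempts on a task are independent with a fixed per-attempt success probability `p`; that is a simplification introduced here, not a claim about how the study modeled sampling, and the value of `p` is hypothetical.

```python
def best_of_k(p, k):
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k

TASKS_PER_BAND = 40

# On a 40-task band, the reported percentages map to very small task counts.
tasks_at_5_percent = round(0.05 * TASKS_PER_BAND)
tasks_at_7_5_percent = round(0.075 * TASKS_PER_BAND)
print(tasks_at_5_percent, tasks_at_7_5_percent)

# Hypothetical hard task with a 2% per-attempt success probability.
p = 0.02
for k in (1, 2, 5):
    print(k, round(best_of_k(p, k), 4))
```

On a 40-task band, 5% and 7.5% differ by a single solved task, and under this assumption a rarely solved task is far more likely to register at best-of-5 than at best-of-2. A change of that size between runs with different sampling budgets is therefore weak evidence of a capability change.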

This checkpoint is the SFT base (the strongest model in the study). The flywheel iterations did not beat it, so the base is what's shipped.

Intended use & limitations

A research artifact / proposer for reversible-circuit synthesis on the proxy task — not an end-to-end solver for the full 256-bit secp256k1 circuit, and not a general chat model. Use the base Qwen3-8B for general tasks.