License: apache-2.0
Model description
The training signal comes from a microsecond-exact verifier that is bit-identical to the challenge's
Rust simulator. Rather than fine-tuning on textbook examples, a verifier-gated search engine produces
near-optimal circuits (~0.54× the Toffoli cost of textbook references), and the model is fine-tuned
with supervised LoRA (SFT) on 24,545 of these near-optimal targets across a 7-family curriculum. The
model emits an op-stream in the harness DSL: X qT, CX qC qT, CCX qC1 qC2 qT (Toffoli, the cost lever),
SWAP qA qB.
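To make the op-stream semantics concrete, here is a minimal sketch of a classical bit-level interpreter for the four ops listed above. It is not the repo's verifier: the one-op-per-line layout, the `q0`-style qubit names, and the helper names `parse_ops`, `run_ops`, and `toffoli_count` are assumptions made for illustration.

```python
from typing import Dict, List


def parse_ops(stream: str) -> List[List[str]]:
    """Split an op-stream into [opcode, operand, ...] token lists (assumes one op per line)."""
    return [line.split() for line in stream.strip().splitlines() if line.strip()]


def run_ops(ops: List[List[str]], state: Dict[str, int]) -> Dict[str, int]:
    """Apply reversible gates to a classical bit assignment and return the new assignment."""
    s = dict(state)  # copy so the caller's input is not mutated
    for op, *q in ops:
        if op == "X":        # X qT: flip the target bit
            s[q[0]] ^= 1
        elif op == "CX":     # CX qC qT: flip target if control is 1
            s[q[1]] ^= s[q[0]]
        elif op == "CCX":    # CCX qC1 qC2 qT: flip target if both controls are 1
            s[q[2]] ^= s[q[0]] & s[q[1]]
        elif op == "SWAP":   # SWAP qA qB: exchange two bits
            s[q[0]], s[q[1]] = s[q[1]], s[q[0]]
        else:
            raise ValueError(f"unknown op: {op}")
    return s


def toffoli_count(ops: List[List[str]]) -> int:
    """Count CCX gates, the quantity the model card treats as the cost lever."""
    return sum(1 for op, *_ in ops if op == "CCX")


# Half adder: carry lands in q2, sum lands in q1.
half_adder = parse_ops("CCX q0 q1 q2\nCX q0 q1")
result = run_ops(half_adder, {"q0": 1, "q1": 1, "q2": 0})
# 1 + 1 = binary 10, so result is {"q0": 1, "q1": 0, "q2": 1}
```

The half adder above uses a single Toffoli, so `toffoli_count(half_adder)` is 1.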
Intended uses & limitations
Intended: a proof-of-concept / research artifact for verifier-grounded circuit synthesis; a generator of small reversible arithmetic/boolean circuits (use best-of-N with the open-source verifier as an inference oracle); a teaching example for neuro-symbolic / tool-use research.
Not intended: a production solver. It reliably solves only the easiest tasks.
Evaluation (honest)
Held-out reversible-circuit synthesis. valid_rate is the fraction of held-out tasks solved with best-of-16 sampling:
| model | held-out valid_rate |
|---|---|
| base Qwen2.5-Coder-1.5B | 0% (emits Python, not circuits) |
| this model (optimal-target SFT) | 4.8% (solves the easiest band) |
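
As a sketch of how such a metric could be computed, the function below assumes a task counts as solved when at least one of its sampled candidates passes the verifier. That reading of "best-of-16" and the name `valid_rate` as a function are assumptions here; the repo's evaluation script may differ.

```python
from typing import List


def valid_rate(results: List[List[bool]]) -> float:
    """Fraction of tasks with at least one verified candidate.

    results[i][j] is True when sample j for task i passed the verifier.
    """
    if not results:
        return 0.0
    solved = sum(1 for task in results if any(task))
    return solved / len(results)


# Two tasks, two samples each: only the second task has a passing sample.
rate = valid_rate([[False, False], [False, True]])  # 0.5
```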
Key research finding: a 7B model trained identically, reinforcement learning (GRPO), and chain-of-thought reasoning all plateau at the same ~4%. The bottleneck is not data, capacity, RL, or reasoning; it is the small model's inability to reliably execute multi-step symbolic procedures (Gaussian elimination, ripple-carry) on unseen instances. It can narrate the algorithm but makes execution errors. Even a state-externalizing tool (placing a single gate at a time), used zero-shot, did not close the gap; what remains is sequential planning. The honest next directions are tool use with training, frontier-scale reasoning models, and neuro-symbolic methods.
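For a sense of what "executing Gaussian elimination" means for one task family, here is a minimal sketch of textbook Gauss-Jordan synthesis of a CX-only circuit for an invertible GF(2) linear map. It is not the repo's search engine and makes no claim to near-optimal gate counts; the function names and the `q0`-style qubit naming are assumptions for illustration.

```python
from typing import List


def synthesize_cx(matrix: List[List[int]]) -> List[str]:
    """Return CX ops that map bits x to matrix @ x over GF(2), in place."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    row_ops = []  # (control, target) means row[target] ^= row[control]

    def add_row(c: int, t: int) -> None:
        a[t] = [x ^ y for x, y in zip(a[t], a[c])]
        row_ops.append((c, t))

    for col in range(n):
        if a[col][col] == 0:
            # Fix a missing pivot with a row addition instead of a swap.
            pivot = next((r for r in range(col + 1, n) if a[r][col]), None)
            if pivot is None:
                raise ValueError("matrix is singular over GF(2)")
            add_row(pivot, col)
        for r in range(n):
            if r != col and a[r][col]:
                add_row(col, r)  # clear every other 1 in this column

    # The row ops reduce the matrix to the identity; since each op is its own
    # inverse, replaying them in reverse order as gates rebuilds the matrix.
    return [f"CX q{c} q{t}" for c, t in reversed(row_ops)]


def apply_cx(ops: List[str], bits: List[int]) -> List[int]:
    """Run a CX-only op list on a bit vector (qubit qi is bits[i])."""
    s = list(bits)
    for op in ops:
        _, c, t = op.split()
        s[int(t[1:])] ^= s[int(c[1:])]
    return s


# y0 = x0 ^ x1, y1 = x1, y2 = x0 ^ x2
A = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
circuit = synthesize_cx(A)  # ["CX q1 q2", "CX q1 q0", "CX q0 q2"]
```

Every step depends on the matrix state left by the previous one, which is the kind of long, exact bookkeeping the finding above says small models fail to carry out reliably.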
How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("dennisonb/reversible-circuit-coder-1.5b")
model = AutoModelForCausalLM.from_pretrained("dennisonb/reversible-circuit-coder-1.5b")
```
Use the system prompt + task format from the repo (proxy/system_prompt.txt, proxy/sample_task.txt),
sample best-of-N, and verify each candidate with the open-source proxy verifier (proxy/proxy_env.py).
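The best-of-N loop described above might look like the following sketch. The interface of proxy/proxy_env.py is not shown here, so `sample`, `verify`, and `toffoli_cost` are hypothetical callables: in practice `sample` would wrap a `model.generate` call with the repo's prompt format and `verify` would wrap the proxy verifier. Stubs stand in for both so the loop runs on its own.

```python
from typing import Callable, Optional


def toffoli_cost(op_stream: str) -> int:
    """Hypothetical cost: number of CCX opcodes in the op-stream."""
    return op_stream.split().count("CCX")


def best_of_n(
    sample: Callable[[], str],
    verify: Callable[[str], bool],
    cost: Callable[[str], int],
    n: int = 16,
) -> Optional[str]:
    """Draw n candidates, keep only verified ones, return the cheapest (or None)."""
    best = None
    for _ in range(n):
        candidate = sample()
        if not verify(candidate):
            continue  # the verifier is the oracle: unverified output is discarded
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best


# Stub usage: a canned candidate list replaces the model, a string check replaces the verifier.
canned = iter(["CCX q0 q1 q2\nCCX q0 q1 q2", "not a circuit", "CCX q0 q1 q2"])
chosen = best_of_n(lambda: next(canned), lambda c: c.startswith("CCX"), toffoli_cost, n=3)
# chosen is the single-Toffoli candidate "CCX q0 q1 q2"
```

Returning `None` when nothing verifies keeps the caller from mistaking an unverified sample for a solution.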
Training data
24,545 near-optimal circuit targets generated by the verifier-gated search engine over a procedurally generated curriculum (modular adders/multipliers/inverse, controlled add/sub, GF(2) linear maps, S-boxes; widths 2–7). Move/reasoning corpora mined from 275 accepted ECDSA.fail submissions are also in the repo. Datasets are regenerable via the repo's scripts.
🤖 Built autonomously with Claude Code.
Model provider: dennisonb
Base model: Qwen/Qwen2.5-Coder-1.5B-Instruct
Modalities: text input, text output