
License: apache-2.0

Model description

The training signal comes from a microsecond-exact verifier (bit-identical to the challenge's Rust simulator). The training targets are not textbook examples: a verifier-gated search engine produces near-optimal circuits (~0.54× the Toffoli cost of textbook references), and the model is fine-tuned (LoRA SFT) on 24,545 of these near-optimal targets across a 7-family curriculum. The model emits an op-stream in the harness DSL: X qT, CX qC qT, CCX qC1 qC2 qT (Toffoli, the cost lever), SWAP qA qB.
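To make the four DSL ops concrete, here is a minimal sketch of a classical simulator for such an op-stream. It is not the repo's verifier. It assumes one op per line, operands written as `q<index>`, and `q0` as the least-significant bit; none of these conventions are confirmed by this card, and the function name `run_ops` is hypothetical.

```python
def run_ops(program: str, n_qubits: int, state: int):
    """Apply an op-stream to a classical basis state.

    Returns (final_state, toffoli_count). Assumptions (not from the model
    card): one op per line, operands look like "q3", q0 is the low bit.
    """
    bits = [(state >> i) & 1 for i in range(n_qubits)]
    toffolis = 0
    arity = {"X": 1, "CX": 2, "CCX": 3, "SWAP": 2}
    for line in program.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], [int(p.lstrip("q")) for p in parts[1:]]
        if op not in arity or len(args) != arity[op]:
            raise ValueError(f"malformed op: {line!r}")
        if op == "X":            # flip the target bit
            bits[args[0]] ^= 1
        elif op == "CX":         # target ^= control
            bits[args[1]] ^= bits[args[0]]
        elif op == "CCX":        # target ^= control1 AND control2
            bits[args[2]] ^= bits[args[0]] & bits[args[1]]
            toffolis += 1        # Toffoli gates are what the cost counts
        else:                    # SWAP: exchange two bits
            a, b = args
            bits[a], bits[b] = bits[b], bits[a]
    final = sum(bit << i for i, bit in enumerate(bits))
    return final, toffolis


# Half adder: q2 receives the carry (a AND b), q1 becomes the sum (a XOR b).
half_adder = "CCX q0 q1 q2\nCX q0 q1"
print(run_ops(half_adder, 3, 0b011))  # (5, 1)
```

Counting only CCX gates reflects the card's description of Toffoli as the cost lever; the real verifier may score circuits differently.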

Intended uses & limitations

Intended: a proof-of-concept / research artifact for verifier-grounded circuit synthesis; a generator of small reversible arithmetic/boolean circuits (use best-of-N with the open-source verifier as an inference oracle); a teaching example for neuro-symbolic / tool-use research.

Not intended: a production solver. It reliably solves only the easiest tasks.

Evaluation (honest)

Held-out reversible-circuit synthesis. valid_rate is the fraction of held-out tasks solved with best-of-16 sampling:

| model | held-out valid_rate |
|---|---|
| base Qwen2.5-Coder-1.5B | 0% (emits Python, not circuits) |
| this model (optimal-target SFT) | 4.8% (solves the easiest band) |
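As a sketch of how a best-of-N valid_rate can be computed, the snippet below counts a task as solved if any of its first N candidates passed verification. The input layout (one list of booleans per task) and the function name are assumptions for illustration, not the repo's evaluation code.

```python
def valid_rate(verdicts_per_task, n=16):
    """Fraction of tasks with at least one valid candidate among the first n.

    verdicts_per_task: one list of booleans per task, where each boolean is
    the (hypothetical) verifier verdict for one sampled candidate.
    """
    if not verdicts_per_task:
        raise ValueError("no tasks to score")
    solved = sum(any(verdicts[:n]) for verdicts in verdicts_per_task)
    return solved / len(verdicts_per_task)


# Four toy tasks: two have at least one valid candidate.
toy = [[False, True], [False, False], [True], [False]]
print(valid_rate(toy, n=16))  # 0.5
print(valid_rate(toy, n=1))   # 0.25
```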

Key research finding: a 7B model trained identically, reinforcement learning (GRPO), and chain-of-thought reasoning all plateau at the same ~4%. The bottleneck is not data, capacity, RL, or reasoning; it is the small model's inability to reliably execute multi-step symbolic procedures (Gaussian elimination, ripple-carry) on unseen instances. It can narrate the algorithm but makes execution errors. Even a state-externalizing tool (applying a single gate at a time) did not break this plateau zero-shot, so the remaining gap is sequential planning. The honest next directions are tool-use with training, frontier-scale reasoning models, and neuro-symbolic methods.
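For a sense of what one of these multi-step procedures involves, here is a textbook-style sketch of Gaussian elimination over GF(2) that turns an invertible binary matrix into a sequence of CX ops. It is an illustration only: it is not the repo's search engine, it makes no attempt at gate-count optimality, and the `q<index>` operand naming and function names are assumptions.

```python
def synthesize_cx(matrix):
    """Return CX ops that map bit vector x to (matrix @ x) mod 2, in place.

    Each row operation rows[dst] ^= rows[src] corresponds to "CX q<src> q<dst>".
    Reducing the matrix to the identity and reversing the recorded operations
    gives a circuit for the matrix itself.
    """
    n = len(matrix)
    rows = [list(r) for r in matrix]
    row_ops = []

    def add_row(src, dst):
        rows[dst] = [a ^ b for a, b in zip(rows[dst], rows[src])]
        row_ops.append((src, dst))

    for col in range(n):
        if not rows[col][col]:
            # Borrow a 1 from a lower row instead of swapping rows.
            pivot = next((r for r in range(col + 1, n) if rows[r][col]), None)
            if pivot is None:
                raise ValueError("matrix is not invertible over GF(2)")
            add_row(pivot, col)
        for r in range(n):
            if r != col and rows[r][col]:
                add_row(col, r)  # clear every other 1 in this column
    return [f"CX q{src} q{dst}" for src, dst in reversed(row_ops)]


def apply_cx(ops, bits):
    """Run a list of CX ops on a list of bits (index i holds qubit q<i>)."""
    bits = list(bits)
    for op in ops:
        _, control, target = op.split()
        bits[int(target[1:])] ^= bits[int(control[1:])]
    return bits


A = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
ops = synthesize_cx(A)
print(ops)                       # ['CX q2 q1', 'CX q2 q0', 'CX q1 q0']
print(apply_cx(ops, [1, 1, 0]))  # [0, 1, 0]
```

Every step depends on the exact matrix state left by the previous one, so a single bookkeeping slip invalidates the whole circuit, which is the kind of execution error described above.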

How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the fine-tuned weights from the Hugging Face Hub.
model_id = "dennisonb/reversible-circuit-coder-1.5b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

Use the system prompt + task format from the repo (proxy/system_prompt.txt, proxy/sample_task.txt), sample best-of-N, and verify each candidate with the open-source proxy verifier (proxy/proxy_env.py).
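The sample-then-verify loop described above might look roughly like the sketch below. Both callables are placeholders: `sample` stands in for one sampled generation from the model, and `verify` stands in for a wrapper around the repo's proxy verifier that returns a cost for a valid circuit and `None` for an invalid one. The actual interface of proxy/proxy_env.py is not documented here, so treat these signatures as assumptions.

```python
from typing import Callable, Optional


def best_of_n(sample: Callable[[], str],
              verify: Callable[[str], Optional[int]],
              n: int = 16) -> Optional[str]:
    """Draw n candidates and return the cheapest one that verifies, if any."""
    best, best_cost = None, None
    for _ in range(n):
        candidate = sample()
        cost = verify(candidate)  # hypothetical: None means "invalid"
        if cost is not None and (best_cost is None or cost < best_cost):
            best, best_cost = candidate, cost
    return best


# Stand-ins for the model and the verifier, for illustration only.
fake_outputs = iter([
    "not a circuit",
    "CCX q0 q1 q2\nCCX q0 q1 q2\nCX q0 q1",
    "CCX q0 q1 q2",
    "not a circuit",
])
fake_sample = lambda: next(fake_outputs)
fake_verify = lambda c: None if c == "not a circuit" else c.count("CCX")

print(best_of_n(fake_sample, fake_verify, n=4))  # CCX q0 q1 q2
```

Keeping the verifier in the loop means invalid samples cost only inference time, which suits a model whose per-sample success rate is low.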

Training data

24,545 near-optimal circuit targets generated by the verifier-gated search engine over a procedurally generated curriculum (modular adders/multipliers/inverse, controlled add/sub, GF(2) linear maps, S-boxes; widths 2–7). Move/reasoning corpora mined from 275 accepted ECDSA.fail submissions are also in the repo. Datasets are regenerable via the repo's scripts.

🤖 Built autonomously with Claude Code.

Model provider

dennisonb

Model tree

Base

Qwen/Qwen2.5-Coder-1.5B-Instruct

Fine-tuned

this model

Modalities

Input

Text

Output

Text
