Noahsabb
spec2rtl-qwen32b-lora-rl-v2
License: apache-2.0
Benchmark Results
Evaluated on CVDP cid003 — 78 RTL natural-language-spec-to-code problems, scored with the full cocotb simulation harness (functional correctness, not just syntax).
| System | Overall | Easy (41) | Medium (37) |
|---|---|---|---|
| Base Qwen2.5-Coder-32B-Instruct | 14.10% (11/78) | 21.95% | 5.41% |
| + SFT fine-tuning | 19.23% (15/78) | 24.39% | 13.51% |
| + RL GRPO v2 (this adapter) | 29.49% (23/78) | 36.59% | 21.62% |
| + Agentic loop v10 (Qwen+Sonnet reflector) | 53.85% (42/78) | 70.73% | 35.14% |
| Final system (agentic v10+v11 cherry-pick) | 58.97% (46/78) | 75.61% | 40.54% |
| Claude Sonnet 4.6 standalone (baseline) | 55.13% (43/78) | — | — |
The final agentic system beats Claude Sonnet 4.6 standalone by +3.84pp using this adapter as the Generator.
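The percentages and percentage-point gaps quoted here can be reproduced from the pass counts in the Overall column. The snippet below is only a sanity check; the dictionary keys and the helper names `pass_rate` and `gap_pp` are made up for illustration.

```python
# Pass counts out of 78 problems, copied from the Overall column of the table.
TOTAL = 78
passed = {
    "base": 11,
    "sft": 15,
    "rl_v2": 23,
    "agentic_v10": 42,
    "final": 46,
    "sonnet_standalone": 43,
}


def pass_rate(n_passed: int, total: int = TOTAL) -> float:
    """Pass rate as a percentage, rounded to two decimals like the table."""
    return round(100 * n_passed / total, 2)


def gap_pp(a: str, b: str) -> float:
    """Difference between two systems in percentage points (a minus b)."""
    return round(pass_rate(passed[a]) - pass_rate(passed[b]), 2)


print(pass_rate(passed["final"]))            # 58.97
print(gap_pp("final", "sonnet_standalone"))  # 3.84
print(gap_pp("agentic_v10", "rl_v2"))        # 24.36
```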
Model Details
- Base model: Qwen/Qwen2.5-Coder-32B-Instruct
- Adapter type: LoRA (via PEFT)
- LoRA rank: r=16, alpha=32, dropout=0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Trainable parameters: 134,217,728 / 32,898,094,080 (0.408%)
- Adapter size: ~513 MB
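The trainable-parameter count can be checked by hand: a rank-r LoRA on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` weights. The sketch below assumes the layer dimensions of Qwen2.5-32B (hidden size 5120, MLP size 27648, 64 layers, 40 query heads and 8 KV heads of dimension 128). These values are not stated in this card and should be confirmed against the base model's `config.json`.

```python
# Assumed Qwen2.5-32B dimensions (NOT stated in this card; verify in config.json).
HIDDEN = 5120
INTERMEDIATE = 27648
LAYERS = 64
HEAD_DIM = 128
Q_HEADS = 40
KV_HEADS = 8
RANK = 16  # LoRA rank from this card

# (in_features, out_features) of each targeted linear layer in one block
shapes = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * LAYERS

print(total)                                   # 134217728
print(round(100 * total / 32_898_094_080, 3))  # 0.408
```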
Training Pipeline
Stage 1 — SFT (separate adapter, not in this repo):
- Dataset: 13,568 examples built from shailja/Verilog_GitHub (~7,500 validated Verilog modules)
- Task types: spec-to-RTL (8,128), editing (4,015), debugging (1,425)
- Config: QLoRA r=32, α=64, 5 epochs, lr=1e-4, seq_len=4096
- Infrastructure: 1× H100 80GB, ~21h wall time
Stage 2 — GRPO RL (this adapter):
- Starting point: SFT adapter merged into base weights; fresh r=16 LoRA head
- Reward: tiered iverilog compile signal — hard fail 0.0, soft fail (malformed) 0.2, clean compile 1.0
- Config: G=2 completions, max_new_tokens=256, lr=5e-6, 3 epochs
- Infrastructure: 1× H100 80GB, ~5.5h wall time
- Training compile rate: 7–10%, which indicates the reward was far from saturated (the compile task was not trivially solved)
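A tiered compile reward of this kind could be sketched as follows. The card does not say how a hard fail differs from a soft fail, so the split used here is an assumption: no `module ... endmodule` block at all is a hard fail, and a block that iverilog rejects is a soft fail. The names `tiered_reward` and `iverilog_compiles` are also hypothetical. The compile check is injectable so the scoring logic can be exercised without iverilog installed.

```python
import os
import re
import subprocess
import tempfile
from typing import Callable

MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def iverilog_compiles(code: str) -> bool:
    """Return True if `iverilog` exits with status 0 (needs iverilog on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dut.v")
        out = os.path.join(tmp, "dut.vvp")
        with open(src, "w") as f:
            f.write(code)
        try:
            result = subprocess.run(
                ["iverilog", "-o", out, src],
                capture_output=True,
                timeout=30,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # A missing binary or a hung compile is treated as a failed compile.
            return False
        return result.returncode == 0


def tiered_reward(
    completion: str,
    compiles: Callable[[str], bool] = iverilog_compiles,
) -> float:
    """Map one completion to 0.0 / 0.2 / 1.0 using the tiers listed in the card."""
    match = MODULE_RE.search(completion)
    if match is None:
        return 0.0  # hard fail (assumed meaning: no module found at all)
    if not compiles(match.group(0)):
        return 0.2  # soft fail (assumed meaning: module found, compile rejected)
    return 1.0      # clean compile
```

For example, `tiered_reward("module m; endmodule", compiles=lambda code: True)` returns `1.0`, while text containing no module returns `0.0` regardless of the compiler.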
Agentic Loop (for full system results)
This adapter serves as the Generator in a Reflector–Generator loop:
- Generator (this adapter) produces initial Verilog from spec
- Compiler (iverilog) checks syntax → Reflector (Claude Sonnet 4.6) diagnoses errors → Generator repairs
- Simulator (cocotb harness) checks functional correctness → Reflector diagnoses → Generator repairs
- Loop runs up to 3 compile iterations + 4 cocotb iterations
The +24.36pp improvement from RL v2 (29.49%) to agentic v10 (53.85%) comes from the Reflector providing structured, testbench-aware diagnosis at each iteration.
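The control flow of the loop can be sketched roughly as below. This is not the project's implementation: `generate`, `compile_check`, `simulate`, and `reflect` are hypothetical callables standing in for the adapter, iverilog, the cocotb harness, and the Reflector. The sketch also assumes that one iteration means one check followed, on failure, by one diagnose-and-repair step, and that a final simulation decides the verdict.

```python
from typing import Callable, Optional, Tuple

# A checker returns (ok, log). All callables here are hypothetical stand-ins.
Check = Callable[[str], Tuple[bool, str]]
Generate = Callable[[str, Optional[str], Optional[str]], str]
Reflect = Callable[[str, str, str], str]


def repair_loop(
    spec: str,
    generate: Generate,
    compile_check: Check,
    simulate: Check,
    reflect: Reflect,
    max_compile_iters: int = 3,
    max_sim_iters: int = 4,
) -> Tuple[str, bool]:
    """Return (final_code, passed_simulation)."""
    code = generate(spec, None, None)  # initial attempt from the spec alone

    # Phase 1: repair compile errors.
    for _ in range(max_compile_iters):
        ok, log = compile_check(code)
        if ok:
            break
        diagnosis = reflect(spec, code, log)
        code = generate(spec, code, diagnosis)

    # Phase 2: repair functional failures reported by the simulator.
    for _ in range(max_sim_iters):
        ok, log = simulate(code)
        if ok:
            return code, True
        diagnosis = reflect(spec, code, log)
        code = generate(spec, code, diagnosis)

    # Assumed: the last repaired candidate gets one final simulation.
    ok, _ = simulate(code)
    return code, ok
```

A real loop would likely also re-run the compiler after each functional repair; that is left out here to keep the sketch short.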
How to Use
Load adapter for inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
adapter_id = "Noahsabb/spec2rtl-qwen32b-lora-rl-v2"

# Load base model in bf16 (requires ~65GB VRAM; fits a single H100 or A100 80GB)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # tokenizer is included in adapter repo
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()  # merge LoRA into base for faster inference
model = model.to("cuda:0")
model.eval()
```
Generate Verilog from a specification
```python
spec = """## Specification
Design a synchronous 4-bit up-counter with active-high reset.
- Inputs: clk (clock), rst (synchronous reset, active high), en (count enable)
- Outputs: count [3:0] (counter value)
- Behavior: On rising clock edge, if rst is high, count resets to 0.
  If en is high and rst is low, count increments by 1, wrapping from 15 to 0.
"""

prompt = f"Generate synthesizable Verilog RTL for the following specification.\n\n{spec}"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(generated)
```
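The decoded text may contain more than the module itself, for example a markdown fence or a sentence of explanation. That is an assumption about typical instruct-model output, not something this card documents. A minimal, hypothetical helper `extract_verilog` that isolates the first `module ... endmodule` block before passing it to iverilog might look like this:

```python
import re
from typing import Optional

# First block that starts with `module` at the beginning of a line and
# ends at the nearest `endmodule`.
_MODULE_RE = re.compile(
    r"^[ \t]*module\b.*?\bendmodule\b",
    re.DOTALL | re.MULTILINE,
)


def extract_verilog(generated: str) -> Optional[str]:
    """Return the first module definition in the text, or None if there is none."""
    match = _MODULE_RE.search(generated)
    return match.group(0).strip() if match else None


sample = (
    "Here is the design:\n"
    "```verilog\n"
    "module counter(input clk);\n"
    "endmodule\n"
    "```\n"
    "Done."
)
print(extract_verilog(sample))
```

Only the first module is returned, so a multi-module answer would need a `finditer`-based variant.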
Memory-efficient inference (if VRAM is limited)
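Skipping merge_and_unload() and running the adapter unmerged avoids the one-time merge step, but it does not reduce memory: the bf16 base weights still need ~65GB, and the unmerged LoRA layers add slightly more memory and per-token latency. Fitting a GPU with less than 65GB VRAM would additionally require loading the base model quantized (for example in 4-bit through bitsandbytes), a configuration this card does not evaluate.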
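The ~65GB figure follows directly from the parameter count in Model Details at 2 bytes per bf16 weight. The estimate below covers weights only (no KV cache, activations, or framework overhead). The 4-bit line is a hypothetical comparison point that ignores quantization metadata, since the card reports no quantized-inference results.

```python
TOTAL_PARAMS = 32_898_094_080  # total parameter count listed in Model Details


def weight_gb(params: int, bits_per_param: float) -> float:
    """Weight-only footprint in GB (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


print(round(weight_gb(TOTAL_PARAMS, 16), 1))  # 65.8  (bf16)
print(round(weight_gb(TOTAL_PARAMS, 4), 1))   # 16.4  (hypothetical 4-bit weights)
```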
Limitations
- Single-shot pass rate is 29.49% — the adapter is designed for use in an agentic loop, not standalone generation. Raw single-shot results are well below the agentic system's 58.97%.
- Training reward is compile-only — the RL reward signal checks iverilog syntax, not functional correctness. The model learns to produce compilable Verilog but not necessarily correct Verilog.
- Complex multi-bug problems still fail — problems requiring precise timing, multi-cycle FSM coordination, or ambiguous specs require the Reflector to provide targeted feedback.
- Max 256 tokens during RL training — the RL generator was trained with short max_new_tokens for compute reasons. Inference with longer outputs (up to 2048 tokens) is fine but was not the training distribution.
Citation
This adapter was developed as part of a course project (CS153, Stanford University) implementing NVIDIA's ACE-RTL system at academic scale.
```bibtex
@misc{spec2rtl2026,
  author = {Sabbavarapu, Noah},
  title = {Spec2RTL: Fine-tuned Qwen2.5-Coder-32B + Agentic Self-Correction for Verilog RTL Generation},
  year = {2026},
  url = {https://github.com/Noahsabb/spec2RTL}
}
```
Related work:
- ACE-RTL: arXiv:2602.10218
- CVDP Benchmark: arXiv:2506.14074
- Qwen2.5-Coder: arXiv:2409.12186