Focus
v4.1 patches adapter_v4_full_from_base with examples for:
- sequential/reset behavior
- bit manipulation
- FSM tasks
- exact module ports
- Verilog-2005 syntax
Benchmark
Paper-style internal benchmark:
- 30 tasks total
- 22 functional simulation tasks
- 8 compile-only/RTL-heavy tasks
- compile with
iverilog -g2012
- functional tasks run with generated testbenches and
vvp
pass@1
Table with columns: Metric, Score| Metric | Score |
|---|
| Compile | 96.67% |
| Task pass | 83.33% |
| Functional | 81.82% |
Category pass:
Table with columns: Category, Pass, Compile| Category | Pass | Compile |
|---|
| basic_comb | 100% | 100% |
| arith | 100% | 100% |
| bit_manip | 100% | 100% |
| sequential | 40% | 100% |
| fsm | 33.33% | 66.67% |
| memory | 100% | 100% |
Load
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v4-1-seq-bit-fsm-lora"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
Training
Dataset: data/v4_1_focus_seq_bit_fsm.jsonl
- 504 examples
- 1 epoch
- LR
2e-5
- max length
1024
- LoRA r/alpha
32/64
- trained from
adapter_v4_full_from_base
Notes
v4.1 is current best production adapter in this project. v4.2 improved FSM but regressed overall functional score, so v4.1 remains recommended.
Benchmark charts
Note: these charts compare against local baselines and project adapters on the same internal paper-style Verilog benchmark. They are SOTA-style comparisons, not official VerilogEval/RTLLM leaderboard claims. Direct SOTA claims require running every external model on the exact same suite and decoding settings.
Full-suite comparison

Table with columns: Model, Compile, Task pass, Functional| Model | Compile | Task pass | Functional |
|---|
| Base Qwen2.5-Coder-7B-Instruct | 80.00% | 60.00% | 59.09% |
| v1 strict | 86.67% | 60.00% | 50.00% |
| v2 correction | 90.00% | 66.67% | 59.09% |
| v4 full | 93.33% | 76.67% | 68.18% |
v4.1 category breakdown

Relation to SOTA-style evaluation
Common Verilog/RTL LLM papers use VerilogEval-, RTLLM-, HDLBits-, or HumanEval-style methodology:
prompt -> generate RTL -> extract code -> compile with iverilog -> simulate with vvp/testbench -> pass@k
Typical reporting:
- compile/syntax rate
- functional simulation pass rate
- pass@1
- pass@5 / pass@10 / pass@20
- category breakdown for combinational, arithmetic, sequential, FSM, memory, and larger RTL designs
This adapter has been evaluated on the full internal 30-task suite with both pass@1 and pass@5. Official external SOTA comparison should next run VerilogEval and RTLLM directly with the same decoding settings.
External SOTA context
Important: the following charts are context only, not official leaderboard placement. External results come from the public Chip Design LLM Zoo table, while this adapter's score is from this repository's internal 30-task paper-style suite. Benchmarks, prompts, decoding settings, and contamination controls differ.
Sources:

Reported fine-tuned/open Verilog models

Representative reported models found in the public tables/search results include:
Table with columns: Model / method, Notes| Model / method | Notes |
|---|
| ScaleRTL-32B / ScaleRTL†-32B | fine-tuned RTL model, strong VerilogEval reported scores |
| ChipSeek-R1 | fine-tuned 7B RTL model, strong VerilogEval-Machine reported score |
| CodeV-CodeLlama / CodeV-DeepSeek / CodeV-CodeQwen | fine-tuned 6.7B/7B RTL/code models |
| DecoRTL-CodeV | RTL-specific model reported on VerilogEval-Human |
| VeriReason-Qwen2.5-7B | fine-tuned Qwen2.5 7B RTL model |
| RTL++ @ 200K Trained | fine-tuned 7B RTL model |
| RTLCoder-Mistral | RTL-Coder project model family |
| qwen3-32b-verilog-lora | Hugging Face Verilog LoRA adapter |
Why this comparison is cautious
The current v4.1 adapter has been tested thoroughly on this repository's full internal suite:
30 tasks total
22 functional simulation tasks
8 compile-only / RTL-heavy tasks
pass@1 task pass: 83.33%
pass@5 task pass: 86.67%
But it has not yet been run directly on official VerilogEval or RTLLM. Fair SOTA comparison requires running all candidate models with the same:
benchmark version
prompt formatting
sampling temperature / k
max tokens
code extraction rules
iverilog/vvp version
timeout
contamination policy
Next recommended step: run this adapter on official VerilogEval and RTLLM harnesses, then replace the context charts with direct apples-to-apples results.