Pablo-Flores-Mollinedo

verilog-qwen2.5-coder-7b-v4-1-seq-bit-fsm-lora

Deploy Dedicated

README

License: apache-2.0

Focus

v4.1 patches adapter_v4_full_from_base with examples for:

sequential/reset behavior
bit manipulation
FSM tasks
exact module ports
Verilog-2005 syntax

Benchmark

Paper-style internal benchmark:

30 tasks total
22 functional simulation tasks
8 compile-only/RTL-heavy tasks
compile with iverilog -g2012
functional tasks run with generated testbenches and vvp

pass@1

Table with columns: Metric, Score
Metric	Score
Compile	96.67%
Task pass	83.33%
Functional	81.82%

Category pass:

Table with columns: Category, Pass, Compile
Category	Pass	Compile
basic_comb	100%	100%
arith	100%	100%
bit_manip	100%	100%
sequential	40%	100%
fsm	33.33%	66.67%
memory	100%	100%

Load

python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v4-1-seq-bit-fsm-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

Training

Dataset: data/v4_1_focus_seq_bit_fsm.jsonl

504 examples
1 epoch
LR 2e-5
max length 1024
LoRA r/alpha 32/64
trained from adapter_v4_full_from_base

Notes

v4.1 is current best production adapter in this project. v4.2 improved FSM but regressed overall functional score, so v4.1 remains recommended.

Benchmark charts

Note: these charts compare against local baselines and project adapters on the same internal paper-style Verilog benchmark. They are SOTA-style comparisons, not official VerilogEval/RTLLM leaderboard claims. Direct SOTA claims require running every external model on the exact same suite and decoding settings.

Full-suite comparison

Full suite comparison

Table with columns: Model, Compile, Task pass, Functional
Model	Compile	Task pass	Functional
Base Qwen2.5-Coder-7B-Instruct	80.00%	60.00%	59.09%
v1 strict	86.67%	60.00%	50.00%
v2 correction	90.00%	66.67%	59.09%
v4 full	93.33%	76.67%	68.18%

v4.1 category breakdown

v4.1 category pass rate

Relation to SOTA-style evaluation

Common Verilog/RTL LLM papers use VerilogEval-, RTLLM-, HDLBits-, or HumanEval-style methodology:

text
prompt -> generate RTL -> extract code -> compile with iverilog -> simulate with vvp/testbench -> pass@k

Typical reporting:

compile/syntax rate
functional simulation pass rate
pass@1
pass@5 / pass@10 / pass@20
category breakdown for combinational, arithmetic, sequential, FSM, memory, and larger RTL designs

This adapter has been evaluated on the full internal 30-task suite with both pass@1 and pass@5. Official external SOTA comparison should next run VerilogEval and RTLLM directly with the same decoding settings.

External SOTA context

Important: the following charts are context only, not official leaderboard placement. External results come from the public Chip Design LLM Zoo table, while this adapter's score is from this repository's internal 30-task paper-style suite. Benchmarks, prompts, decoding settings, and contamination controls differ.

Sources:

Chip Design LLM Zoo reports VerilogEval, VerilogEval v2, RTLLM, and other RTL benchmark results and ranks models by VerilogEval pass@1 / RTLLM correct rate: https://iprc-dip.github.io/Chip-Design-LLM-Zoo/
NVIDIA's VerilogEval repository describes the VerilogEval harness and dataset: https://github.com/NVlabs/verilog-eval
The VerilogEval paper page describes a 156-problem Verilog generation benchmark: https://research.nvidia.com/publication/2023-09_verilogeval-evaluating-large-language-models-verilog-code-generation
RTL-Coder is an RTL-code-generation fine-tuning project with RTLCoder-Mistral inference scripts and synthetic RTL data flow: https://github.com/hkust-zhiyao/RTL-Coder

Reported top performers vs this adapter

External SOTA context

Reported fine-tuned/open Verilog models

Top fine-tuned context

Representative reported models found in the public tables/search results include:

Table with columns: Model / method, Notes
Model / method	Notes
ScaleRTL-32B / ScaleRTL†-32B	fine-tuned RTL model, strong VerilogEval reported scores
ChipSeek-R1	fine-tuned 7B RTL model, strong VerilogEval-Machine reported score
CodeV-CodeLlama / CodeV-DeepSeek / CodeV-CodeQwen	fine-tuned 6.7B/7B RTL/code models
DecoRTL-CodeV	RTL-specific model reported on VerilogEval-Human
VeriReason-Qwen2.5-7B	fine-tuned Qwen2.5 7B RTL model
RTL++ @ 200K Trained	fine-tuned 7B RTL model
RTLCoder-Mistral	RTL-Coder project model family
qwen3-32b-verilog-lora	Hugging Face Verilog LoRA adapter

Why this comparison is cautious

The current v4.1 adapter has been tested thoroughly on this repository's full internal suite:

text
30 tasks total
22 functional simulation tasks
8 compile-only / RTL-heavy tasks
pass@1 task pass: 83.33%
pass@5 task pass: 86.67%

But it has not yet been run directly on official VerilogEval or RTLLM. Fair SOTA comparison requires running all candidate models with the same:

text
benchmark version
prompt formatting
sampling temperature / k
max tokens
code extraction rules
iverilog/vvp version
timeout
contamination policy

Next recommended step: run this adapter on official VerilogEval and RTLLM harnesses, then replace the context charts with direct apples-to-apples results.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

Pablo-Flores-Mollinedo

Model Tree

Base

Qwen/Qwen2.5-Coder-7B-Instruct

Adapter

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated EndpointsContainer

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

Focus

v4.1 patches adapter_v4_full_from_base with examples for:

sequential/reset behavior
bit manipulation
FSM tasks
exact module ports
Verilog-2005 syntax

Benchmark

Paper-style internal benchmark:

30 tasks total
22 functional simulation tasks
8 compile-only/RTL-heavy tasks
compile with iverilog -g2012
functional tasks run with generated testbenches and vvp

pass@1

Table with columns: Metric, Score
Metric	Score
Compile	96.67%
Task pass	83.33%
Functional	81.82%

Category pass:

Table with columns: Category, Pass, Compile
Category	Pass	Compile
basic_comb	100%	100%
arith	100%	100%
bit_manip	100%	100%
sequential	40%	100%
fsm	33.33%	66.67%
memory	100%	100%

Load

python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v4-1-seq-bit-fsm-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

Training

Dataset: data/v4_1_focus_seq_bit_fsm.jsonl

504 examples
1 epoch
LR 2e-5
max length 1024
LoRA r/alpha 32/64
trained from adapter_v4_full_from_base

Notes

v4.1 is current best production adapter in this project. v4.2 improved FSM but regressed overall functional score, so v4.1 remains recommended.

Benchmark charts

Note: these charts compare against local baselines and project adapters on the same internal paper-style Verilog benchmark. They are SOTA-style comparisons, not official VerilogEval/RTLLM leaderboard claims. Direct SOTA claims require running every external model on the exact same suite and decoding settings.

Full-suite comparison

Full suite comparison

Table with columns: Model, Compile, Task pass, Functional
Model	Compile	Task pass	Functional
Base Qwen2.5-Coder-7B-Instruct	80.00%	60.00%	59.09%
v1 strict	86.67%	60.00%	50.00%
v2 correction	90.00%	66.67%	59.09%
v4 full	93.33%	76.67%	68.18%

v4.1 category breakdown

v4.1 category pass rate

Relation to SOTA-style evaluation

Common Verilog/RTL LLM papers use VerilogEval-, RTLLM-, HDLBits-, or HumanEval-style methodology:

text
prompt -> generate RTL -> extract code -> compile with iverilog -> simulate with vvp/testbench -> pass@k

Typical reporting:

compile/syntax rate
functional simulation pass rate
pass@1
pass@5 / pass@10 / pass@20
category breakdown for combinational, arithmetic, sequential, FSM, memory, and larger RTL designs

External SOTA context

Important: the following charts are context only, not official leaderboard placement. External results come from the public Chip Design LLM Zoo table, while this adapter's score is from this repository's internal 30-task paper-style suite. Benchmarks, prompts, decoding settings, and contamination controls differ.

Sources:

Chip Design LLM Zoo reports VerilogEval, VerilogEval v2, RTLLM, and other RTL benchmark results and ranks models by VerilogEval pass@1 / RTLLM correct rate: https://iprc-dip.github.io/Chip-Design-LLM-Zoo/
NVIDIA's VerilogEval repository describes the VerilogEval harness and dataset: https://github.com/NVlabs/verilog-eval
The VerilogEval paper page describes a 156-problem Verilog generation benchmark: https://research.nvidia.com/publication/2023-09_verilogeval-evaluating-large-language-models-verilog-code-generation
RTL-Coder is an RTL-code-generation fine-tuning project with RTLCoder-Mistral inference scripts and synthetic RTL data flow: https://github.com/hkust-zhiyao/RTL-Coder

Reported top performers vs this adapter

External SOTA context

Reported fine-tuned/open Verilog models

Top fine-tuned context

Representative reported models found in the public tables/search results include:

Table with columns: Model / method, Notes
Model / method	Notes
ScaleRTL-32B / ScaleRTL†-32B	fine-tuned RTL model, strong VerilogEval reported scores
ChipSeek-R1	fine-tuned 7B RTL model, strong VerilogEval-Machine reported score
CodeV-CodeLlama / CodeV-DeepSeek / CodeV-CodeQwen	fine-tuned 6.7B/7B RTL/code models
DecoRTL-CodeV	RTL-specific model reported on VerilogEval-Human
VeriReason-Qwen2.5-7B	fine-tuned Qwen2.5 7B RTL model
RTL++ @ 200K Trained	fine-tuned 7B RTL model
RTLCoder-Mistral	RTL-Coder project model family
qwen3-32b-verilog-lora	Hugging Face Verilog LoRA adapter

Why this comparison is cautious

The current v4.1 adapter has been tested thoroughly on this repository's full internal suite:

text
30 tasks total
22 functional simulation tasks
8 compile-only / RTL-heavy tasks
pass@1 task pass: 83.33%
pass@5 task pass: 86.67%

But it has not yet been run directly on official VerilogEval or RTLLM. Fair SOTA comparison requires running all candidate models with the same:

text
benchmark version
prompt formatting
sampling temperature / k
max tokens
code extraction rules
iverilog/vvp version
timeout
contamination policy

Next recommended step: run this adapter on official VerilogEval and RTLLM harnesses, then replace the context charts with direct apples-to-apples results.