Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v30b-delta-distilled-lora

Important caveat

This adapter is not a clean zero-shot VerilogEval leaderboard model. It is a targeted, distilled research artifact: some training rows are v29 selector outputs for VerilogEval prompts, kept because they passed compilation and simulation. Treat the reported VerilogEval score as an experiment result, not as a contamination-free leaderboard claim.

To gauge general Verilog usefulness, also see the external paper-style, robust, and alt suite evaluations below.

Results

VerilogEval v2 direct, spec-to-RTL, n=1, temperature 0

Model / system	Compile	Functional pass
v9 prior single adapter	—	67/156
v29 multi-adapter verifier selector	150/156	84/156
v30 unified single adapter	134/156	67/156
v30b delta-distilled single adapter	141/156	71/156
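
For quick comparison, the functional pass counts in the table can be converted to percentages. The sketch below uses only the numbers reported above; the dictionary layout and the helper name `pass_rate` are illustrative and not part of the repository.

```python
# Functional pass counts copied from the VerilogEval v2 table (156 problems).
TOTAL = 156
functional_pass = {
    "v9 prior single adapter": 67,
    "v29 multi-adapter verifier selector": 84,
    "v30 unified single adapter": 67,
    "v30b delta-distilled single adapter": 71,
}


def pass_rate(passed: int, total: int = TOTAL) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


rates = {name: pass_rate(count) for name, count in functional_pass.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```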

External/generalization checks

Benchmark	Compile	Functional/task pass
Paper-style full	30/30	26/30 task pass; 18/22 functional
Robust suite	14/15	6/10 functional
Alt suite	7/8	3/5 functional

These results match the prior v9 baseline on the small external suites, while the VerilogEval direct functional pass count improves from 67/156 to 71/156.

Training data mix

Dataset builder: scripts/build_v30b_delta_distill_dataset.py

Unique source counts:

17 delta wins: v9 failed, v29 selector passed.
84 total v29 selector passing outputs.
67 v9 passing outputs for retention.
382 clean/manual verified rows.
18 external paper-style functional rows.
316 small verified synthetic rows.
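
The "delta wins" category can be read as a set difference between the two systems' passing problems. The sketch below shows that idea with hypothetical problem IDs; the real sets would come from the evaluation logs, which are not shown here. The reported counts are arithmetically consistent with every v9 pass also being a v29 selector pass (84 - 17 = 67), although the source does not state this explicitly.

```python
# Hypothetical problem IDs, for illustration only.
v9_pass = {"prob001", "prob002", "prob003"}
v29_pass = {"prob001", "prob002", "prob003", "prob004", "prob005"}

# Delta wins: problems the v29 selector passed that v9 failed.
delta_wins = v29_pass - v9_pass
print(sorted(delta_wins))

# Consistency check on the counts reported in the list above.
reported_v29_pass = 84
reported_delta_wins = 17
reported_v9_pass = 67
implied_overlap = reported_v29_pass - reported_delta_wins
print(implied_overlap == reported_v9_pass)
```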

Default repeat weights:

text
delta wins:         80x
all selector pass:   4x
v9 pass retention:   6x
clean verified:      2x
external functional:10x
synthetic:           1x
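
To see how strongly the mix leans toward each source, the unique counts can be multiplied by the repeat weights. This is a rough sketch that assumes each weight simply repeats every row of its source; the builder script may deduplicate or combine overlapping sources differently, and the totals are an upper bound before any overlength rows are dropped.

```python
# (unique rows, repeat weight) per source, copied from the lists above.
sources = {
    "delta wins": (17, 80),
    "all selector pass": (84, 4),
    "v9 pass retention": (67, 6),
    "clean verified": (382, 2),
    "external functional": (18, 10),
    "synthetic": (316, 1),
}

# Assumption: effective rows = unique rows * repeat weight.
effective = {name: rows * weight for name, (rows, weight) in sources.items()}
total_rows = sum(effective.values())

for name, rows in effective.items():
    share = 100.0 * rows / total_rows
    print(f"{name:<20} {rows:>5} rows  {share:5.1f}%")
print(f"{'total':<20} {total_rows:>5} rows")
```

Under that assumption, the 17 delta wins contribute the largest single share of the repeated rows despite being the smallest source.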

Training used --drop-overlength; rows exceeding the training token limit were dropped instead of truncating Verilog.
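
A minimal sketch of what dropping overlength rows means in practice, as opposed to truncating them. The function name, the row format, and the whitespace token counter are all stand-ins chosen for illustration; the actual training script and tokenizer are not shown in this README.

```python
from typing import Callable, List, Tuple


def drop_overlength(
    rows: List[str],
    count_tokens: Callable[[str], int],
    max_length: int,
) -> Tuple[List[str], int]:
    """Keep rows within max_length tokens; return (kept rows, number dropped)."""
    kept = [row for row in rows if count_tokens(row) <= max_length]
    return kept, len(rows) - len(kept)


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter; a real run would use the model tokenizer."""
    return len(text.split())


rows = [
    "module a ; endmodule",
    "module b ; " + "wire w ; " * 10 + "endmodule",
]
kept, dropped = drop_overlength(rows, whitespace_tokens, max_length=8)
print(len(kept), dropped)
```

Dropping keeps every remaining example a complete module, whereas truncation could teach the model to emit Verilog without a closing `endmodule`.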

Training hyperparameters

text
base model: Qwen/Qwen2.5-Coder-7B-Instruct
base adapter: adapter_v9_auto_distilled_direct
method: QLoRA/LoRA continuation
LoRA r: 16
LoRA alpha: 32
learning rate: 7e-7
epochs: 0.75
max length: 2048
batch size: 1
grad accum: 4
warmup steps: 40
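
The batch size and gradient accumulation settings above imply 4 sequences per optimizer step on a single device. The sketch below estimates the number of optimizer steps for a given dataset size; the row count used in the example is hypothetical, and a real trainer may round partial epochs or partial accumulation windows differently.

```python
import math


def effective_batch(batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step (single device assumed by default)."""
    return batch_size * grad_accum * num_devices


def estimate_steps(num_rows: int, epochs: float, batch_size: int, grad_accum: int) -> int:
    """Approximate optimizer steps; actual trainers may round differently."""
    return math.ceil(num_rows * epochs / effective_batch(batch_size, grad_accum))


# Hyperparameters from the list above; num_rows is a hypothetical dataset size.
num_rows = 3000
steps = estimate_steps(num_rows, epochs=0.75, batch_size=1, grad_accum=4)
warmup_fraction = 40 / steps
print(steps, f"{warmup_fraction:.1%}")
```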

Usage

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v30b-delta-distilled-lora"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

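# Example prompt: a module signature for the model to implement.
prompt = "Write module half_adder(input a, input b, output sum, output carry)."
messages = [
    {"role": "system", "content": "Return only complete synthesizable Verilog code. No explanation."},
    {"role": "user", "content": prompt},
]
# Render the chat with the tokenizer's template and append the assistant generation prefix.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False), matching the temperature 0 evaluation setting.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=700, do_sample=False, pad_token_id=tok.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))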
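
Even with a "code only" system prompt, a model may wrap its answer in a markdown fence or add stray text. The helper below is a hedged post-processing sketch, not part of this repository: `extract_verilog` is a hypothetical name, and the regular expressions are a simple heuristic that can be fooled, for example by the word `module` inside a comment.

```python
import re

TICKS = "`" * 3  # markdown code-fence delimiter, built here to keep this snippet fence-safe

# Optional language tag after the opening fence, then the fenced body.
_FENCE = re.compile(TICKS + r"[a-zA-Z]*[ \t]*\n(.*?)" + TICKS, re.DOTALL)
# Shortest span from a `module` keyword to the next `endmodule`.
_MODULE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def extract_verilog(generated: str) -> str:
    """Return the module definitions found in a model response (heuristic)."""
    fenced = _FENCE.search(generated)
    body = fenced.group(1) if fenced else generated
    modules = _MODULE.findall(body)
    if modules:
        return "\n\n".join(m.strip() for m in modules)
    return body.strip()


sample = (
    "Here is the code:\n" + TICKS + "verilog\n"
    "module half_adder(input a, input b, output sum, output carry);\n"
    "  assign sum = a ^ b;\n"
    "  assign carry = a & b;\n"
    "endmodule\n" + TICKS + "\nDone."
)
cleaned = extract_verilog(sample)
print(cleaned)
```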

Related artifacts

v29 multi-adapter verifier selector pipeline: higher VerilogEval score, but requires multiple adapters plus compile/simulation selection.
v30b: this repository, a single deployable PEFT LoRA adapter.
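
The general idea behind a verifier selector can be sketched as "generate several candidates, return the first that passes a check". This is an illustration of the concept only, under the assumption of a caller-supplied `verify` function; it is not the actual v29 pipeline, whose candidate ordering and verification steps are not described here.

```python
from typing import Callable, Iterable, Optional


def select_first_passing(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate accepted by verify, or None if all fail."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def toy_verify(code: str) -> bool:
    """Toy check standing in for a real compile and simulation run."""
    return "endmodule" in code


outputs = ["module a(input x);", "module a(input x); endmodule"]
chosen = select_first_passing(outputs, toy_verify)
print(chosen)
```

A single adapter such as v30b skips this loop entirely, which is the deployment trade-off noted above.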

Intended use

Research and experimentation with Verilog RTL code generation. Always compile, simulate, lint, and review generated RTL before use.
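
As a starting point for the compile step, generated RTL can be passed to a Verilog compiler from Python. The sketch below assumes Icarus Verilog (`iverilog`) is installed and on the PATH; this README does not say which toolchain was used for the reported results, and the helper names are illustrative. A clean compile says nothing about functional correctness, so simulation and review are still needed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def build_iverilog_cmd(src: Path, out: Path) -> List[str]:
    """Command line for compiling one source file with Icarus Verilog."""
    return ["iverilog", "-g2012", "-o", str(out), str(src)]


def compiles(verilog: str) -> Optional[bool]:
    """Return True/False for compile success, or None if iverilog is missing."""
    if shutil.which("iverilog") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dut.v"
        src.write_text(verilog)
        result = subprocess.run(
            build_iverilog_cmd(src, Path(tmp) / "dut.vvp"),
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.returncode == 0


status = compiles("module half_adder(input a, input b, output sum, output carry);\n"
                  "  assign sum = a ^ b;\n  assign carry = a & b;\nendmodule\n")
print(status)
```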
