License: apache-2.0

Important caveat

This is not a clean zero-shot leaderboard model. The training mix includes benchmark-targeted verified outputs and distillation anchors from earlier adapters/pipelines. Treat the scores below as experiment results, not contamination-free leaderboard claims.

LoRA weights were not transferred from Qwen2.5-Coder. This adapter was trained directly on Qwen3.5-9B using verified examples/behavior from v9/v30b/v29/v31.

Results

VerilogEval v2 direct, spec-to-RTL, n=1, temperature 0

Model / system                            Compile    Functional pass
v9 prior single adapter                   -          67/156
v30b best Qwen2.5-Coder single adapter    141/156    71/156
v29 multi-adapter verifier selector       150/156    84/156
v32 Qwen3.5-9B migration                  71/156     60/156

As a single adapter, v32 underperformed v30b (60/156 vs. 71/156 functional passes), mainly because Qwen3.5 often produced long reasoning or malformed final code. However, it had 12 functional wins over v30b, which makes it useful as a diversity/teacher checkpoint.
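To make the counts easier to compare, the sketch below converts them to rates and also computes the functional pass rate among outputs that compiled. It assumes that every functional pass is also counted as a compile, which the results do not state explicitly. v9 is omitted because no compile count is listed for it. The names `results` and `rates` are illustrative and not part of the project.

```python
TOTAL = 156  # number of VerilogEval v2 problems in the results above

# (compile, functional pass) counts copied from the results.
results = {
    "v30b": (141, 71),
    "v29": (150, 84),
    "v32": (71, 60),
}


def rates(compiled, passed, total=TOTAL):
    """Convert raw counts into fractions of the benchmark."""
    return {
        "compile": compiled / total,
        "pass": passed / total,
        # Assumes every functional pass also compiled.
        "pass_given_compile": passed / compiled,
    }


for name, (compiled, passed) in results.items():
    r = rates(compiled, passed)
    print(
        f"{name}: compile {r['compile']:.1%}, "
        f"pass {r['pass']:.1%}, "
        f"pass among compiled {r['pass_given_compile']:.1%}"
    )
```

Under that assumption, v32 passes a larger share of its compiled outputs (60 of 71) than v30b does (71 of 141), which is consistent with its failures coming mainly from output that never reaches valid final code.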

Training data mix

Dataset builder: scripts/build_v32_qwen35_migration_dataset.py

Unique source counts:

  • 67 v9 pass anchors.
  • 71 v30b pass anchors.
  • 17 v9-fail/v29-pass delta wins.
  • 67 selector retention rows.
  • 35 external/general rows.
  • 382 clean verified rows.
  • 316 synthetic verified rows.

Default repeat weights:

text

v9 pass anchor: 14x
v30b pass anchor: 14x
delta wins: 45x
selector retention: 4x
external general: 20x
clean retention: 3x
synthetic: 1x

Training used --drop-overlength; overlength rows were dropped, not truncated.
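As a rough illustration of how the repeat weights shape the mix, the sketch below multiplies each unique source count by its weight. It assumes a weight of Nx means each unique row appears N times, and it pairs "clean retention" with the 382 clean verified rows and "external general" with the 35 external/general rows; the builder script is not shown here, so both pairings are assumptions. Because overlength rows are dropped, the total is an upper bound on what training actually saw.

```python
# (unique rows, repeat weight) copied from the two lists above.
# The name pairing between the lists is an assumption.
sources = {
    "v9 pass anchor": (67, 14),
    "v30b pass anchor": (71, 14),
    "delta wins": (17, 45),
    "selector retention": (67, 4),
    "external general": (35, 20),
    "clean retention": (382, 3),
    "synthetic": (316, 1),
}


def weighted_rows(sources):
    """Expand each source by its repeat weight."""
    return {name: count * weight for name, (count, weight) in sources.items()}


expanded = weighted_rows(sources)
total = sum(expanded.values())

for name, rows in expanded.items():
    print(f"{name:20s} {rows:5d} rows  {rows / total:6.1%}")
print(f"total before overlength filtering: {total}")
```

Under these assumptions, the 17 delta wins expand to 765 rows, more than the 316 synthetic rows, which shows how strongly the weights favor the small benchmark-targeted sources.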

Training hyperparameters

text

base model: Qwen/Qwen3.5-9B
method: QLoRA/LoRA
LoRA r: 32
LoRA alpha: 64
learning rate: 1e-5
epochs: 0.80
max length: 1536
batch size: 1
grad accum: 4
warmup steps: 40
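A few derived quantities follow from these settings. The sketch below computes the effective batch size, the LoRA scaling factor, and an approximate optimizer step count. It assumes a single device, standard LoRA scaling (alpha / r, not a rank-stabilized variant), and a hypothetical dataset size of 1000 rows; none of these are stated above, and the actual trainer may round steps differently.

```python
import math

# Values copied from the hyperparameter list above.
hparams = {
    "lora_r": 32,
    "lora_alpha": 64,
    "batch_size": 1,
    "grad_accum": 4,
    "epochs": 0.80,
    "warmup_steps": 40,
}


def effective_batch_size(h, num_devices=1):
    # num_devices=1 is an assumption; the device count is not stated.
    return h["batch_size"] * h["grad_accum"] * num_devices


def lora_scaling(h):
    # Standard LoRA multiplies the low-rank update by alpha / r.
    return h["lora_alpha"] / h["lora_r"]


def optimizer_steps(num_rows, h, num_devices=1):
    # Approximate: rows seen over the run divided by rows per optimizer step.
    rows_seen = num_rows * h["epochs"]
    return math.ceil(rows_seen / effective_batch_size(h, num_devices))


print("effective batch size:", effective_batch_size(hparams))
print("LoRA scaling (alpha / r):", lora_scaling(hparams))
# 1000 rows is a hypothetical dataset size used only for illustration.
print("approx. optimizer steps for 1000 rows:", optimizer_steps(1000, hparams))
```

With so few rows per optimizer step, the 40 warmup steps cover only 160 training rows under the single-device assumption.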

Usage

Qwen3.5 uses a conditional-generation loader in the current Transformers stack, so the base model is loaded with AutoModelForImageTextToText rather than a causal-LM class.

python

import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer, BitsAndBytesConfig

base = "Qwen/Qwen3.5-9B"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen3.5-9b-v32-migration-lora"

# 4-bit NF4 quantization with bfloat16 compute and double quantization.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer is loaded from the adapter repository, the weights from the base model.
tok = AutoTokenizer.from_pretrained(adapter, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base,
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = "Write module TopModule(input a, input b, output out); out should be a & b."
messages = [
    {"role": "system", "content": "You are a Verilog RTL designer. Return synthesizable Verilog."},
    {"role": "user", "content": prompt},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False).
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False, pad_token_id=tok.eos_token_id)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
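Because the results note that Qwen3.5 often produced long reasoning or malformed final code, the decoded text may need post-processing before it goes to a compiler. The helper below is a minimal sketch, not the extraction logic used for the reported scores. It assumes reasoning is delimited by `<think>` and `</think>` tags and that the answer is the last `module ... endmodule` span after the reasoning. The name `extract_verilog` is hypothetical.

```python
import re

# Non-greedy match of one module body; \b keeps "endmodule" from matching as "module".
MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def extract_verilog(generated):
    """Return the last module...endmodule span after any reasoning, or None."""
    if "</think>" in generated:
        # Keep only the text after the final closing reasoning tag.
        text = generated.rsplit("</think>", 1)[1]
    elif "<think>" in generated:
        # Reasoning started but never finished, so there is no final answer.
        return None
    else:
        text = generated

    matches = MODULE_RE.findall(text)
    return matches[-1] if matches else None


sample = (
    "<think>An AND gate needs one module with an assign.</think>\n"
    "module TopModule(input a, input b, output out);\n"
    "  assign out = a & b;\n"
    "endmodule\n"
)
print(extract_verilog(sample))
```

Returning `None` for unfinished reasoning keeps truncated generations out of the compile step instead of passing partial code downstream. A comment containing the word "module" could still confuse the pattern, so treat this as a first-pass filter only.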

Related artifacts

  • v33 Qwen3.5 thinking-reinforced LoRA: stronger follow-up, 76/156 VerilogEval pass.
  • v30b Qwen2.5-Coder LoRA: prior best single adapter, 71/156 VerilogEval pass.
  • v29 verifier selector: best practical pipeline, 84/156 pass.

Intended use

Research and experimentation with Verilog RTL generation. Always compile, simulate, lint, and review generated RTL before use.
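The compile step of that checklist can be automated. The sketch below shells out to Icarus Verilog, which is an assumption: the toolchain used for the reported scores is not stated here. It covers only compilation; simulation against a testbench, linting, and human review are still needed. The helper names `iverilog_command` and `compiles` are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def iverilog_command(source, output):
    # -g2012 selects the IEEE 1800-2012 language generation; -o names the output file.
    return ["iverilog", "-g2012", "-o", str(output), str(source)]


def compiles(verilog, timeout=30.0):
    """Return True or False for a compile check, or None if iverilog is not installed."""
    if shutil.which("iverilog") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dut.v"
        src.write_text(verilog)
        cmd = iverilog_command(src, Path(tmp) / "dut.vvp")
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return proc.returncode == 0


rtl = "module TopModule(input a, input b, output out);\n  assign out = a & b;\nendmodule\n"
print("compile check:", compiles(rtl))
```

Returning `None` when the tool is missing keeps "could not check" distinct from "failed to compile", so missing tooling is not mistaken for bad RTL.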

Model provider: Pablo-Flores-Mollinedo

Base model: Qwen/Qwen3.5-9B

Adapter: this model

Input modalities: Video, Text, Image

Output modalities: Text
