README
Intended format
Prompt the model to reason briefly, then emit final Verilog between tags:
text
Thinking:
[BEGIN]
module TopModule(...);
...
endmodule
[DONE]
Use the code between [BEGIN] and [DONE] as the final artifact.
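The extraction step can be sketched in a few lines of Python. This is a minimal sketch, not the project's own parser: the function name `extract_verilog` is hypothetical, and it assumes the tags appear literally as `[BEGIN]` and `[DONE]` in the decoded text. It takes the last tagged block, on the assumption that the reasoning text may mention the tags before the final answer.

```python
import re
from typing import Optional

# Assumption: the tags appear literally as "[BEGIN]" and "[DONE]" in the output.
_BLOCK = re.compile(r"\[BEGIN\](.*?)\[DONE\]", re.DOTALL)


def extract_verilog(completion: str) -> Optional[str]:
    """Return the code between the last [BEGIN]/[DONE] pair, or None if absent."""
    matches = _BLOCK.findall(completion)
    if not matches:
        # Typical cause: generation hit the token limit before emitting [DONE].
        return None
    return matches[-1].strip()
```

Returning `None` for an unterminated block lets the caller count a truncated generation as a failure instead of compiling partial code.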
Evaluation in this project
Local VerilogEval v2 spec-to-RTL evaluation of the single adapter, run directly: no retries, no selector, and no compiler feedback before the final answer:
text
v33 max4096: compile 113/156, pass 80/156 = 51.3%
v33 max1024: compile 101/156, pass 76/156 = 48.7%
These are clean direct adapter results, not agentic pipeline results. The project's best practical pipeline remains a separate verifier-selector system.
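The reported percentages can be rechecked from the raw counts. The short sketch below recomputes them and also derives the compile rate and the pass rate among compiled designs, which the results do not state explicitly. The helper name `rate` is illustrative.

```python
def rate(count: int, total: int) -> float:
    """Percentage of count over total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


TOTAL = 156  # number of problems in each reported run

# (compiled, passed) counts copied from the results above.
runs = {"v33 max4096": (113, 80), "v33 max1024": (101, 76)}

for name, (compiled, passed) in runs.items():
    print(
        f"{name}: compile {rate(compiled, TOTAL)}%, "
        f"pass {rate(passed, TOTAL)}%, "
        f"pass given compile {rate(passed, compiled)}%"
    )
```

The larger token budget raises the compile count from 101 to 113 but the pass count only from 76 to 80, so the pass rate among compiled designs is lower for the max4096 run.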
Loading
python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3.5-9B"
adapter = "Pablo-Flores-Mollinedo/verilog-qwen3.5-9b-v33-thinking-reinforced-lora"

# The tokenizer is loaded from the adapter repository.
tok = AutoTokenizer.from_pretrained(adapter, trust_remote_code=True)
# Load the base weights, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
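Once `model` and `tok` are loaded, a single direct generation might look like the sketch below. Everything here is an assumption, not the project's documented interface: the helper names `build_prompt` and `generate_rtl` are hypothetical, the prompt wording is illustrative because the exact training prompt is not given, and `max_new_tokens` is presumed to correspond to the max1024 / max4096 settings in the evaluation.

```python
def build_prompt(spec: str) -> str:
    """Hypothetical prompt wrapper; the exact training prompt is not documented."""
    return (
        "Reason briefly, then write the final Verilog between [BEGIN] and [DONE].\n\n"
        f"Specification:\n{spec.strip()}\n"
    )


def generate_rtl(model, tok, spec: str, max_new_tokens: int = 1024) -> str:
    """Greedy-decode one completion and return only the newly generated text."""
    inputs = tok(build_prompt(spec), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the model's continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is chosen here only to make runs repeatable; the decoding settings behind the reported scores are not stated.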
Notes
- Adapter only; requires the base model license/weights.
- Generated RTL should be compiled and simulated before use.
- Benchmark scores are finite-testbench simulation results, not formal proof.
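As one way to follow the compile-and-simulate advice above, the sketch below drives Icarus Verilog from Python. The choice of simulator is an assumption (the notes do not name one), and the function names and the `sim.vvp` output path are illustrative. It uses only the standard `iverilog -g2012 -o` compile step and the `vvp` runtime.

```python
import shutil
import subprocess
from typing import List


def iverilog_command(sources: List[str], out_path: str) -> List[str]:
    """Build an Icarus Verilog compile command with the 2012 language generation."""
    return ["iverilog", "-g2012", "-o", out_path, *sources]


def compile_and_simulate(sources: List[str], out_path: str = "sim.vvp", timeout: int = 60) -> bool:
    """Return True only if compilation and the vvp run both exit with status 0."""
    if shutil.which("iverilog") is None or shutil.which("vvp") is None:
        raise RuntimeError("Icarus Verilog (iverilog, vvp) not found on PATH")
    compiled = subprocess.run(
        iverilog_command(sources, out_path),
        capture_output=True, text=True, timeout=timeout,
    )
    if compiled.returncode != 0:
        return False
    simulated = subprocess.run(
        ["vvp", out_path], capture_output=True, text=True, timeout=timeout,
    )
    return simulated.returncode == 0
```

A zero exit status from `vvp` only shows that the simulation ran to completion. Whether the design is functionally correct still depends on what the testbench checks and prints, so its output should be inspected as well.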
Model provider
- Pablo-Flores-Mollinedo
Model tree
- Base: Qwen/Qwen3.5-9B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text