Model description
The model translates a 256×256 semiconductor SEM image into a NetDSL-L2 program, a domain-specific language (DSL) that describes Manhattan-routed circuit layouts as a sequence of CANVAS, WIRE, and VIA commands. Rendering the generated program yields a binary mask that approximates the input geometry, enabling controlled augmentation, parameter editing, and downstream metrology.
- Base model: Qwen/Qwen3-VL-8B-Instruct
- Fine-tuning: full SFT (vision encoder, multimodal projector, and language model are all trainable)
- Training data: 18,900 synthetic (image, DSL) pairs generated by the DSL renderer in src/dsl2_dataset_v3.py, with a topology mixture of Vertical Stripes, Horizontal Lines, and Manhattan layouts
- Optimization: 3 epochs, batch size 8 × grad-accum 12 (effective 96), LR 2.0e-5 cosine + 10% warmup, weight decay 0.01, pure BF16 + gradient checkpointing
- Hardware: single NVIDIA H200 (141 GB)
- Final training loss: 0.2832
- Chat template: qwen3_vl_nothink (no reasoning tokens emitted)
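To make the optimization settings concrete, the sketch below computes the step count and learning-rate curve they imply. It assumes a linear warmup followed by half-cosine decay to zero (the shape produced by Hugging Face's `get_cosine_schedule_with_warmup`) and that the last partial batch of each epoch is kept; the trainer actually used may round differently, so treat the numbers as approximate. `total_steps` and `lr_at` are hypothetical helpers, not part of the companion repo.

```python
import math

NUM_SAMPLES = 18_900
EPOCHS = 3
EFFECTIVE_BATCH = 8 * 12  # per-device batch 8 x gradient accumulation 12 = 96
PEAK_LR = 2.0e-5
WARMUP_RATIO = 0.10


def total_steps(num_samples: int = NUM_SAMPLES, batch: int = EFFECTIVE_BATCH,
                epochs: int = EPOCHS) -> int:
    """Optimizer steps, assuming the last partial batch of each epoch is kept."""
    return math.ceil(num_samples / batch) * epochs


def lr_at(step: int, total: int, peak: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to `peak`, then half-cosine decay to 0 at `total` (assumed shape)."""
    warmup = math.ceil(total * warmup_ratio)
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_steps()
print(steps, math.ceil(steps * WARMUP_RATIO))  # 591 60 under these assumptions
```

Under these assumptions the run takes roughly 591 optimizer steps, of which about 60 are warmup.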
Intended use
Input a SEM image of a circuit pattern binarized with a global threshold of 100; the model emits NetDSL-L2 code. Render the code with src/pattern_dsl.py from the companion repo to obtain a reconstructed binary mask.
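A minimal preprocessing sketch for the binarization step, assuming an 8-bit grayscale SEM image in which pixels strictly brighter than 100 become foreground (white, 255). The polarity, the strict `>` comparison, and the name `binarize_sem` are assumptions rather than details taken from the companion repo; check them against the repo's preprocessing before comparing results with the reported numbers.

```python
import numpy as np
from PIL import Image

THRESHOLD = 100  # global threshold stated in this card


def binarize_sem(img: Image.Image, threshold: int = THRESHOLD) -> Image.Image:
    """Global-threshold binarization (hypothetical helper; polarity assumed)."""
    gray = np.asarray(img.convert("L"))  # 8-bit grayscale
    binary = np.where(gray > threshold, 255, 0).astype(np.uint8)
    # The model is fed RGB, so replicate the binary mask across three channels.
    return Image.fromarray(binary).convert("RGB")
```

The returned image can be passed directly as the `image` entry of the chat message in the usage example below.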
Evaluation results (MIIC, 1034 test images)
Mean ± std over executable outputs, comparing raw grayscale input (baseline) with binarized input (proposed):
| Metric | Raw | Binary (ours) |
|---|---|---|
| IoU ↑ | 0.2865 ± 0.0802 | 0.3619 ± 0.0882 |
| Dice coefficient ↑ | 0.4393 ± 0.0980 | 0.5256 ± 0.0912 |
| BF1 @ 2 px ↑ | 0.4054 ± 0.1106 | 0.4412 ± 0.1098 |
| SkF1 @ 1 px ↑ | 0.1276 ± 0.0960 | 0.1746 ± 0.1145 |
| ASSD (px) ↓ | 4.7768 ± 3.4596 | 4.1327 ± 1.2757 |
Executable rate: 957/1034 (92.6%) with binary input and 1019/1034 (98.5%) with raw input.
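The two overlap metrics in the table can be sketched on binary masks as follows, together with a "mean ± std over executable outputs" aggregation that skips predictions whose DSL failed to render. This is an illustrative sketch, not the paper's evaluation code: the function names, the score of 1.0 for two empty masks, and the population standard deviation (`ddof=0`) are assumptions. BF1, SkF1, and ASSD require boundary or skeleton extraction and are not shown.

```python
from typing import Optional, Sequence

import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    p, g = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(p, g).sum()
    if union == 0:
        return 1.0  # both masks empty; scoring this as 1.0 is a convention assumed here
    return float(np.logical_and(p, g).sum() / union)


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) of two binary masks."""
    p, g = pred.astype(bool), gt.astype(bool)
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # same empty-mask convention as iou()
    return float(2 * np.logical_and(p, g).sum() / denom)


def summarize(scores: Sequence[Optional[float]]) -> tuple:
    """Mean, std, and count over executable outputs.

    `None` marks a prediction whose DSL could not be rendered; such samples
    are excluded rather than scored as 0. Population std (ddof=0) is assumed.
    """
    valid = np.array([s for s in scores if s is not None], dtype=float)
    if valid.size == 0:
        raise ValueError("no executable outputs to summarize")
    return float(valid.mean()), float(valid.std()), int(valid.size)
```

Because non-executable outputs are dropped, the Raw and Binary columns are averaged over different subsets of the test set (1019 and 957 images, respectively).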
Usage
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch
REPO = "utsubo12/qwen3-vl-8b-netdsl-l2"
model = AutoModelForImageTextToText.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(REPO)
image = Image.open("test_normal_00169_binary.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text",
         "text": "Reconstruct this circuit pattern in NetDSL-L2."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt",
    tokenize=True, return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=2048, do_sample=False)  # greedy; outputs can reach ~2048 tokens
dsl_code = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True  # keep only the generated tokens
)[0]
print(dsl_code)
Render the predicted DSL with the helpers in the companion repo:
from src.pattern_dsl import parse_and_render
mask = parse_and_render(dsl_code, canvas=(256, 256))
NetDSL-L2 example
CANVAS 256 256
WIRE 0 12 8 0 0 0 0 H 256
WIRE 12 0 10 0 0 30 14 V 60
VIA 35 40 6
WIRE(x0, y0, w_base, l_s, w_s, l_e, w_e, segments) describes a wire with optional dogbone end-caps; segments is a chain of H <length> / V <length> relative moves. See §III-A of the paper.
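To illustrate how the command grammar maps to pixels, here is a toy rasterizer for the example above. It is a hypothetical sketch, not src/pattern_dsl.py, and several semantics are guesses that should be checked against the companion repo and §III-A of the paper: CANVAS takes width then height; a wire's centerline starts at (x0, y0) and each bar is centered on it; the start cap is a bar of width w_s over the first l_s pixels and the end cap a bar of width w_e over the last l_e pixels; a VIA is a square of side `size` centered at (x, y); and joints between segments are filled with a w_base × w_base square.

```python
import numpy as np


def _fill_rect(mask: np.ndarray, x0: int, y0: int, x1: int, y1: int) -> None:
    """Set mask to 1 on the half-open box spanned by (x0, y0) and (x1, y1), clipped to the canvas."""
    h, w = mask.shape
    xa, xb = max(0, min(x0, x1)), min(w, max(x0, x1))
    ya, yb = max(0, min(y0, y1)), min(h, max(y0, y1))
    if xa < xb and ya < yb:
        mask[ya:yb, xa:xb] = 1


def _stroke(mask, x, y, axis, length, width):
    """Draw a bar of thickness `width` from (x, y), moving `length` px along 'H' or 'V'."""
    lo = -(width // 2)  # offset that centers the bar on the centerline (assumption)
    if axis == "H":
        _fill_rect(mask, x, y + lo, x + length, y + lo + width)
    else:
        _fill_rect(mask, x + lo, y, x + lo + width, y + length)


def render_netdsl_sketch(code: str) -> np.ndarray:
    """Toy NetDSL-L2 rasterizer (hypothetical; semantics assumed, see lead-in)."""
    mask = None
    for line in code.splitlines():
        tok = line.split()
        if not tok:
            continue
        cmd = tok[0].upper()
        if cmd == "CANVAS":
            width, height = int(tok[1]), int(tok[2])  # assumed order: width, height
            mask = np.zeros((height, width), dtype=np.uint8)
        elif mask is None:
            raise ValueError("CANVAS must precede drawing commands")
        elif cmd == "WIRE":
            x0, y0, wb, ls, ws, le, we = map(int, tok[1:8])
            rest = tok[8:]
            if not rest or len(rest) % 2:
                raise ValueError(f"malformed WIRE segments: {line!r}")
            segs = [(rest[i].upper(), int(rest[i + 1])) for i in range(0, len(rest), 2)]
            cx, cy = x0, y0
            for k, (axis, length) in enumerate(segs):
                if axis not in ("H", "V"):
                    raise ValueError(f"unknown segment axis: {axis!r}")
                _stroke(mask, cx, cy, axis, length, wb)
                if axis == "H":
                    cx += length
                else:
                    cy += length
                if k < len(segs) - 1:  # fill the joint so corners have no notch
                    h = wb // 2
                    _fill_rect(mask, cx - h, cy - h, cx - h + wb, cy - h + wb)
            # Dogbone caps (assumed): drawn along the first / last segment.
            a0, l0 = segs[0]
            _stroke(mask, x0, y0, a0, ls if l0 >= 0 else -ls, ws)
            a1, l1 = segs[-1]
            _stroke(mask, cx, cy, a1, -le if l1 >= 0 else le, we)
        elif cmd == "VIA":
            vx, vy, size = map(int, tok[1:4])
            h = size // 2  # assumed: square of side `size` centered at (vx, vy)
            _fill_rect(mask, vx - h, vy - h, vx - h + size, vy - h + size)
        else:
            raise ValueError(f"unknown command: {cmd}")
    if mask is None:
        raise ValueError("no CANVAS command found")
    return mask


EXAMPLE = """CANVAS 256 256
WIRE 0 12 8 0 0 0 0 H 256
WIRE 12 0 10 0 0 30 14 V 60
VIA 35 40 6
"""
mask = render_netdsl_sketch(EXAMPLE)
```

Under these assumptions the second WIRE renders as a 10 px wide vertical line whose last 30 px widen to 14 px, which is one plausible reading of a dogbone end-cap.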
Limitations
- Trained only on synthetic Manhattan-style layouts; non-Manhattan or analog layouts are out of distribution.
- At inference time, real SEM images must be binarized (global threshold ≈ 100) to obtain the reported numbers; raw grayscale input degrades quality substantially (mean IoU 0.2865 vs. 0.3619 with binarized input on MIIC).
- The model produces NetDSL-L2 programs of up to ~2048 tokens; very dense layouts may be truncated, so keep max_new_tokens at or above this length.
- Reconstruction quality decreases as pattern complexity (compressed DSL length) grows; see Fig. 4 of the paper.
Citation
@inproceedings{ohtsubo2026bridging,
author = {Ohtsubo, Yusuke and Dohi, Kota and Yawata, Koichiro and
Takeshita, Koki and Sasaki, Tatsuya},
title = {Bridging the Sim-to-Real Gap in Semiconductor Visual Program
Synthesis via Input Binarization},
booktitle = {Proceedings of the 34th European Signal Processing Conference (EUSIPCO)},
year = {2026},
publisher = {EURASIP},
note = {Accepted; final citation/DOI to be updated upon publication}
}
License
MIT for both the code and these weights.
The base model Qwen3-VL-8B-Instruct is subject to its own license; please review.
Yusuke Ohtsubo — yusuke.ohtsubo.nb@hitachi.com