
License: mit

Model description

The model translates a 256×256 semiconductor SEM image into a NetDSL-L2 program — a Domain-Specific Language describing Manhattan-routed circuit layouts as a sequence of CANVAS, WIRE, and VIA commands. Rendering the generated DSL reproduces the input geometry as a binary mask, enabling controlled augmentation, parameter editing, and downstream metrology.

  • Base model: Qwen/Qwen3-VL-8B-Instruct
  • Fine-tuning: full SFT (vision encoder, multimodal projector, and language model are all trainable)
  • Training data: 18,900 synthetic (image, DSL) pairs generated by the DSL renderer in src/dsl2_dataset_v3.py; topology mixture of Vertical Stripes, Horizontal Lines, and Manhattan layouts
  • Optimization: 3 epochs, batch size 8 × grad-accum 12 (effective 96), LR 2.0e-5 cosine + 10% warmup, weight decay 0.01, pure BF16 + gradient checkpointing
  • Hardware: single NVIDIA H200 (141 GB)
  • Final training loss: 0.2832
  • Chat template: qwen3_vl_nothink (no reasoning tokens emitted)
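The optimization settings above imply a rough step count and learning-rate curve, which the following sketch works out. It is not the training code: it assumes all 18,900 pairs are seen once per epoch, that the last partial batch is kept, that warmup is linear, and that the cosine decays to zero. The function name `lr_at` is hypothetical.

```python
import math

# Values taken from the model card.
NUM_SAMPLES = 18_900
BATCH_SIZE = 8
GRAD_ACCUM = 12
EPOCHS = 3
BASE_LR = 2.0e-5
WARMUP_RATIO = 0.10

effective_batch = BATCH_SIZE * GRAD_ACCUM                    # 96
# Assumption: the last partial batch of each epoch is kept (ceil).
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)   # 197
total_steps = steps_per_epoch * EPOCHS                       # 591
# Assumption: warmup length is rounded down.
warmup_steps = int(WARMUP_RATIO * total_steps)               # 59


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the run is short, roughly 591 optimizer steps, of which about the first 59 are warmup.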

Intended use

Input a binarized (global threshold 100) SEM image of a circuit pattern; the model emits NetDSL-L2 code. Render the code with src/pattern_dsl.py from the companion repo to obtain a reconstructed binary mask.
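A minimal sketch of the global-threshold step, assuming an 8-bit grayscale image held in a NumPy array. The card states only the threshold value (100); the strict `>` comparison and the 0/255 output polarity are assumptions, and `binarize` is a hypothetical helper rather than a function from the companion repo.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: int = 100) -> np.ndarray:
    """Global threshold: pixels strictly above `threshold` become 255, all others 0.

    Assumptions: `gray` is a single-channel (H, W) array with values in 0..255,
    and bright pixels are the foreground.
    """
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (H, W) image")
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

With Pillow, an array of this kind can be obtained via `np.asarray(Image.open(path).convert("L"))`, and the result can be turned back into an image with `Image.fromarray(...)` before it is passed to the processor.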

Evaluation results (MIIC, 1034 test images)

Mean ± std, computed over executable outputs only. The columns compare raw grayscale input (baseline) with binarized input (proposed):

| Metric | Raw | Binary (ours) |
|---|---|---|
| IoU | 0.2865 ± 0.0802 | 0.3619 ± 0.0882 |
| Dice coefficient | 0.4393 ± 0.0980 | 0.5256 ± 0.0912 |
| BF1 @ 2 px | 0.4054 ± 0.1106 | 0.4412 ± 0.1098 |
| SkF1 @ 1 px | 0.1276 ± 0.0960 | 0.1746 ± 0.1145 |
| ASSD | 4.7768 ± 3.4596 | 4.1327 ± 1.2757 |

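Executable rate: 957/1034 (92.6%) with binary input and 1019/1034 (98.5%) with raw input. Because the metrics cover executable outputs only, the two columns average over different subsets of the test set.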
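For readers who want to reproduce the overlap metrics, here is a sketch of IoU and Dice on a pair of binary masks using their standard definitions. The paper's evaluation code may differ in details such as empty-mask handling; returning 1.0 when both masks are empty is a convention chosen here, and `iou_and_dice` is a hypothetical name.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of identical shape.

    Any non-zero pixel counts as foreground. If both masks are empty,
    both scores are defined as 1.0 (a convention, not taken from the paper).
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)
```

The table reports the mean and standard deviation of per-image scores, so the mean Dice is not simply a function of the mean IoU.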

Usage

python

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

REPO = "utsubo12/qwen3-vl-8b-netdsl-l2"

model = AutoModelForImageTextToText.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(REPO)

# Real SEM image, globally thresholded at 100 (binary preprocessing).
image = Image.open("test_normal_00169_binary.png").convert("RGB")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text",
         "text": "Reconstruct this circuit pattern in NetDSL-L2."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt",
    tokenize=True, return_dict=True,
).to(model.device)

out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
dsl_code = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(dsl_code)

Render the predicted DSL with the helpers in the companion repo:

python

from src.pattern_dsl import parse_and_render
mask = parse_and_render(dsl_code, canvas=(256, 256))
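Since not every generated program executes (see the executable rate in the evaluation), it can help to guard the rendering step when processing many outputs. The sketch below takes the renderer as an argument; it assumes the renderer signals invalid DSL by raising an exception, which is not confirmed for `parse_and_render`. The name `render_all` is hypothetical.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def render_all(
    programs: Iterable[str],
    render_fn: Callable[[str], Any],
) -> Tuple[List[Optional[Any]], float]:
    """Render each program; return (masks, executable_rate).

    A program that makes `render_fn` raise is recorded as None.
    Assumption: the renderer raises on invalid input rather than
    returning a sentinel value.
    """
    masks: List[Optional[Any]] = []
    executable = 0
    for code in programs:
        try:
            masks.append(render_fn(code))
            executable += 1
        except Exception:  # broad on purpose: the renderer's error types are unknown
            masks.append(None)
    rate = executable / len(masks) if masks else 0.0
    return masks, rate
```

With the companion repo installed, a call could look like `render_all(codes, lambda c: parse_and_render(c, canvas=(256, 256)))`.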

NetDSL-L2 example

markdown

CANVAS 256 256
WIRE 0 12 8 0 0 0 0 H 256
WIRE 12 0 10 0 0 30 14 V 60
VIA 35 40 6

WIRE(x0, y0, w_base, l_s, w_s, l_e, w_e, segments) describes a wire with optional dogbone end-caps; segments is a chain of H <length> / V <length> relative moves. See §III-A of the paper.
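To make the WIRE signature concrete, here is a small tokenizer sketch that maps each command line onto the fields named above. It is not the parser in src/pattern_dsl.py. The arguments of CANVAS and VIA are kept as raw numbers because their field meanings are not spelled out here, and accepting float tokens is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Number = Union[int, float]


def _num(token: str) -> Number:
    """Parse a numeric token, preferring int (assumption: floats may also occur)."""
    try:
        return int(token)
    except ValueError:
        return float(token)


@dataclass
class Wire:
    x0: Number
    y0: Number
    w_base: Number
    l_s: Number
    w_s: Number
    l_e: Number
    w_e: Number
    segments: List[Tuple[str, Number]]  # chain of ("H" | "V", length) moves


def parse_wire(line: str) -> Wire:
    tokens = line.split()
    # "WIRE" + 7 numeric fields + at least one (direction, length) pair.
    if len(tokens) < 10 or tokens[0] != "WIRE":
        raise ValueError(f"not a complete WIRE command: {line!r}")
    head = [_num(t) for t in tokens[1:8]]
    tail = tokens[8:]
    if len(tail) % 2 != 0:
        raise ValueError(f"dangling segment token in: {line!r}")
    segments = []
    for direction, length in zip(tail[0::2], tail[1::2]):
        if direction not in ("H", "V"):
            raise ValueError(f"unknown direction {direction!r}")
        segments.append((direction, _num(length)))
    return Wire(*head, segments)


def parse_program(text: str):
    """Return (command_name, payload) pairs; CANVAS and VIA keep raw numbers."""
    commands = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name = line.split()[0]
        if name == "WIRE":
            commands.append((name, parse_wire(line)))
        elif name in ("CANVAS", "VIA"):
            commands.append((name, tuple(_num(t) for t in line.split()[1:])))
        else:
            raise ValueError(f"unknown command: {name}")
    return commands
```

Applied to the example program, the second WIRE line yields `l_e=30`, `w_e=14`, and a single segment `("V", 60)`.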

Limitations

  • Trained only on synthetic Manhattan-style layouts; non-Manhattan or analog layouts are out of distribution.
  • At inference time, real SEM images must be binarized (global threshold ≈ 100) to obtain the reported numbers; raw grayscale input lowers every reported metric (for example, IoU 0.2865 vs. 0.3619 in the evaluation results).
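  • The model produces NetDSL-L2 strings up to ~2048 tokens. Very dense layouts may be truncated. Note that the usage example sets max_new_tokens=1024, so longer programs are cut off there unless that limit is raised.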
  • Reconstruction quality decreases as pattern complexity (compressed DSL length) grows; see Fig. 4 of the paper.
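Truncated programs, as mentioned in the limitations above, can be flagged before rendering. The following heuristic is a sketch: it treats an output as truncated when generation used its whole token budget without ending on an end-of-sequence token. The function name and the idea of passing token ids as plain Python lists are assumptions for illustration.

```python
from typing import Collection, Sequence


def looks_truncated(
    new_token_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_ids: Collection[int],
) -> bool:
    """True if the token budget was exhausted and the last token is not an EOS token."""
    hit_budget = len(new_token_ids) >= max_new_tokens
    ended_cleanly = bool(new_token_ids) and new_token_ids[-1] in eos_token_ids
    return hit_budget and not ended_cleanly
```

In the usage example, the new tokens for the single input would be `out[0, inputs["input_ids"].shape[1]:].tolist()`, and the EOS ids can typically be read from `model.generation_config.eos_token_id`, which may be a single int or a list.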

Citation

bibtex

@inproceedings{ohtsubo2026bridging,
  author    = {Ohtsubo, Yusuke and Dohi, Kota and Yawata, Koichiro and
               Takeshita, Koki and Sasaki, Tatsuya},
  title     = {Bridging the Sim-to-Real Gap in Semiconductor Visual Program
               Synthesis via Input Binarization},
  booktitle = {Proceedings of the 34th European Signal Processing Conference (EUSIPCO)},
  year      = {2026},
  publisher = {EURASIP},
  note      = {Accepted; final citation/DOI to be updated upon publication}
}

License

MIT for both the code and these weights.

The base model, Qwen3-VL-8B-Instruct, is subject to its own license; please review it before use.

Contact

Yusuke Ohtsubo — yusuke.ohtsubo.nb@hitachi.com

Model provider

utsubo12

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text
