Zero-To-CAD-Qwen3-VL-2B


License: apache-2.0
| Resource | Link |
| --- | --- |
| 📄 Paper | Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data |
| 📦 Zero-to-CAD 1M (full dataset) | ADSKAILab/Zero-To-CAD-1m |
| 📦 Zero-to-CAD 100K (curated subset) | ADSKAILab/Zero-To-CAD-100k |
| 🤖 Fine-tuned Model (this model) | You are here |
| 🗂️ Collection | ADSKAILab/Zero-To-CAD |

Model Description

This model is a fully fine-tuned Qwen3-VL-2B-Instruct that takes 8 rendered views of a 3D shape (4 front, 4 rear at 256×256) and generates executable CadQuery Python code that reproduces the geometry.

The model was trained entirely on synthetic data from Zero-to-CAD 1M (979,633 training samples) — no real-world CAD files were used.

Key Results

| Benchmark | Success Rate | Mean IoU | Median IoU | P90 IoU |
| --- | --- | --- | --- | --- |
| Zero-to-CAD test | 82.1% | 0.747 | 0.847 | 0.999 |
| ABC (out-of-distribution) | 61.0% | 0.377 | 0.303 | 0.854 |
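The four columns above can be reproduced from per-sample scores with a few lines of standard-library Python. This is a minimal sketch, not the authors' evaluation script: the helper names `percentile` and `summarize` are hypothetical, `None` is used here to mark a generation that failed to execute, and the IoU statistics are assumed to be taken over successful generations only (the README does not say whether failures count as zero).

```python
import math
import statistics


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(ious):
    """Summarize per-sample IoUs; None marks a generation that did not execute."""
    valid = [v for v in ious if v is not None]
    summary = {"success_rate": len(valid) / len(ious)}
    if valid:
        # Assumption: IoU statistics cover successful generations only.
        summary["mean_iou"] = statistics.fmean(valid)
        summary["median_iou"] = statistics.median(valid)
        summary["p90_iou"] = percentile(valid, 90)
    return summary


# Toy scores for illustration only; these are not results from the paper.
print(summarize([0.9, None, 0.5, 1.0, 0.7]))
```

If failures should instead count as an IoU of zero, replace the `None` entries with `0.0` before computing the statistics and track the success rate separately.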

Comparison with Baselines

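| Model | Zero-to-CAD Success | Zero-to-CAD Mean IoU | ABC Success | ABC Mean IoU |
| --- | --- | --- | --- | --- |
| This model | 82.1% | 0.747 | 61.0% | 0.377 |
| GPT-5.2 High | 72.2% | 0.485 | 66.2% | 0.344 |
| GPT-5.2 Medium | 71.1% | 0.495 | 62.6% | 0.346 |
| Qwen3-VL-2B (base) | 6.6% | 0.184 | 5.4% | 0.131 |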

Quick Start

Inference

python

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from datasets import load_dataset
from PIL import Image
import io

model_name = "ADSKAILab/Zero-To-CAD-Qwen3-VL-2B"
model = Qwen3VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_name)

# Load 8 rendered views from the dataset
ds = load_dataset("ADSKAILab/Zero-To-CAD-1m", split="train", streaming=True)
sample = next(iter(ds))
views = [
    Image.open(io.BytesIO(sample[f"image_{i}"])) if isinstance(sample[f"image_{i}"], bytes)
    else sample[f"image_{i}"]
    for i in range(8)
]
# Or load 8 views from local files:
# views = [Image.open(f"view_{i}.png") for i in range(8)]

messages = [
    {
        "role": "system",
        "content": "You are a CAD code assistant. Given multiple rendered views of a 3D shape, generate clean, well-structured CadQuery Python code that accurately reproduces the geometry."
    },
    {
        "role": "user",
        "content": [
            *[{"type": "image", "image": view} for view in views],
            {"type": "text", "text": "Generate CadQuery code for this shape."}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=views, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt
output_text = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output_text)

Execute the generated code

python

import cadquery as cq
exec(output_text)
# `result` contains the reconstructed CadQuery solid
# Export
cq.exporters.export(result, "output.step")
cq.exporters.export(result, "output.stl")
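Because roughly one in five generations on the in-distribution test set does not execute, it can help to wrap the `exec` call. The following is a minimal sketch under stated assumptions: `extract_code` and `run_generated` are hypothetical helpers, not part of the model or of CadQuery, and the fence-stripping step assumes the model may sometimes wrap its answer in a Markdown code fence, which the README does not confirm. A fresh namespace keeps generated names out of your globals, but it is not a security sandbox; run untrusted output in an isolated process or container.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this block readable
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(text):
    """Return the first fenced block's contents, or the whole text if unfenced."""
    match = FENCE_RE.search(text)
    return (match.group(1) if match else text).strip()


def run_generated(text):
    """Execute generated code in a fresh namespace and return its `result`, or None."""
    namespace = {}
    try:
        exec(extract_code(text), namespace)
    except Exception as err:  # syntax errors and runtime errors both count as failures
        print(f"generated code failed: {err!r}")
        return None
    return namespace.get("result")


# Stand-in for model output; real output would build a CadQuery solid.
demo_output = FENCE + "python\nresult = 1 + 1\n" + FENCE
print(run_generated(demo_output))
```

With real model output, `run_generated(output_text)` returns the object bound to `result`, which can then be passed to `cq.exporters.export` as shown above.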

Training Details

| Hyperparameter | Value |
| --- | --- |
| Base model | Qwen3-VL-2B-Instruct |
| Training mode | Full fine-tuning |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW |
| Learning rate | 1 × 10⁻⁴ |
| Weight decay | 0.0 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Attention dropout | 0.1 |
| GPUs | 16 × NVIDIA H100 80GB |
| Per-GPU batch size | 1 |
| Effective batch size | 16 |
| Epochs | 3 |
| Precision | bfloat16 |
| Distributed strategy | DDP |
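The table implies a rough step count and learning-rate curve, which the sketch below works out. It is an illustration rather than the authors' training configuration, and it assumes several things the README does not state: no gradient accumulation (consistent with 16 GPUs × 1 sample = 16), the last partial batch of each epoch is kept, warmup is linear, and the cosine decay ends at zero. The function name `lr_at` is hypothetical.

```python
import math

NUM_SAMPLES = 979_633   # training samples, from the model description
NUM_GPUS = 16
PER_GPU_BATCH = 1
EPOCHS = 3
BASE_LR = 1e-4
WARMUP_RATIO = 0.03

# Assumption: no gradient accumulation, so the effective batch is GPUs x per-GPU batch.
effective_batch = NUM_GPUS * PER_GPU_BATCH
# Assumption: the final partial batch of each epoch is kept (hence ceil).
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to BASE_LR, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
print(lr_at(warmup_steps), lr_at(total_steps // 2))
```

Under these assumptions the run takes on the order of 184 thousand optimizer steps, with the first few thousand spent in warmup.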

Evaluation Protocol

  • Metric: Voxelized IoU at 64³ resolution between generated and ground-truth solids
  • Rotational alignment: Maximum IoU over 45° rotation increments
  • Success rate: Percentage of generations producing valid, executable CadQuery code
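The metric and alignment steps above can be sketched in pure Python on occupancy sets. This is an illustration of the idea, not the authors' evaluation code: it assumes both solids are already voxelized into sets of `(i, j, k)` cells on a shared `n`³ grid, it rotates about a single axis only (the README does not say which axes are searched), and it resamples with nearest-neighbor lookup, which is lossy at 45°. The names `voxel_iou`, `rotate_z`, and `best_rotated_iou` are hypothetical.

```python
import math


def voxel_iou(a, b):
    """Intersection over union of two sets of occupied (i, j, k) cells."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rotate_z(voxels, angle_deg, n):
    """Rotate an occupancy set about the grid's central k-axis.

    Uses inverse mapping: each target cell looks up the source cell it came from,
    which avoids the holes that forward-mapping voxel centers would leave.
    """
    c = (n - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    rotated = set()
    for i in range(n):
        for j in range(n):
            x, y = i - c, j - c
            si = round(cos_a * x + sin_a * y + c)
            sj = round(-sin_a * x + cos_a * y + c)
            for k in range(n):
                if (si, sj, k) in voxels:
                    rotated.add((i, j, k))
    return rotated


def best_rotated_iou(pred, truth, n=64, step_deg=45):
    """Maximum IoU over rotations of the prediction in step_deg increments."""
    return max(
        voxel_iou(rotate_z(pred, angle, n), truth)
        for angle in range(0, 360, step_deg)
    )


# Toy example on a 16^3 grid: a flat bar and the same bar turned by 90 degrees.
bar = {(i, j, k) for i in range(2, 14) for j in range(6, 10) for k in range(4)}
turned = {(i, j, k) for i in range(6, 10) for j in range(2, 14) for k in range(4)}
print(voxel_iou(bar, turned), best_rotated_iou(bar, turned, n=16))
```

A production version would more likely voxelize meshes with a geometry library and use array operations; the set-based form is kept here only so the logic is easy to follow.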

Intended Uses

  • Image-to-CAD reconstruction — reconstruct editable parametric CAD from rendered views
  • Research baseline — starting point for Image-to-Sequence CAD generation research
  • Integration — combine with rendering pipelines for end-to-end 3D reconstruction

Limitations

  • Trained on synthetic data only; may struggle with photorealistic or noisy inputs
  • Expects 8 clean rendered views at 256×256 — other configurations are untested
  • Outputs CadQuery code only; other CAD formats require post-processing
  • Complex multi-part assemblies may exceed the 4,096 token context window

Citation

If you use this model, please cite:

bibtex

@misc{ataei2026zerotocadagenticsynthesisinterpretable,
  title={Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data},
  author={Mohammadmehdi Ataei and Farzaneh Askari and Kamal Rahimi Malekshan and Pradeep Kumar Jayaraman},
  year={2026},
  eprint={2604.24479},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.24479}
}

License

This model is released under the Apache License 2.0.

Model provider

mokau

Model tree

Base

Qwen/Qwen3-VL-2B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text
