Model Summary
WaferSAGE-SFT is fine-tuned from Qwen3-VL on synthetic wafer map VQA data. The model is designed to answer natural language questions about wafer map images, including defect type identification, spatial distribution analysis, morphology description, and root-cause hypothesis generation.
| Item | Description |
|---|---|
| Project | WaferSAGE |
| Model Type | Vision-Language Model |
| Base Model | Qwen3-VL Instruct |
| Fine-tuning Method | LoRA-SFT |
| Task | Image-text-to-text / Visual Question Answering |
| Domain | Semiconductor wafer map defect analysis |
| Language | English |
Intended Capabilities
The model can answer questions such as:
- What type of defect pattern is visible on this wafer map?
- Where are the defective dies located?
- Is the defect concentrated near the center, edge, or a specific quadrant?
- Does the wafer show a scratch-like, ring-like, clustered, or random pattern?
- What process or equipment issue might be associated with this defect pattern?
Example Prompts
<image>
What type of defect pattern is visible on this wafer map?
<image>
Where are the defects located on the wafer?
<image>
Describe the morphology and spatial distribution of this wafer map defect.
<image>
What are the possible root-cause hypotheses for this defect pattern?
Training Data
This model was trained on WaferSAGE wafer map VQA data generated through a multi-stage synthetic data pipeline:
Wafer map images + labels
↓
Clustering-based cleaning and sampling
↓
VLM-generated defect descriptions
↓
Structured rubric extraction
↓
VQA synthesis
↓
LoRA supervised fine-tuning
The VQA data covers:
- Defect type identification
- Spatial distribution
- Morphological description
- Root-cause hypothesis
- Consistency verification
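To make the coverage above concrete, the following is a minimal sketch of how one synthetic VQA record might be represented and validated. The field names (`image_path`, `category`, `question`, `answer`) and the category identifiers are assumptions chosen for illustration; they are not taken from the released dataset schema.

```python
from dataclasses import dataclass

# Category identifiers mirroring the five coverage areas listed above.
# The exact strings are hypothetical, not the dataset's real labels.
VQA_CATEGORIES = (
    "defect_type_identification",
    "spatial_distribution",
    "morphological_description",
    "root_cause_hypothesis",
    "consistency_verification",
)


@dataclass(frozen=True)
class WaferVQARecord:
    """One question-answer pair about a single wafer map image (illustrative)."""
    image_path: str
    category: str
    question: str
    answer: str

    def __post_init__(self) -> None:
        # Reject records outside the five coverage areas or with empty text.
        if self.category not in VQA_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not self.question.strip() or not self.answer.strip():
            raise ValueError("question and answer must be non-empty")


def count_by_category(records):
    """Count records per category, e.g. to check that sampling is balanced."""
    counts = {name: 0 for name in VQA_CATEGORIES}
    for record in records:
        counts[record.category] += 1
    return counts
```

A per-category count like this is one simple way to see whether a sampled training mix is dominated by a single question type.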
Training Setup
The SFT models were trained with LoRA adaptation on Qwen3-VL.
Typical configuration:
| Hyperparameter | Value |
|---|---|
| Fine-tuning method | LoRA |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Optimizer | AdamW 8-bit |
| Learning rate | 2e-4 |
| Scheduler | Linear |
| Epochs | 1 |
| Max context length | 2048 |
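As a worked example of what the rank and alpha values imply: in standard LoRA the low-rank update is scaled by alpha / rank, so the values in the table give a scaling factor of 1.0. The sketch below records the table as a plain dataclass and computes that factor. The class and field names are hypothetical, this is not the project's training script, and it assumes the standard alpha / rank scaling rather than a variant such as rank-stabilized LoRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSftConfig:
    """Hyperparameters copied from the table above (names are illustrative)."""
    lora_rank: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.0
    optimizer: str = "adamw_8bit"
    learning_rate: float = 2e-4
    scheduler: str = "linear"
    epochs: int = 1
    max_context_length: int = 2048

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the update B @ A by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def adapter_params(self, in_features: int, out_features: int) -> int:
        # A has shape (rank, in_features); B has shape (out_features, rank).
        return self.lora_rank * (in_features + out_features)


config = LoraSftConfig()
print(config.lora_scaling)  # 1.0
```

For a hypothetical 2048 x 2048 projection, `config.adapter_params(2048, 2048)` gives 65,536 trainable adapter parameters, compared with 4,194,304 in the full weight matrix.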
Usage with Transformers
from transformers import pipeline

# Load the fine-tuned checkpoint as an image-text-to-text pipeline.
pipe = pipeline(
    "image-text-to-text",
    model="Niraya666/Qwen3-4B-wmvlm-260204",
)

# One user turn containing the wafer map image and the question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "path_or_url_to_wafermap.png"},
            {"type": "text", "text": "What type of defect pattern is visible on this wafer map?"},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=256)
print(output)
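`print(output)` shows the whole returned structure. A minimal sketch for pulling out only the model's reply follows, assuming the pipeline returns a list with one dict whose `generated_text` holds the full chat (the input turns plus a final assistant turn). That shape is an assumption about the installed transformers version, so the helper (a hypothetical name, `extract_reply`) also handles a plain-string `generated_text`.

```python
def extract_reply(output):
    """Return the assistant's text from an image-text-to-text pipeline result."""
    if not output:
        raise ValueError("empty pipeline output")
    generated = output[0]["generated_text"]
    # Some configurations return the generated text as a plain string.
    if isinstance(generated, str):
        return generated
    # Otherwise assume a chat: a list of {"role", "content"} dicts,
    # with the model's reply as the last turn.
    content = generated[-1]["content"]
    if isinstance(content, str):
        return content
    # Content may be a list of typed parts; join only the text parts.
    return "".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )
```

With the pipeline call above, usage would be `print(extract_reply(output))`.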
Usage with vLLM
vllm serve "Niraya666/Qwen3-4B-wmvlm-260204"
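Then call the OpenAI-compatible endpoint. The server has to be able to resolve the image `url`, so use an HTTP(S) URL or a base64 `data:` URL rather than a bare local path: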
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Niraya666/Qwen3-4B-wmvlm-260204",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the wafer map defect pattern."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "path_or_url_to_wafermap.png"
            }
          }
        ]
      }
    ],
    "max_tokens": 256
  }'
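For a local image file, one option is to embed the image as a base64 data URL in the same request body. The following is a minimal standard-library sketch under that assumption. The helper names (`image_to_data_url`, `build_chat_payload`, `ask_wafer_model`) and the default PNG MIME type are illustrative; the endpoint and model name are copied from the curl example above.

```python
import base64
import json
import urllib.request

MODEL = "Niraya666/Qwen3-4B-wmvlm-260204"
ENDPOINT = "http://localhost:8000/v1/chat/completions"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_chat_payload(image_bytes: bytes, question: str, max_tokens: int = 256) -> dict:
    """Build the same request body as the curl example, with an inline image."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_to_data_url(image_bytes)},
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def ask_wafer_model(image_path: str, question: str) -> str:
    """Send one question about a local wafer map image (needs a running server)."""
    with open(image_path, "rb") as handle:
        payload = build_chat_payload(handle.read(), question)
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style chat responses carry the reply in the first choice.
    return body["choices"][0]["message"]["content"]
```

Keeping payload construction separate from the network call makes the request body easy to inspect or log before anything is sent.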
Evaluation
The model was evaluated using WaferSAGE's rubric-based wafer map VQA benchmark.
Evaluation dimensions include:
- Spatial understanding
- Morphological understanding
- Defect type recognition
- Root-cause hypothesis quality
- Hallucination avoidance
The evaluation combines:
- Rule-based rubric matching
- LLM-as-a-Judge scoring
- Qualitative error analysis
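To illustrate the rule-based part, here is a minimal sketch of keyword-style rubric matching. It is not the WaferSAGE evaluation code: the rubric format (an item name mapped to a list of acceptable phrases), the function name `rubric_match_score`, and the case-insensitive substring rule are all assumptions made for the example.

```python
def rubric_match_score(answer: str, rubric: dict) -> tuple:
    """Score an answer as the fraction of rubric items it satisfies.

    `rubric` maps an item name to a list of acceptable phrases; an item
    counts as satisfied if any of its phrases occurs in the answer.
    """
    if not rubric:
        raise ValueError("rubric must contain at least one item")
    text = answer.lower()
    matched = [
        name
        for name, phrases in rubric.items()
        if any(phrase.lower() in text for phrase in phrases)
    ]
    return len(matched) / len(rubric), matched


# Hypothetical rubric for an edge-ring style wafer map.
rubric = {
    "defect_type": ["edge-ring", "edge ring"],
    "location": ["edge", "periphery"],
    "root_cause": ["etch", "deposition"],
}
score, matched = rubric_match_score(
    "The wafer shows an edge-ring pattern concentrated at the periphery.", rubric
)
print(score, matched)
```

Substring matching is brittle (for instance, `edge` also matches inside `edge-ring`), which is one reason a check like this would be paired with the LLM-as-a-Judge scoring and qualitative analysis listed above.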
Compared with the base VLM, the SFT model primarily improves domain-specific terminology, response format, and wafer-map-specific visual reasoning. For stronger performance, see the WaferSAGE RL models trained with GSPO and rubric-aligned rewards.
Limitations
This model is a domain-adapted VLM for wafer map understanding, but it has important limitations:
- It should not be used as a standalone root-cause diagnosis system.
- Root-cause outputs are hypotheses, not verified fab conclusions.
- It may hallucinate defect locations or process causes when the image is ambiguous.
- It may inherit biases from synthetic training data and teacher model outputs.
- It was trained primarily on data in the style of public wafer map datasets and may not generalize to all fab-specific wafer map formats.
- It does not use lot history, process metadata, tool/chamber records, metrology, or inline inspection data.
For production semiconductor engineering use, model outputs should be reviewed by qualified engineers and combined with process context.
Recommended Use
Recommended:
- Research on industrial VLMs
- Wafer map VQA experiments
- Defect pattern description
- Data generation and evaluation pipeline development
- Local proof-of-concept systems for semiconductor AI
Not recommended:
- Automated process control
- Final root-cause diagnosis
- Yield-impact decisions without expert review
- Safety-critical or high-cost manufacturing decisions without validation
Related Resources
- WaferSAGE paper: arXiv:2604.27629
- WaferSAGE VQA dataset: Niraya666/wafermap-vqa-2602
- Rubric-augmented datasets: Niraya666/wafermap-vqa-with-rubrics-2602, Niraya666/wafermap-vqa-with-rubrics-2602_v2
- RL model: Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213
Citation
If you use this model, please cite:
@misc{xu2026wafersage,
  title         = {WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning},
  author        = {Ke Xu and Zhongyuan Lian},
  year          = {2026},
  eprint        = {2604.27629},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI}
}