Model Summary
WaferSAGE-RL is initialized from a WaferSAGE supervised fine-tuned Qwen3-VL model and further optimized with curriculum-based reinforcement learning using rubric-aligned rewards.
The model is designed for semiconductor wafer map understanding, especially:
- Defect type identification
- Spatial distribution analysis
- Morphological description
- Multi-modal defect pattern interpretation
- Root-cause candidate generation
- Hallucination reduction through rubric-based penalties
| Item | Description |
|---|---|
| Project | WaferSAGE |
| Model Type | Vision-Language Model |
| Base / Starting Model | WaferSAGE-SFT Qwen3-VL |
| Post-training Method | GSPO / GRPO-style reinforcement learning |
| Reward | Rubric-aligned rule-based reward |
| Task | Image-text-to-text / Visual Question Answering |
| Domain | Semiconductor wafer map defect analysis |
| Language | English |
Why Reinforcement Learning?
Supervised fine-tuning helps the model learn wafermap-specific language and response structure. However, SFT alone may still produce:
- Overly generic descriptions
- Missing key defect locations
- Hallucinated clock positions or wafer zones
- Incorrect morphology terms
- Plausible but unsupported root-cause explanations
WaferSAGE-RL uses structured rubrics as reward signals. These rubrics include both positive criteria and negative criteria:
- must-hit: terms or concepts the answer should include.
- must-avoid: incorrect locations, defect types, or morphology terms that should be penalized.
This encourages the model to cover important visual evidence while avoiding unsupported or hallucinated statements.
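The must-hit / must-avoid idea can be sketched as a small scoring function. This is only an illustration: the `Rubric` class, the `rubric_reward` function, the substring matching, and the penalty value are all assumptions, not the scoring rule used to train WaferSAGE-RL.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    """Hypothetical rubric container; field names are illustrative only."""
    must_hit: list[str] = field(default_factory=list)
    must_avoid: list[str] = field(default_factory=list)


def rubric_reward(answer: str, rubric: Rubric, penalty: float = 1.0) -> float:
    """Fraction of must-hit terms found, minus a penalty per must-avoid term found."""
    text = answer.lower()
    hits = sum(term.lower() in text for term in rubric.must_hit)
    violations = sum(term.lower() in text for term in rubric.must_avoid)
    coverage = hits / len(rubric.must_hit) if rubric.must_hit else 0.0
    return coverage - penalty * violations


rubric = Rubric(must_hit=["edge", "ring"], must_avoid=["center", "scratch"])
print(rubric_reward("A ring of failing dies along the wafer edge.", rubric))
print(rubric_reward("A scratch crossing the wafer center.", rubric))
```

Plain substring matching is deliberately naive (for example, "ring" would also match inside "clustering"), so a practical scorer would likely need token-level or semantic matching.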
Post-training Pipeline
WaferSAGE SFT model
↓
Rubric-augmented VQA data
↓
Curriculum ordering by difficulty
↓
Multiple sampled completions per prompt
↓
Rubric-based reward scoring
↓
GSPO / GRPO-style policy optimization
↓
WaferSAGE-RL model
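The step from "multiple sampled completions" to "policy optimization" relies on comparing completions of the same prompt against each other. A minimal sketch of the group-relative advantage commonly used in GRPO-style methods is shown below; the function name, the `eps` value, and the example rewards are assumptions, and the exact normalization and GSPO-specific details used for WaferSAGE-RL are not stated in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against the group sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions above the group mean get a positive advantage, those below a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric rewards for four sampled completions of one prompt.
rewards = [1.0, 0.5, 0.5, -1.0]
print(group_relative_advantages(rewards))
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason reward rubrics need to discriminate between answers.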
Curriculum Learning
The RL dataset combines previously seen SFT-style examples and additional rubric-augmented examples. Training examples are ordered from easier to harder samples, so the model first stabilizes on simpler recognition and localization tasks before moving to more complex multi-defect reasoning and root-cause hypothesis questions.
Example difficulty categories:
- Easy: direct defect type or location questions
- Medium: morphology and distribution questions
- Hard: multi-modal defect interpretation and root-cause hypothesis questions
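The easy-to-hard ordering can be sketched as a stable sort over a difficulty label. The `difficulty` field and the example records are hypothetical; this card does not describe how difficulty is actually assigned or stored.

```python
# Hypothetical mapping from difficulty label to curriculum position.
DIFFICULTY_ORDER = {"easy": 0, "medium": 1, "hard": 2}


def order_by_curriculum(examples: list[dict]) -> list[dict]:
    """Stable sort: easy first, hard last; original order is kept within each level."""
    return sorted(examples, key=lambda ex: DIFFICULTY_ORDER[ex["difficulty"]])


examples = [
    {"question": "What process issue could explain this pattern?", "difficulty": "hard"},
    {"question": "Which defect type is shown?", "difficulty": "easy"},
    {"question": "Describe the defect morphology.", "difficulty": "medium"},
]
for ex in order_by_curriculum(examples):
    print(ex["difficulty"], "-", ex["question"])
```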
Reward Design
The reward is based on rubric matching across major dimensions:
| Dimension | Reward Signal |
|---|---|
| Spatial | Reward correct zones, quadrants, clock positions, edge/center descriptions; penalize wrong locations. |
| Morphological | Reward correct terms such as ring, scratch, cluster, blob, random, dense, annular; penalize contradictory patterns. |
| Root Cause | Reward plausible process or equipment hypotheses; penalize unsupported or contradictory causes. |
Root-cause rewards should be interpreted carefully: wafer maps alone cannot prove true process root cause. This dimension is intended to encourage plausible candidate explanations, not definitive diagnosis.
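One way to combine per-dimension scores while keeping the root-cause dimension from dominating is a weighted average. The weights and the `combine_dimension_scores` function below are assumptions for illustration; this card does not state how the dimensions are weighted.

```python
# Hypothetical weights; root cause is weighted lower because it cannot be verified
# from the wafer map alone.
DIMENSION_WEIGHTS = {"spatial": 0.4, "morphological": 0.4, "root_cause": 0.2}


def combine_dimension_scores(scores: dict[str, float]) -> float:
    """Weighted average over the known dimensions that were actually scored."""
    scored = {dim: s for dim, s in scores.items() if dim in DIMENSION_WEIGHTS}
    total_weight = sum(DIMENSION_WEIGHTS[dim] for dim in scored)
    if total_weight == 0:
        return 0.0
    return sum(DIMENSION_WEIGHTS[dim] * s for dim, s in scored.items()) / total_weight


print(combine_dimension_scores({"spatial": 1.0, "morphological": 0.5, "root_cause": 0.0}))
```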
Usage with Transformers
from transformers import pipeline

# Load the model as an image-text-to-text pipeline.
pipe = pipeline(
    "image-text-to-text",
    model="Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213",
)

# One user turn containing the wafer map image and the question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "path_or_url_to_wafermap.png"},
            {"type": "text", "text": "Describe the wafer map defect pattern and possible root-cause hypotheses."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=512)
print(output)
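The printed `output` is a nested structure rather than plain text. The helper below is a sketch for pulling out only the model's reply; it assumes the result is a list of dicts whose `generated_text` entry is either a string or a chat-style list of messages ending with the assistant turn, which may differ between library versions. The name `extract_reply` is hypothetical.

```python
def extract_reply(output) -> str:
    """Return the assistant text from a pipeline result (assumed structure, see above)."""
    generated = output[0]["generated_text"]
    if isinstance(generated, str):
        return generated
    # Chat-style output: the last message is expected to be the assistant reply.
    return generated[-1]["content"]


# Stand-in for a real pipeline result, used here only to show the expected shape.
fake_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Describe the wafer map defect pattern."},
            {"role": "assistant", "content": "Defects form a ring near the wafer edge."},
        ]
    }
]
print(extract_reply(fake_output))
```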
Usage with vLLM
vllm serve "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213"
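Then call the OpenAI-compatible endpoint. Replace the placeholder image URL with one the server can fetch, such as an HTTP(S) URL or a base64 data URL: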
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Analyze this wafer map. Describe the spatial distribution, morphology, and possible root-cause hypotheses."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "path_or_url_to_wafermap.png"
            }
          }
        ]
      }
    ],
    "max_tokens": 512
  }'
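For a wafer map stored on the client machine, the same request can be sketched in Python by embedding the image as a base64 data URL. This assumes the server accepts data URLs in the `image_url` field and is listening on the default `localhost:8000` shown above; the helper names `image_to_data_url`, `build_payload`, and `ask` are hypothetical.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

MODEL = "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image_url: str, question: str, max_tokens: int = 512) -> dict:
    """Build a chat-completions request body with one text part and one image part."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def ask(image_path: str, question: str,
        endpoint: str = "http://localhost:8000/v1/chat/completions") -> str:
    """Send the request to a running server and return the reply text."""
    payload = build_payload(image_to_data_url(image_path), question)
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server and a real image file):
# print(ask("wafermap.png", "Analyze this wafer map."))
```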
Suggested Prompt Template
For best results, ask the model to separate visual evidence from root-cause hypotheses:
Analyze this wafer map.
Please answer in three parts:
1. Spatial distribution: where are the defective dies located?
2. Morphology: what pattern, shape, density, or structure is visible?
3. Root-cause candidates: what process or equipment issues could plausibly explain this pattern?
Do not claim a definitive root cause unless there is enough process context.
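If the model follows the three-part template, the answer can be split into separate fields for downstream review. The sketch below assumes the reply uses lines starting with "1.", "2.", and "3." as in the template, which the model is not guaranteed to do; `split_sections` and the section keys are hypothetical names.

```python
import re

# Hypothetical keys for the three numbered parts of the template.
SECTION_NAMES = {1: "spatial", 2: "morphology", 3: "root_cause_candidates"}


def split_sections(answer: str) -> dict[str, str]:
    """Split an answer on lines that start with '1.', '2.' or '3.' followed by whitespace."""
    sections = {}
    matches = list(re.finditer(r"^\s*([123])\.\s+", answer, flags=re.MULTILINE))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections[SECTION_NAMES[int(match.group(1))]] = answer[match.end():end].strip()
    return sections


example_answer = (
    "1. Spatial distribution: defects concentrate at the wafer edge.\n"
    "2. Morphology: a continuous ring.\n"
    "3. Root-cause candidates: possible edge-related process non-uniformity."
)
for name, text in split_sections(example_answer).items():
    print(name, "->", text)
```

Keeping the root-cause candidates in their own field makes it easier to route only that part to an engineer for review.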
Evaluation
WaferSAGE-RL was evaluated using a rubric-based wafermap VQA benchmark and LLM-as-a-Judge scoring.
The benchmark evaluates:
- Spatial understanding
- Morphological understanding
- Root-cause candidate quality
- Defect type identification
- Hallucination avoidance
In the WaferSAGE paper, the 4B Qwen3-VL RL model achieved an LLM-Judge score of 6.493, compared with 7.149 for Gemini-3-Flash (about 91% of that score), while supporting fully local deployment.
Strengths
- Domain-adapted to wafer map visual understanding.
- Better at wafermap-specific terminology than generic VLMs.
- Supports local deployment for privacy-sensitive semiconductor environments.
- Uses rubric-guided RL to improve spatial and morphological answer quality.
- Useful for research prototypes and engineering assistant workflows.
Limitations
This model has important limitations:
- It does not perform definitive root-cause diagnosis.
- Root-cause outputs are candidate hypotheses only.
- It does not use lot history, tool/chamber logs, recipe parameters, metrology, inline inspection, or CP/FT data.
- It may still hallucinate plausible but unsupported process causes.
- It may overfit to synthetic rubric style and public wafermap data distributions.
- It should not be used for automated process control or high-cost manufacturing decisions without expert review.
Recommended Use
Recommended:
- Wafer map VQA research
- Industrial VLM evaluation
- Semiconductor AI assistant prototypes
- Local on-premise proof-of-concept deployments
- Synthetic-data and rubric-reward research
Not recommended:
- Final process root-cause diagnosis
- Production yield disposition without human review
- Automated manufacturing control
- Safety-critical or high-cost decisions without validation
Related Resources
- WaferSAGE paper: arXiv:2604.27629
- SFT starting model: Niraya666/Qwen3-4B-wmvlm-260204
- WaferSAGE VQA dataset: Niraya666/wafermap-vqa-2602
- Rubric datasets: Niraya666/wafermap-vqa-with-rubrics-2602, Niraya666/WaferSAGE-Wafermap-VQA-Dataset
Citation
If you use this model, please cite:
@misc{xu2026wafersage,
title = {WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning},
author = {Ke Xu and Zhongyuan Lian},
year = {2026},
eprint = {2604.27629},
archivePrefix= {arXiv},
primaryClass = {cs.AI}
}