Niraya666

Qwen3-VL-4B-Instruct-WMVLM-RL-0213

Model Summary

WaferSAGE-RL is initialized from a WaferSAGE supervised fine-tuned Qwen3-VL model and further optimized with curriculum-based reinforcement learning using rubric-aligned rewards.

The model is designed for semiconductor wafer map understanding, especially:

Defect type identification
Spatial distribution analysis
Morphological description
Multi-modal defect pattern interpretation
Root-cause candidate generation
Hallucination reduction through rubric-based penalties

| Item | Description |
| --- | --- |
| Project | WaferSAGE |
| Model Type | Vision-Language Model |
| Base / Starting Model | WaferSAGE-SFT Qwen3-VL |
| Post-training Method | GSPO / GRPO-style reinforcement learning |
| Reward | Rubric-aligned rule-based reward |
| Task | Image-text-to-text / Visual Question Answering |
| Domain | Semiconductor wafer map defect analysis |
| Language | English |

Why Reinforcement Learning?

Supervised fine-tuning helps the model learn wafermap-specific language and response structure. However, SFT alone may still produce:

Overly generic descriptions
Missing key defect locations
Hallucinated clock positions or wafer zones
Incorrect morphology terms
Plausible but unsupported root-cause explanations

WaferSAGE-RL uses structured rubrics as reward signals. These rubrics include both positive criteria and negative criteria:

```text
must-hit:
  Terms or concepts the answer should include.

must-avoid:
  Incorrect locations, defect types, or morphology terms that should be penalized.
```

This encourages the model to cover important visual evidence while avoiding unsupported or hallucinated statements.
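As a rough illustration of how a must-hit / must-avoid rubric could be turned into a scalar reward, here is a minimal sketch. The `Rubric` class, the `rubric_reward` function, the `penalty` parameter, and the plain substring matching are all assumptions made for this example; the card does not specify the actual reward implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    """Hypothetical rubric: terms to reward and terms to penalize."""
    must_hit: list[str] = field(default_factory=list)
    must_avoid: list[str] = field(default_factory=list)


def rubric_reward(answer: str, rubric: Rubric, penalty: float = 1.0) -> float:
    """Fraction of must-hit terms found, minus a penalty per must-avoid term found.

    Matching is a case-insensitive substring test, so "ring" would also match
    "clustering"; a real scorer would likely need stricter matching.
    """
    text = answer.lower()
    hits = sum(term.lower() in text for term in rubric.must_hit)
    violations = sum(term.lower() in text for term in rubric.must_avoid)
    coverage = hits / len(rubric.must_hit) if rubric.must_hit else 0.0
    return coverage - penalty * violations


# Illustrative rubric for an edge-ring wafer map (terms are made up for the example).
rubric = Rubric(must_hit=["edge", "ring"], must_avoid=["center", "scratch"])
good = rubric_reward("Defective dies form a ring along the wafer edge.", rubric)
bad = rubric_reward("A scratch runs through the center.", rubric)
```

Under these assumptions, the first answer covers both must-hit terms with no violations, while the second covers none and triggers both must-avoid terms, so it scores lower.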

Post-training Pipeline

```text
WaferSAGE SFT model
    ↓
Rubric-augmented VQA data
    ↓
Curriculum ordering by difficulty
    ↓
Multiple sampled completions per prompt
    ↓
Rubric-based reward scoring
    ↓
GSPO / GRPO-style policy optimization
    ↓
WaferSAGE-RL model
```
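The step between reward scoring and policy optimization can be illustrated with the group-relative advantage normalization commonly used in GRPO-style training: rewards for completions sampled from the same prompt are centered and scaled within that group. This is a sketch of that general technique only. The function name `group_advantages` and the `eps` value are assumptions, and the card does not state the exact normalization or hyperparameters used for WaferSAGE-RL.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rubric rewards for completions sampled from one prompt.
advantages = group_advantages([1.0, 0.5, 0.5, -1.0])
```

Completions that score above the group mean receive positive advantages and those below receive negative ones; a group with identical rewards yields all zeros and contributes no learning signal.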

Curriculum Learning

The RL dataset combines previously seen SFT-style examples and additional rubric-augmented examples. Training examples are ordered from easier to harder samples, so the model first stabilizes on simpler recognition and localization tasks before moving to more complex multi-defect reasoning and root-cause hypothesis questions.

Example difficulty categories:

Easy: direct defect type or location questions
Medium: morphology and distribution questions
Hard: multi-modal defect interpretation and root-cause hypothesis questions
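A minimal sketch of easy-to-hard ordering is shown here, assuming each example carries a `difficulty` label. The field names, the `DIFFICULTY_RANK` mapping, and the sample questions are hypothetical; the card does not describe how difficulty is stored or assigned in the actual dataset.

```python
# Hypothetical mapping from difficulty label to sort position.
DIFFICULTY_RANK = {"easy": 0, "medium": 1, "hard": 2}


def curriculum_order(examples: list[dict]) -> list[dict]:
    """Return examples sorted easy first, hard last.

    sorted() is stable, so examples with the same difficulty keep their
    original relative order.
    """
    return sorted(examples, key=lambda ex: DIFFICULTY_RANK[ex["difficulty"]])


examples = [
    {"question": "What could cause this pattern?", "difficulty": "hard"},
    {"question": "Which defect type is shown?", "difficulty": "easy"},
    {"question": "Describe the defect morphology.", "difficulty": "medium"},
]
ordered = curriculum_order(examples)
```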

Reward Design

The reward is based on rubric matching across major dimensions:

| Dimension | Reward Signal |
| --- | --- |
| Spatial | Reward correct zones, quadrants, clock positions, edge/center descriptions; penalize wrong locations. |
| Morphological | Reward correct terms such as ring, scratch, cluster, blob, random, dense, annular; penalize contradictory patterns. |
| Root Cause | Reward plausible process or equipment hypotheses; penalize unsupported or contradictory causes. |

Root-cause rewards should be interpreted carefully: wafer maps alone cannot prove true process root cause. This dimension is intended to encourage plausible candidate explanations, not definitive diagnosis.
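One way to combine per-dimension rubric scores into a single reward is a weighted average, sketched here. The weights are purely illustrative assumptions: the card does not say how dimensions are weighted, and the lower root-cause weight simply mirrors the caution above about treating root-cause output as hypotheses.

```python
# Hypothetical weights, not taken from the WaferSAGE paper or this card.
DIMENSION_WEIGHTS = {"spatial": 0.4, "morphological": 0.4, "root_cause": 0.2}


def combine_dimension_scores(scores: dict[str, float]) -> float:
    """Weighted average over the dimensions that a rubric actually scores."""
    present = {d: w for d, w in DIMENSION_WEIGHTS.items() if d in scores}
    if not present:
        return 0.0
    total_weight = sum(present.values())
    # Renormalize so a rubric missing a dimension is not penalized for it.
    return sum(scores[d] * w for d, w in present.items()) / total_weight


total = combine_dimension_scores(
    {"spatial": 1.0, "morphological": 0.5, "root_cause": 0.0}
)
```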

Usage with Transformers

```python
from transformers import pipeline

# Load the model as an image-text-to-text pipeline.
pipe = pipeline(
    "image-text-to-text",
    model="Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213"
)

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder: replace with the location of your wafer map image.
            {"type": "image", "url": "path_or_url_to_wafermap.png"},
            {"type": "text", "text": "Describe the wafer map defect pattern and possible root-cause hypotheses."}
        ]
    }
]

# Generate up to 512 new tokens and print the raw pipeline output.
output = pipe(text=messages, max_new_tokens=512)
print(output)
```

Usage with vLLM

```bash
vllm serve "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213"
```

Then call the OpenAI-compatible endpoint:

```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Analyze this wafer map. Describe the spatial distribution, morphology, and possible root-cause hypotheses."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/wafermap.png"
            }
          }
        ]
      }
    ],
    "max_tokens": 512
  }'
```

The `url` value is a placeholder. The server expects an HTTP(S) URL or a base64 `data:` URL in this field; a bare local file path is not read by the server.
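For a wafer map stored on local disk, one option is to send the image as a base64 `data:` URL through the same OpenAI-compatible endpoint. The sketch below uses only the Python standard library. The helper names (`to_data_url`, `build_payload`, `chat`), the `wafermap.png` file name, and the default `base_url` are assumptions for illustration; the request body mirrors the curl example.

```python
import base64
import json
import urllib.request

MODEL = "Niraya666/Qwen3-VL-4B-Instruct-WMVLM-RL-0213"


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image_url: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build a chat-completions request body with one text part and one image part."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": max_tokens,
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server and a local image file):
#   with open("wafermap.png", "rb") as f:
#       url = to_data_url(f.read())
#   print(chat(build_payload(url, "Analyze this wafer map.")))
```

Keeping payload construction separate from the network call makes the request shape easy to inspect or log before anything is sent.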

Suggested Prompt Template

For best results, ask the model to separate visual evidence from root-cause hypotheses:

```text
Analyze this wafer map.

Please answer in three parts:
1. Spatial distribution: where are the defective dies located?
2. Morphology: what pattern, shape, density, or structure is visible?
3. Root-cause candidates: what process or equipment issues could plausibly explain this pattern?

Do not claim a definitive root cause unless there is enough process context.
```

Evaluation

WaferSAGE-RL was evaluated using a rubric-based wafermap VQA benchmark and LLM-as-a-Judge scoring.

The benchmark evaluates:

Spatial understanding
Morphological understanding
Root-cause candidate quality
Defect type identification
Hallucination avoidance

In the WaferSAGE paper, the 4B Qwen3-VL RL model achieved an LLM-Judge score of 6.493, compared with 7.149 for Gemini-3-Flash (a gap of 0.656), while supporting fully local deployment.

Strengths

Domain-adapted to wafer map visual understanding.
Better at wafermap-specific terminology than generic VLMs.
Supports local deployment for privacy-sensitive semiconductor environments.
Uses rubric-guided RL to improve spatial and morphological answer quality.
Useful for research prototypes and engineering assistant workflows.

Limitations

This model has important limitations:

It does not perform definitive root-cause diagnosis.
Root-cause outputs are candidate hypotheses only.
It does not use lot history, tool/chamber logs, recipe parameters, metrology, inline inspection, or CP/FT data.
It may still hallucinate plausible but unsupported process causes.
It may overfit to synthetic rubric style and public wafermap data distributions.
It should not be used for automated process control or high-cost manufacturing decisions without expert review.

Recommended Use

Recommended:

Wafer map VQA research
Industrial VLM evaluation
Semiconductor AI assistant prototypes
Local on-premise proof-of-concept deployments
Synthetic-data and rubric-reward research

Not recommended:

Final process root-cause diagnosis
Production yield disposition without human review
Automated manufacturing control
Safety-critical or high-cost decisions without validation

Related Resources

WaferSAGE paper: arXiv:2604.27629
SFT starting model: Niraya666/Qwen3-4B-wmvlm-260204
WaferSAGE VQA dataset: Niraya666/wafermap-vqa-2602
Rubric datasets: Niraya666/wafermap-vqa-with-rubrics-2602, Niraya666/WaferSAGE-Wafermap-VQA-Dataset

Citation

If you use this model, please cite:

```bibtex
@misc{xu2026wafersage,
  title        = {WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning},
  author       = {Ke Xu and Zhongyuan Lian},
  year         = {2026},
  eprint       = {2604.27629},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI}
}
```

Model provider: Niraya666
Model tree: this model is fine-tuned from the base model Niraya666/Qwen3-4B-wmvlm-260204
Modalities: text and image input; text output
