License: apache-2.0
qwen35-4b-iconclass-grpo-v5ctl
GRPO reward-ablation checkpoint (Qwen3.5-4B-VL, from davanstrien/qwen35-4b-iconclass-vlm).
Part of an experiment testing whether a richer reward bundle beats plain hierarchical-F1
(gt_match) for iconclass classification.
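For readers unfamiliar with hierarchical F1, the following is a minimal sketch of one common formulation, not necessarily the scoring code used in this experiment. It assumes that every character-level prefix of a notation string counts as an ancestor, which simplifies real Iconclass notation (bracketed text, keys, and doubled letters are ignored). The function names `ancestors` and `hierarchical_f1` and the example code strings are hypothetical, and the completeness correction mentioned in the results is not modelled here.

```python
def ancestors(code: str) -> set[str]:
    """Return the code plus all of its character-level prefixes.

    Assumption: prefixes stand in for parent nodes in the hierarchy.
    """
    return {code[:i] for i in range(1, len(code) + 1)}


def hierarchical_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 over the ancestor-expanded sets of predicted and gold codes."""
    pred_nodes = set().union(*(ancestors(c) for c in predicted))
    gold_nodes = set().union(*(ancestors(c) for c in gold))
    overlap = len(pred_nodes & gold_nodes)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_nodes)
    recall = overlap / len(gold_nodes)
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only: a near miss that shares four of five levels.
near_miss = hierarchical_f1(["25F23"], ["25F24"])
exact = hierarchical_f1(["25F23"], ["25F23"])
print(f"near miss: {near_miss:.2f}, exact: {exact:.2f}")
```

Under this formulation a prediction that lands in the right branch but picks the wrong leaf still earns partial credit, which is why the metric suits a deep taxonomy better than exact-match accuracy.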
- Reward config: gt_match only (control)
- Result (completeness-corrected H-F1, 40-image test): 63.7%
- Verdict: no improvement over plain gt_match (all variants 61–64%, within n=40 noise). Reward tuning is not the lever; the model is capability-bound. The approach that worked is anchored fusion (see qwen35-4b-iconclass-sft-brillfull).
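As a rough check on the "within n=40 noise" reading, the sketch below computes a worst-case standard error for a mean of 40 per-image scores. It relies on the fact that a score bounded in [0, 1] with mean p has variance at most p(1 - p), and it assumes the test images are independent. The function name `max_standard_error` is hypothetical, and the figure is an upper bound on the noise, not a measurement from this experiment.

```python
import math


def max_standard_error(mean_score: float, n: int) -> float:
    """Upper bound on the standard error of a mean of n scores in [0, 1].

    Assumption: the n scores are independent.
    """
    return math.sqrt(mean_score * (1.0 - mean_score) / n)


se = max_standard_error(0.637, 40)
print(f"standard error bound: {se * 100:.1f} percentage points")
# prints "standard error bound: 7.6 percentage points"
```

A spread of 61–64% across variants is small next to a bound of this size, so a 40-image test set cannot be expected to separate the reward configurations.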
Base: davanstrien/qwen35-4b-iconclass-vlm. Trained with Unsloth + TRL.
Model provider: davanstrien
Model tree: base davanstrien/qwen35-4b-iconclass-vlm; fine-tuned: this model
Modalities: input Video, Text, Image; output Text