davanstrien

qwen35-4b-iconclass-grpo-v5full

License: apache-2.0

GRPO reward-ablation checkpoint (Qwen3.5-4B-VL, from davanstrien/qwen35-4b-iconclass-vlm). Part of an experiment testing whether a richer reward bundle beats plain hierarchical-F1 (gt_match) for iconclass classification.
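For readers unfamiliar with hierarchical F1, here is a minimal sketch of the general idea: each code is expanded to the set of its ancestors before precision and recall are computed, so a near-miss in the hierarchy earns partial credit. This is an illustration only, not the evaluation code behind the numbers on this card. The `ancestors` and `hierarchical_f1` helpers are hypothetical, the hierarchy is assumed to be plain character prefixes with bracketed qualifiers stripped (a simplification of real Iconclass notation), and the completeness correction is not reproduced.

```python
import re


def ancestors(code: str) -> set[str]:
    """Return a code together with all of its prefix ancestors.

    Assumption: the hierarchy is approximated by character prefixes, and
    bracketed qualifiers such as "(LION)" are ignored.
    """
    base = re.sub(r"\(.*?\)", "", code).replace(" ", "")
    return {base[:i] for i in range(1, len(base) + 1)}


def hierarchical_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 over ancestor-expanded label sets for a single image."""
    pred_set = set().union(*(ancestors(c) for c in predicted))
    gold_set = set().union(*(ancestors(c) for c in gold))
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# A sibling code shares three ancestors with the gold code, so it scores
# between 0 and 1 instead of being counted as a complete miss.
score = hierarchical_f1(["25F3"], ["25F23"])
```

Under these assumptions, an exact match scores 1.0 and a prediction in a different top-level branch scores 0.0.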

Reward config: recall + validity + count + diversity
Result (completeness-corrected H-F1, 40-image test): 61.8%
Verdict: no improvement over plain gt_match. All variants scored 61–64%, a spread within the noise expected from a 40-image test set. Reward tuning is not the lever; the model is capability-bound. The approach that worked is anchored fusion (see qwen35-4b-iconclass-sft-brillfull).
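To give a sense of scale for the n=40 noise, the sketch below computes a rough upper bound on the standard error of a mean score. It assumes the reported figure behaves like an average of independent per-image scores in [0, 1], which may not match how the metric was actually aggregated. The bound uses the fact that a [0, 1] variable with mean p has variance at most p(1 - p). The function name is hypothetical.

```python
import math


def standard_error_upper_bound(mean_score: float, n: int) -> float:
    """Upper bound on the standard error of the mean of n scores in [0, 1].

    Assumption: scores are independent; the true standard error can be
    smaller if per-image scores cluster near the mean.
    """
    return math.sqrt(mean_score * (1.0 - mean_score) / n)


se = standard_error_upper_bound(0.618, 40)
print(f"SE upper bound: {se * 100:.1f} percentage points")
```

With a mean of 61.8% on 40 images the bound is several percentage points, which is larger than the 3-point spread across variants. Because this is only an upper bound, it indicates plausibility and does not replace a proper significance test.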

Base: davanstrien/qwen35-4b-iconclass-vlm. Trained with Unsloth + TRL.
