xupy21

ContextRL_Qwen3_VL_8B

README

License: apache-2.0

Results

Across 12 diverse multimodal benchmarks, ContextRL improves over the standard GRPO baseline by +1.6 points on average, while improving every individual benchmark.

Table with columns: Benchmark, Base, RL (GRPO), ContextRL (Ours)
Benchmark	Base	RL (GRPO)	ContextRL (Ours)
MathVista	75.8	78.7	79.8
MathVerse	56.1	65.0	66.4
MathVision	46.2	49.2	52.0
MMMU-Pro	41.3	55.9	57.5
MMMU	66.4	69.1	70.1
V*	82.2	84.8	85.9
MMStar	70.5	73.5	74.8
BLINK	64.4	65.1	66.6
ScienceQA	94.4	95.6	96.6
PhyX	45.5	72.1	73.4
OlympiadBench Phy	7.9	8.1	9.9
MME-RealWorld Lite	48.7	51.9	54.8
Overall Avg.	58.3	64.1	65.7

Usage

This model follows the same interface as Qwen3-VL-8B-Instruct and can be loaded with transformers. Training and evaluation code, data construction pipelines, and detailed configurations are available in the repository:

👉 https://github.com/xupy2003/ContextAwareRL

Please refer to the repo's README for environment setup, inference scripts, and reproduction instructions.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

xupy21

Model Tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Input Modalities

Text

Image

Output Modalities