VideoKR-Qwen3-VL-8B

About

This repository contains the VideoKR-Qwen3-VL-8B model presented in VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding (ICML 2026 Spotlight).

VideoKR-Qwen3-VL-8B is obtained through a standard SFT → GRPO pipeline on Qwen3-VL-8B-Instruct:

1. Supervised fine-tuning on VideoKR-SFT-201K with CoT rationales → VideoKR-Qwen3-VL-8B-SFT
2. GRPO reinforcement learning on VideoKR-RL-114K with verifiable rewards → this model

VideoKR is the first large-scale training corpus designed for knowledge- and reasoning-intensive video understanding. It contains 315K video reasoning examples (201K for SFT plus 114K for RL) over 145K newly collected, CC-licensed expert-domain videos across 82 professional subjects.

Links

| Resource | Link |
| --- | --- |
| Training data | minuzero/VideoKR-Train |
| Evaluation data | minuzero/VideoKR-Eval |
| SFT checkpoint (Qwen2.5-VL) | minuzero/VideoKR-Qwen2.5-VL-7B-SFT |
| GRPO checkpoint (Qwen2.5-VL) | minuzero/VideoKR-Qwen2.5-VL-7B |
| SFT checkpoint (Qwen3-VL) | minuzero/VideoKR-Qwen3-VL-8B-SFT |

Performance

Results with 128 input frames. Within the Qwen3-VL-8B group, bold = best, underline = second best.

| Model | Video-MME | MVBench | LongVBench | General Avg | VideoMMMU | MMVU | SciVidBench | VideoKR-Eval | Knowledge Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-VL-8B-Instruct | 68.2 | 67.9 | 61.6 | 65.9 | 61.8 | 59.6 | 33.4 | 39.0 | 48.5 |

VideoKR achieves the highest knowledge-intensive average among all Qwen3-VL-8B-based methods (+3.0 over the base model, +1.5 over Qwen3-VL-8B-Thinking) while maintaining competitive general video reasoning performance.
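The two average columns appear to be unweighted means of the benchmark scores in their group, which can be checked against the Qwen3-VL-8B-Instruct row. The sketch below assumes that reading. The `mean` helper is hypothetical and not part of the VideoKR code. The last two values are only what the stated deltas imply; they are not copied from the table and are approximate, since the deltas themselves are rounded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mean of the arguments, rounded to one decimal place.
# The tiny epsilon absorbs floating-point error so that x.x5 rounds up.
mean() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f\n", s / NR + 1e-9 }'
}

# Scores for Qwen3-VL-8B-Instruct, copied from the table above.
general_avg=$(mean 68.2 67.9 61.6)             # Video-MME, MVBench, LongVBench
knowledge_avg=$(mean 61.8 59.6 33.4 39.0)      # VideoMMMU, MMVU, SciVidBench, VideoKR-Eval

# Knowledge averages implied by the stated deltas (assumption, not table data).
videokr_implied=$(awk -v b="$knowledge_avg" 'BEGIN { printf "%.1f\n", b + 3.0 }')
thinking_implied=$(awk -v v="$videokr_implied" 'BEGIN { printf "%.1f\n", v - 1.5 }')

echo "General Avg (base):         $general_avg"
echo "Knowledge Avg (base):       $knowledge_avg"
echo "Knowledge Avg (VideoKR):    ~$videokr_implied (implied)"
echo "Knowledge Avg (Thinking):   ~$thinking_implied (implied)"
```

Under this assumption the recomputed base averages match the reported 65.9 and 48.5.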

Evaluation

```bash
cd /path/to/VideoKR/lmms_eval
conda activate videokr_eval

export CUDA_VISIBLE_DEVICES=0
export VIDEOKR_MODEL=minuzero/VideoKR-Qwen3-VL-8B
export TASKS=videokr_eval
export BATCH_SIZE=1
export RUN_NAME=videokr_eval

bash examples/models/videokr_vllm.sh
```
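Because the launch script is configured entirely through exported variables, a forgotten export can be easy to miss. The following is a minimal pre-flight sketch, assuming the script reads all five variables shown above. `require_env` is a hypothetical helper and is not part of the VideoKR repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the VideoKR repository.
# Reports every listed variable that is unset or empty on stderr and
# returns non-zero if any are missing. The function body runs in a
# subshell so its temporary variables do not leak into the caller.
require_env() (
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted here).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

# Variable names taken from the commands above.
if require_env CUDA_VISIBLE_DEVICES VIDEOKR_MODEL TASKS BATCH_SIZE RUN_NAME; then
  echo "environment ok; next: bash examples/models/videokr_vllm.sh"
else
  echo "set the variables listed above before launching the evaluation" >&2
fi
```

The check only reports problems and never launches the evaluation itself, so it can be run safely before the script.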

Citation

If you find VideoKR useful in your research, please cite our paper:

```bibtex
@misc{fu2026videokrknowledgereasoningintensivevideo,
  title={VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding},
  author={Lin Fu and Zheyuan Yang and Yang Wang and Tingyu Song and Arman Cohan and Yilun Zhao},
  year={2026},
  eprint={2606.05259},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.05259}
}
```

Model provider

minuzero

Model tree

Base: Qwen/Qwen3-VL-8B-Instruct
Fine-tuned: this model

Modalities

Input: Text, Image
Output: Text
