
License: apache-2.0

About

This repository contains the VideoKR-Qwen3-VL-8B model presented in VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding (ICML 2026 Spotlight).

VideoKR-Qwen3-VL-8B is obtained through a standard SFT → GRPO pipeline on Qwen3-VL-8B-Instruct:

  1. Supervised fine-tuning on VideoKR-SFT-201K with CoT rationales → VideoKR-Qwen3-VL-8B-SFT
  2. GRPO reinforcement learning on VideoKR-RL-114K with verifiable rewards → this model
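The card does not spell out the reward function or the GRPO hyperparameters, so the following is only a minimal sketch of the general idea behind step 2: several responses are sampled for one prompt, each is scored by a rule-based (verifiable) reward, and each reward is normalized against its own group. The exact-match reward, the function names, and the `eps` constant are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumed constant) avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one multiple-choice question whose reference answer is "B".
samples = ["B", "C", "b ", "A"]
rewards = [verifiable_reward(s, "B") for s in samples]
advantages = group_relative_advantages(rewards)

print(rewards)
print([round(a, 2) for a in advantages])
```

Correct samples receive a positive advantage and incorrect ones a negative advantage; when every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.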

VideoKR is the first large-scale training corpus designed for knowledge- and reasoning-intensive video understanding, containing 315K video reasoning examples over 145K newly collected, CC-licensed expert-domain videos across 82 professional subjects.

Links

| Resource | Link |
|---|---|
| Training data | minuzero/VideoKR-Train |
| Evaluation data | minuzero/VideoKR-Eval |
| SFT checkpoint (Qwen2.5-VL) | minuzero/VideoKR-Qwen2.5-VL-7B-SFT |
| GRPO checkpoint (Qwen2.5-VL) | minuzero/VideoKR-Qwen2.5-VL-7B |
| SFT checkpoint (Qwen3-VL) | minuzero/VideoKR-Qwen3-VL-8B-SFT |

Performance

Results with 128 input frames. Within the Qwen3-VL-8B group, bold = best, underline = second best.

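| Model | Video-MME | MVBench | LongVBench | General Avg | VideoMMMU | MMVU | SciVidBench | VideoKR-Eval | Knowledge Avg |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-8B-Instruct | 68.2 | 67.9 | 61.6 | 65.9 | 61.8 | 59.6 | 33.4 | 39.0 | 48.5 |
| OneThinker | 65.8 | 69.3 | 61.4 | 65.5 | 62.9 | 61.6 | 33.8 | 38.3 | 49.2 |
| VideoAuto-R1 | 68.7 | 68.8 | 58.8 | 65.4 | 63.1 | 59.6 | 32.7 | 43.8 | 49.8 |
| Qwen3-VL-8B-Thinking | 67.6 | 68.0 | 60.0 | 65.2 | 64.9 | 60.5 | 33.0 | 41.5 | 50.0 |
| VideoKR (SFT + RL) | 67.8 | 67.0 | 61.5 | 65.4 | 63.0 | 64.8 | 32.8 | 45.3 | 51.5 |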

VideoKR achieves the highest knowledge-intensive average (+3.0 over base, +1.5 over Qwen3-VL-8B-Thinking) among all Qwen3-VL-8B based methods, while maintaining competitive general video reasoning performance.
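The reported Knowledge Avg values and the two gains quoted above are consistent with a plain mean of the four knowledge-intensive columns, rounded half-up to one decimal. The sketch below recomputes them from the displayed scores as a sanity check; it assumes the averages are taken over the rounded table values, which the card does not state explicitly.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, in the order
# VideoMMMU, MMVU, SciVidBench, VideoKR-Eval.
KNOWLEDGE_SCORES = {
    "Qwen3-VL-8B-Instruct": ["61.8", "59.6", "33.4", "39.0"],
    "OneThinker": ["62.9", "61.6", "33.8", "38.3"],
    "VideoAuto-R1": ["63.1", "59.6", "32.7", "43.8"],
    "Qwen3-VL-8B-Thinking": ["64.9", "60.5", "33.0", "41.5"],
    "VideoKR (SFT + RL)": ["63.0", "64.8", "32.8", "45.3"],
}


def rounded_mean(values):
    """Mean of decimal strings, rounded half-up to one decimal place."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


knowledge_avg = {model: rounded_mean(s) for model, s in KNOWLEDGE_SCORES.items()}

videokr = knowledge_avg["VideoKR (SFT + RL)"]
gain_over_base = videokr - knowledge_avg["Qwen3-VL-8B-Instruct"]
gain_over_thinking = videokr - knowledge_avg["Qwen3-VL-8B-Thinking"]

for model, value in knowledge_avg.items():
    print(f"{model}: {value}")
print(f"gain over base: +{gain_over_base}")
print(f"gain over Thinking: +{gain_over_thinking}")
```

`Decimal` is used instead of floats so that values such as 48.45 round to 48.5 deterministically rather than depending on binary floating-point representation.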

Evaluation

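```bash
# Run from the lmms_eval directory of your VideoKR checkout (replace the placeholder path).
cd /path/to/VideoKR/lmms_eval
conda activate videokr_eval

# Settings read by the launch script.
export CUDA_VISIBLE_DEVICES=0                       # GPU index to use
export VIDEOKR_MODEL=minuzero/VideoKR-Qwen3-VL-8B   # checkpoint to evaluate
export TASKS=videokr_eval
export BATCH_SIZE=1
export RUN_NAME=videokr_eval

bash examples/models/videokr_vllm.sh
```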

Citation

If you find VideoKR useful in your research, please cite our paper:

```bibtex
@misc{fu2026videokrknowledgereasoningintensivevideo,
  title={VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding},
  author={Lin Fu and Zheyuan Yang and Yang Wang and Tingyu Song and Arman Cohan and Yilun Zhao},
  year={2026},
  eprint={2606.05259},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.05259},
}
```

Model provider

minuzero

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text
