Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

About

This repository contains the VideoKR-Qwen2.5-VL-7B-SFT model presented in VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding (ICML 2026 Spotlight).

VideoKR-Qwen2.5-VL-7B-SFT is obtained by supervised fine-tuning Qwen2.5-VL-7B-Instruct on VideoKR-SFT-201K for one epoch. Each training example includes a high-quality chain-of-thought (CoT) rationale as the supervision target. This SFT checkpoint serves as the starting point for subsequent GRPO reinforcement learning, yielding the final VideoKR-Qwen2.5-VL-7B model.

Links

ResourceLink
Training dataminuzero/VideoKR-Train
Evaluation dataminuzero/VideoKR-Eval
GRPO checkpoint (Qwen2.5-VL)minuzero/VideoKR-Qwen2.5-VL-7B
SFT checkpoint (Qwen3-VL)minuzero/VideoKR-Qwen3-VL-8B-SFT
GRPO checkpoint (Qwen3-VL)minuzero/VideoKR-Qwen3-VL-8B

Training

For detailed training instructions, please refer to the GitHub repository.

bash

cd /path/to/VideoKR/llamafactory
conda activate videokr_train
# Prepare SFT data
mkdir -p data/raw
huggingface-cli download minuzero/VideoKR-Train \
--repo-type dataset --local-dir data/raw \
--include "VideoKR-COT-201K.jsonl"
python local_script/prepare_videokr_sft_data.py \
--input data/raw/VideoKR-COT-201K.jsonl \
--output data/videokr_train.json
# Launch SFT
bash local_script/train_videokr.sh qwen2_5vl

Citation

If you find VideoKR useful in your research, please cite our paper:

markdown

@misc{fu2026videokrknowledgereasoningintensivevideo,
title={VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding},
author={Lin Fu and Zheyuan Yang and Yang Wang and Tingyu Song and Arman Cohan and Yilun Zhao},
year={2026},
eprint={2606.05259},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.05259},
}

Model provider

minuzero

Model tree

Base

Qwen/Qwen2.5-VL-7B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today