minuzero/VideoKR-Qwen2.5-VL-7B-SFT API & Inference Endpoint

About

This repository contains the VideoKR-Qwen2.5-VL-7B-SFT model presented in VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding (ICML 2026 Spotlight).

VideoKR-Qwen2.5-VL-7B-SFT is obtained by supervised fine-tuning Qwen2.5-VL-7B-Instruct on VideoKR-SFT-201K for one epoch. Each training example includes a high-quality chain-of-thought (CoT) rationale as the supervision target. This SFT checkpoint serves as the starting point for subsequent GRPO reinforcement learning, yielding the final VideoKR-Qwen2.5-VL-7B model.

Links

Resource	Link
Training data	minuzero/VideoKR-Train
Evaluation data	minuzero/VideoKR-Eval
GRPO checkpoint (Qwen2.5-VL)	minuzero/VideoKR-Qwen2.5-VL-7B
SFT checkpoint (Qwen3-VL)	minuzero/VideoKR-Qwen3-VL-8B-SFT
GRPO checkpoint (Qwen3-VL)	minuzero/VideoKR-Qwen3-VL-8B

Training

For detailed training instructions, please refer to the GitHub repository.

bash
cd /path/to/VideoKR/llamafactory
conda activate videokr_train

# Prepare SFT data
mkdir -p data/raw
huggingface-cli download minuzero/VideoKR-Train \
  --repo-type dataset --local-dir data/raw \
  --include "VideoKR-COT-201K.jsonl"

python local_script/prepare_videokr_sft_data.py \
  --input data/raw/VideoKR-COT-201K.jsonl \
  --output data/videokr_train.json

# Launch SFT
bash local_script/train_videokr.sh qwen2_5vl

Citation

If you find VideoKR useful in your research, please cite our paper:

markdown
@misc{fu2026videokrknowledgereasoningintensivevideo,
      title={VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding}, 
      author={Lin Fu and Zheyuan Yang and Yang Wang and Tingyu Song and Arman Cohan and Yilun Zhao},
      year={2026},
      eprint={2606.05259},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.05259}, 
}

VideoKR-Qwen2.5-VL-7B-SFT

Get help setting up a custom Dedicated Endpoints.

README

About

Links

Training

Citation

Explore FriendliAI today

VideoKR-Qwen2.5-VL-7B-SFT