Model Details
- Model type: no-reference video quality assessment vision-language model
- Checkpoint type: PEFT / LoRA adapter for active fine-tuning
- Backbone family: Qwen2.5-VL / VisualQuality-R1-style VLM
- Base model:
hollow404/VQR1-7B-YouTubeUGC
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Training data: YouTube-UGC + MDS-VQA-selected labeled samples from YouTube-SFV SDR
- Input: a video plus a VQA prompt
- Output: a quality score on a 1 to 5 scale, typically inside
<answer>...</answer> tags
- License: Apache 2.0
Intended Use
This model is intended for research on no-reference video quality assessment, active data selection, and target-domain adaptation for VQA. Typical uses include:
- evaluating the active fine-tuning stage of the MDS-VQA pipeline;
- predicting perceptual quality scores for YouTube-SFV SDR;
- comparing active fine-tuning against the YouTube-UGC baseline model;
- studying how model-informed data selection improves VQA generalization.
This checkpoint should be used together with the base model. It is not intended as a universal production QoE monitor without domain-specific validation.
The model follows the VisualQuality-R1-style scoring prompt used in MDS-VQA:
You are doing the video quality assessment task.
Here is the question: What is your overall rating on the quality of this video? The rating should be a float between 1 and 5, rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags.
For automatic evaluation, parse the scalar value inside the final <answer> tag.
MDS-VQA Context
MDS-VQA is a model-informed data selection mechanism for VQA. Given an unlabeled target video pool, it selects videos that are both:
Difficult for the base VQA model: estimated by a failure predictor trained to rank videos by the base model's prediction errors.
Diverse in content: estimated from semantic video features, using a diversity-aware greedy selection procedure.
The selected videos are then labeled and merged with the original labeled source dataset for active fine-tuning. This repository provides the resulting active fine-tuning checkpoint.
Citation
If you use this model, please cite MDS-VQA:
@article{zou2026mds,
title={MDS-VQA: Model-Informed Data Selection for Video Quality Assessment},
author={Zou, Jian and Xu, Xiaoyu and Wang, Zhihua and Wang, Yilin and Adsumilli, Balu and Ma, Kede},
journal={arXiv preprint arXiv:2603.11525},
year={2026}
}