qgfvadfuvads
qwen35-video-rm
README
Install
```bash
# Quote the extras spec so shells such as zsh do not treat [decord] as a glob pattern
pip install "transformers>=5" peft torch torchvision accelerate "qwen-vl-utils[decord]" flash-attn
```
Use as a reward function (RL)
```python
from reward_model import VideoRewardModel

rm = VideoRewardModel.from_pretrained(".")  # base Qwen3.5-9B pulled from HF

# T2V: score a generated clip against its prompt
s = rm.score(prompt="a cat surfing a wave at sunset", video="sample.mp4")
# -> {"quality": 1.83, "alignment": 0.42}  (unbounded reals, higher=better)

# I2V: pass the conditioning frame
s = rm.score(prompt=..., video="sample.mp4", cond_image="cond_frame.jpg")

# scalar reward for RL (tune the head weights for your objective)
r = rm.reward(prompt, video, cond_image=None, w_quality=0.5, w_alignment=0.5)
```
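The README does not say how `reward` folds the two heads into one number. A minimal sketch, assuming it is a plain weighted sum of the `quality` and `alignment` scores; `combine_heads` is a hypothetical helper and is not part of `reward_model.py`:

```python
def combine_heads(scores, w_quality=0.5, w_alignment=0.5):
    """Collapse the two head scores into one scalar.

    Assumption: a plain weighted sum. The README does not confirm that
    rm.reward uses this exact form.
    """
    return w_quality * scores["quality"] + w_alignment * scores["alignment"]


# Score dict in the shape shown in the README example
example = {"quality": 1.83, "alignment": 0.42}

balanced = combine_heads(example)                                          # equal weights
quality_heavy = combine_heads(example, w_quality=0.8, w_alignment=0.2)     # favor visual quality
```

Under this assumption, raising `w_quality` pushes the policy toward visual quality, and raising `w_alignment` pushes it toward prompt faithfulness.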
Important (must match training)
- Pointwise: score one video at a time → two scalars. It is not a pairwise A/B judge.
- The bundled `SCORE_INSTRUCTION`, `pool_mode=concat`, `num_frames=8`, `video_longest_edge=4e6` are the trained regime. Do not change them at inference: the reward heads are tied to that prompt and pooling, and changing them degrades scores. All are in `rm_config.json`.
- Scores are unbounded; for RL, z-normalize per batch or per prompt as needed. The two heads are independent, so weight them for your objective (quality vs. prompt-faithfulness).
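The z-normalization mentioned above could look like the following. This is a minimal sketch using only the standard library; the helper names and the `eps` guard are illustrative assumptions and are not part of the bundled wrapper:

```python
from collections import defaultdict
from statistics import fmean, pstdev


def z_normalize(rewards, eps=1e-6):
    """Shift to zero mean and scale to roughly unit variance (per batch)."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; 0.0 when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


def z_normalize_per_prompt(prompts, rewards, eps=1e-6):
    """Normalize each reward against the other samples of the same prompt."""
    groups = defaultdict(list)
    for i, p in enumerate(prompts):
        groups[p].append(i)

    out = [0.0] * len(rewards)
    for idxs in groups.values():
        normed = z_normalize([rewards[i] for i in idxs], eps)
        for i, z in zip(idxs, normed):
            out[i] = z
    return out


# Two prompts, two sampled videos each (made-up reward values)
prompts = ["cat", "cat", "dog", "dog"]
rewards = [1.0, 3.0, 10.0, 20.0]
per_prompt = z_normalize_per_prompt(prompts, rewards)
```

Per-prompt normalization removes differences in prompt difficulty, so each sample is compared only with its siblings. Per-batch normalization keeps those differences. A prompt with a single sample normalizes to 0.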
Files
- `adapter_model.safetensors`, `adapter_config.json`: LoRA + reward heads (PEFT)
- `model_v2_qwen35.py`: model class (`Qwen35VideoRewardModel`)
- `reward_model.py`: inference/reward wrapper (`VideoRewardModel.from_pretrained`)
- `rm_config.json`: trained regime + metadata
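Because the heads depend on the trained regime, a quick sanity check before a long RL run may be useful. The sketch below assumes `rm_config.json` is a flat JSON object whose keys are named `pool_mode`, `num_frames`, and `video_longest_edge`. Those key names are guesses based on the settings listed above, so adjust them to the actual file:

```python
import json
from pathlib import Path

# Values come from the README; the key names are assumptions about rm_config.json.
EXPECTED_REGIME = {
    "pool_mode": "concat",
    "num_frames": 8,
    "video_longest_edge": 4e6,
}


def check_regime(config, expected=EXPECTED_REGIME):
    """Return a list of (key, expected, found) for every setting that differs."""
    return [
        (key, want, config.get(key))
        for key, want in expected.items()
        if config.get(key) != want
    ]


def check_regime_file(path="rm_config.json"):
    """Load the config from disk and report mismatches against the trained regime."""
    config = json.loads(Path(path).read_text())
    return check_regime(config)
```

An empty list means the loaded settings match the regime described in this README. Any returned tuple names a setting to restore before scoring.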
Model provider
qgfvadfuvads
Model tree
- Base: Qwen/Qwen3.5-9B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text