qwen35-video-rm API & Inference Endpoint

Install

```bash
pip install "transformers>=5" peft torch torchvision accelerate "qwen-vl-utils[decord]" flash-attn
```

Use as a reward function (RL)

```python
from reward_model import VideoRewardModel

rm = VideoRewardModel.from_pretrained(".")  # base Qwen3.5-9B weights are pulled from HF

prompt = "a cat surfing a wave at sunset"
video = "sample.mp4"

# T2V: score a generated clip against its prompt
s = rm.score(prompt=prompt, video=video)
# -> {"quality": 1.83, "alignment": 0.42}   (unbounded reals, higher = better)

# I2V: also pass the conditioning frame
s = rm.score(prompt=prompt, video=video, cond_image="cond_frame.jpg")

# scalar reward for RL (tune the head weights for your objective)
r = rm.reward(prompt, video, cond_image=None, w_quality=0.5, w_alignment=0.5)
```
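The scalar `rm.reward(...)` collapses the two heads into one number. The sketch below shows one plausible way that could work, assuming a plain weighted sum of the `quality` and `alignment` scores; the real logic lives in `reward_model.py` and may differ. It also shows how to get an A/B preference out of this pointwise model by scoring each video separately. `combine_heads` and `prefer` are hypothetical helpers, not part of the shipped wrapper.

```python
# Hypothetical helpers (not part of reward_model.py). Assumes the scalar reward is a
# plain weighted sum of the two head scores returned by rm.score(...).


def combine_heads(scores: dict, w_quality: float = 0.5, w_alignment: float = 0.5) -> float:
    """Collapse the two independent head scores into one scalar reward."""
    return w_quality * scores["quality"] + w_alignment * scores["alignment"]


def prefer(scores_a: dict, scores_b: dict, w_quality: float = 0.5, w_alignment: float = 0.5) -> str:
    """Derive an A/B preference from two pointwise scores (the model is not a pairwise judge)."""
    ra = combine_heads(scores_a, w_quality, w_alignment)
    rb = combine_heads(scores_b, w_quality, w_alignment)
    if ra > rb:
        return "A"
    if rb > ra:
        return "B"
    return "tie"


# Usage with the example score dict from the README (no model needed):
print(combine_heads({"quality": 1.83, "alignment": 0.42}))  # ≈ 1.125
print(prefer({"quality": 1.83, "alignment": 0.42}, {"quality": 0.9, "alignment": 1.1}))
```

Because the model is pointwise, each video in an A/B pair costs its own `rm.score` call; the comparison happens entirely on your side.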

Important (must match training)

Pointwise: each call scores one video and returns two scalars (quality, alignment). It is not a pairwise A/B judge.
The bundled SCORE_INSTRUCTION, pool_mode=concat, num_frames=8, and video_longest_edge=4e6 define the trained regime. Do not change them at inference: the reward heads are tied to that prompt and pooling setup, so changing any of these values degrades scores. All of them are stored in rm_config.json.
Scores are unbounded; for RL, z-normalize them per batch or per prompt as needed. The two heads are independent, so weight them for your objective (quality vs. prompt faithfulness).
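A minimal sketch of per-prompt z-normalization, assuming you have collected one `rm.score(...)` dict per rollout along with the prompt each rollout was generated from. `zscore` and `normalized_rewards` are hypothetical helpers, and normalizing each head separately before weighting is a design choice of this sketch, not something the README prescribes.

```python
import statistics
from collections import defaultdict


def zscore(values, eps=1e-6):
    """Standardize a list to zero mean / unit (population) std; eps guards zero variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def normalized_rewards(prompts, scores, w_quality=0.5, w_alignment=0.5, eps=1e-6):
    """prompts[i] is the prompt of rollout i; scores[i] is its rm.score(...) dict.

    Each head is z-normalized within its prompt group, then the heads are weighted.
    """
    groups = defaultdict(list)
    for i, p in enumerate(prompts):
        groups[p].append(i)

    rewards = [0.0] * len(scores)
    for idxs in groups.values():
        q = zscore([scores[i]["quality"] for i in idxs], eps)
        a = zscore([scores[i]["alignment"] for i in idxs], eps)
        for i, qn, an in zip(idxs, q, a):
            rewards[i] = w_quality * qn + w_alignment * an
    return rewards
```

For per-batch instead of per-prompt normalization, pass one shared key for every rollout, e.g. `normalized_rewards(["batch"] * len(scores), scores)`. Normalizing each head before weighting keeps one head's scale from silently dominating the mix, so `w_quality` and `w_alignment` behave like relative importances.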

Files

adapter_model.safetensors, adapter_config.json — LoRA + reward heads (PEFT)
model_v2_qwen35.py — model class (Qwen35VideoRewardModel)
reward_model.py — inference/reward wrapper (VideoRewardModel.from_pretrained)
rm_config.json — trained regime + metadata
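Since rm_config.json records the trained regime, a quick pre-flight check can catch accidental drift before you start scoring. This is a sketch under loud assumptions: it treats rm_config.json as a flat JSON object whose keys are named exactly like the README's parameters (`pool_mode`, `num_frames`, `video_longest_edge`), and `regime_mismatches` / `EXPECTED_REGIME` are hypothetical names. SCORE_INSTRUCTION is not checked because its key name in the file is unknown.

```python
import json
from pathlib import Path
from typing import Optional

# Trained-regime values as stated in this README. The key names and the flat-object
# layout of rm_config.json are assumptions; adjust them to the real file.
EXPECTED_REGIME = {"pool_mode": "concat", "num_frames": 8, "video_longest_edge": 4e6}


def regime_mismatches(config_path: str = "rm_config.json", overrides: Optional[dict] = None) -> dict:
    """Return {key: (actual, expected)} for every trained-regime setting that differs.

    `overrides` stands for the inference settings you are about to use; anything not
    overridden falls back to the value stored in the config file.
    """
    cfg = json.loads(Path(config_path).read_text())
    effective = {**cfg, **(overrides or {})}
    mismatches = {}
    for key, expected in EXPECTED_REGIME.items():
        actual = effective.get(key, "<missing>")
        if actual != expected:  # 4000000 == 4e6 in Python, so int vs float is fine
            mismatches[key] = (actual, expected)
    return mismatches
```

For example, `regime_mismatches(overrides={"num_frames": 16})` would report `num_frames` as `(16, 8)` if the file matches the README, which is exactly the kind of change the "must match training" note warns against.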
