attashe

Bernini-MLLM-Qwen2.5-VL-7B

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Usage

python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "attashe/Bernini-MLLM-Qwen2.5-VL-7B", dtype="bfloat16", device_map="auto"
)
processor = AutoProcessor.from_pretrained("attashe/Bernini-MLLM-Qwen2.5-VL-7B")

Notes

Architecture: Qwen2_5_VLForConditionalGeneration (8.29B params), bfloat16.
These are ByteDance's fine-tuned Bernini planner weights; within the full Bernini pipeline the planner's hidden states feed a DiT renderer, so as a standalone chat/VL model its behaviour may differ from the base Qwen2.5-VL-7B-Instruct.
License: Apache-2.0, inherited from the upstream Bernini release.

Citation

bibtex
@article{bernini,
  title   = {Bernini: Latent Semantic Planning for Video Diffusion},
  author  = {Chenchen Liu and Junyi Chen and Lei Li and Lu Chi and Mingzhen Sun and Zhuoying Li and Yi Fu and Ruoyu Guo and Yiheng Wu and Ge Bai and Zehuan Yuan},
  journal = {arXiv preprint arXiv:2605.22344},
  year    = {2026}
}