Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Usage

python

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"attashe/Bernini-MLLM-Qwen2.5-VL-7B", dtype="bfloat16", device_map="auto"
)
processor = AutoProcessor.from_pretrained("attashe/Bernini-MLLM-Qwen2.5-VL-7B")

Notes

  • Architecture: Qwen2_5_VLForConditionalGeneration (8.29B params), bfloat16.
  • These are ByteDance's fine-tuned Bernini planner weights; within the full Bernini pipeline the planner's hidden states feed a DiT renderer, so as a standalone chat/VL model its behaviour may differ from the base Qwen2.5-VL-7B-Instruct.
  • License: Apache-2.0, inherited from the upstream Bernini release.

Citation

bibtex

@article{bernini,
title = {Bernini: Latent Semantic Planning for Video Diffusion},
author = {Chenchen Liu and Junyi Chen and Lei Li and Lu Chi and Mingzhen Sun and Zhuoying Li and Yi Fu and Ruoyu Guo and Yiheng Wu and Ge Bai and Zehuan Yuan},
journal = {arXiv preprint arXiv:2605.22344},
year = {2026}
}

Model provider

attashe

attashe

Model tree

Base

Qwen/Qwen2.5-VL-7B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today