README

License: apache-2.0

Highlights

  • Universal: a single model for text / image / video / visual-document embeddings.
  • Latent reasoning: fewer than 10 latent steps replace hundreds of generated chain-of-thought (CoT) tokens, giving >30× faster inference than explicit-CoT universal multimodal embedding (UME) at comparable or better quality.
  • Strong retrieval: evaluated on the 78-task MMEB-v2 benchmark, it outperforms strong explicit-CoT UME baselines, especially where evidence is dense and structurally complex (video and visual-document retrieval).

Model details

  • Backbone: zhibinlan/UME-R1-7B (Qwen2-VL-7B, Qwen2VLForConditionalGeneration)
  • Parameters: ~7B, weights in half precision (4 safetensors shards, ~17 GB)
  • License: Apache-2.0
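
As a rough check on the size figures above, weight memory is approximately the parameter count times the bytes per parameter. The sketch below assumes 2 bytes per parameter for half precision and decimal gigabytes (10^9 bytes); `weight_memory_gb` and `implied_params` are hypothetical helpers for illustration, not part of any library. Under these assumptions a nominal 7B parameters gives 14 GB, while the listed ~17 GB corresponds to about 8.5B stored parameters, so the "~7B" label probably does not count every stored tensor (an inference from the arithmetic, not something the model card states).

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in decimal GB (assumes 1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def implied_params(size_gb: float, bytes_per_param: int = 2) -> float:
    """Parameter count implied by a checkpoint size, under the same assumptions."""
    return size_gb * 1e9 / bytes_per_param


# Nominal 7B parameters stored in half precision (2 bytes each).
nominal_gb = weight_memory_gb(7e9)

# Parameter count implied by the ~17 GB figure quoted in the model details.
implied = implied_params(17)

print(f"nominal 7B in half precision: {nominal_gb:.1f} GB")
print(f"parameters implied by 17 GB: {implied / 1e9:.1f}B")
```

This covers weights only; activations and any cache used during inference add to the GPU memory actually required.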

Usage

The weights load as a standard Qwen2-VL checkpoint:

python

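from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load the backbone weights; dtype is read from the checkpoint and
# device_map="auto" (requires the accelerate package) places layers on available devices.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Rem520/PLUME-7B", torch_dtype="auto", device_map="auto"
)
# The processor handles tokenization and image preprocessing for the model.
processor = AutoProcessor.from_pretrained("Rem520/PLUME-7B")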

To use the full PLUME embedding pipeline (latent rollout + semantic-anchor-guided transition adapter), follow the official code: https://github.com/haoxiangzhao12138/PLUME
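
Once the pipeline has produced embeddings, retrieval usually comes down to ranking candidates by similarity to the query. The following is a minimal sketch of that ranking step, assuming embeddings are plain lists of floats and that cosine similarity is the scoring function; `cosine_similarity` and `rank_candidates` are hypothetical names, not functions from the PLUME repository, and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query_emb, candidate_embs, top_k=3):
    """Return the top_k (index, score) pairs, highest similarity first."""
    scored = [
        (i, cosine_similarity(query_emb, cand))
        for i, cand in enumerate(candidate_embs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors standing in for real embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_candidates(query, candidates, top_k=2)
print(ranking)
```

Because cosine similarity ignores vector length, the candidate `[2.0, 0.0]` ranks first here despite having a different norm from the query. For large candidate sets, a vectorized or approximate nearest-neighbor search would normally replace this loop.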

Citation

bibtex

@article{he2026plume,
  title   = {PLUME: Latent Reasoning Based Universal Multimodal Embedding},
  author  = {He, Chenwei and Hao, Xiangzhao and Yang, Tianyu and Ma, Yuxiang and
             Jia, Yuheng and Wu, Lingxiang and Zhao, Chaoyang and Guo, Haiyun and Wang, Jinqiao},
  journal = {arXiv preprint arXiv:2604.02073},
  year    = {2026}
}

Model provider

Rem520

Model tree

  • Base: zhibinlan/UME-R1-7B
  • Fine-tuned: this model

Modalities

  • Input: Text, Image
  • Output: Text
