README
License: apache-2.0
Highlights
- Universal: a single model for text / image / video / visual-document embeddings.
- Latent reasoning: fewer than 10 latent steps replace hundreds of generated CoT tokens, giving >30× faster inference than explicit-CoT UME at comparable or better quality.
- Strong retrieval: evaluated on the 78-task MMEB-v2 benchmark, outperforming strong explicit-CoT UME baselines — especially where evidence is dense and structurally complex (video and visual-document retrieval).
Model details
- Backbone: zhibinlan/UME-R1-7B (Qwen2-VL-7B, Qwen2VLForConditionalGeneration)
- Parameters: ~7B, weights in half precision (4 safetensors shards, ~17 GB)
- License: Apache-2.0
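As a rough sanity check on the size figures above, checkpoint size can be estimated as parameter count times bytes per parameter. The sketch below assumes 2 bytes per weight (fp16/bf16) and decimal gigabytes; the helper names are illustrative and not part of any library. Under those assumptions, ~17 GB corresponds to roughly 8.5 billion stored parameters, so "~7B" is probably best read as the nominal model size. The card does not itemize where the difference comes from.

```python
# Back-of-envelope checkpoint sizing.
# Assumption: every weight is stored in half precision (2 bytes per parameter).
BYTES_PER_PARAM_HALF = 2


def checkpoint_size_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_HALF) -> float:
    """Estimated on-disk size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def params_from_size_gb(size_gb: float, bytes_per_param: int = BYTES_PER_PARAM_HALF) -> float:
    """Parameter count implied by a checkpoint of the given size."""
    return size_gb * 1e9 / bytes_per_param


# A nominal 7B-parameter model in half precision:
print(f"{checkpoint_size_gb(7e9):.1f} GB")               # 14.0 GB
# Parameter count implied by the ~17 GB quoted in this card:
print(f"{params_from_size_gb(17) / 1e9:.1f}B params")    # 8.5B params
```

When planning GPU memory, the same estimate gives a lower bound for the weights alone; activations and any caches come on top.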
Usage
The weights load as a standard Qwen2-VL checkpoint:
python
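from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load the weights; torch_dtype="auto" keeps the dtype stored in the checkpoint,
# and device_map="auto" places the model on the available device(s).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Rem520/PLUME-7B", torch_dtype="auto", device_map="auto"
)
# Load the matching processor (tokenizer plus image preprocessing).
processor = AutoProcessor.from_pretrained("Rem520/PLUME-7B")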
To use the full PLUME embedding pipeline (latent rollout + semantic-anchor-guided transition adapter), follow the official code: https://github.com/haoxiangzhao12138/PLUME
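Once embeddings have been produced by the official pipeline, retrieval typically reduces to ranking candidates by their similarity to the query embedding. The following is a minimal sketch that assumes cosine similarity as the scoring function, which this card does not state explicitly. The function names and the toy 3-dimensional vectors are hypothetical placeholders, not PLUME outputs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs sorted from best to worst match."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder embeddings (hypothetical; real ones come from the PLUME code).
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction, different magnitude
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
ranking = rank_candidates(query, candidates)
print([index for index, _ in ranking])  # [1, 2, 0]
```

Because cosine similarity ignores vector magnitude, candidate 1 ranks first even though its norm differs from the query's. For large candidate sets, a batched matrix product over pre-normalized embeddings would be the usual replacement for this loop.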
Citation
bibtex
@article{he2026plume,
  title   = {PLUME: Latent Reasoning Based Universal Multimodal Embedding},
  author  = {He, Chenwei and Hao, Xiangzhao and Yang, Tianyu and Ma, Yuxiang and
             Jia, Yuheng and Wu, Lingxiang and Zhao, Chaoyang and Guo, Haiyun and Wang, Jinqiao},
  journal = {arXiv preprint arXiv:2604.02073},
  year    = {2026}
}
- Paper: arXiv:2604.02073
- Code: github.com/haoxiangzhao12138/PLUME
Model provider: Rem520
Model tree: base zhibinlan/UME-R1-7B; fine-tuned: this model
Modalities: input Text, Image; output Text