License: apache-2.0

Highlights

  • Unified embodied capability system. A single 8B model unifies three capability dimensions: Cognition & Spatial Reasoning, Planning & Correction, and Pointing & Location.
  • State-of-the-art performance. Achieves SOTA on 16 out of 24 embodied VLM benchmarks, with an average score of 70.4% across 21 main accuracy-based benchmarks, surpassing Gemini-Robotics-ER-1.5 and GPT-5.4 by 17.0% and 21.7% respectively.
  • Closed-loop autonomy. The PGC framework lets one model serve as planner, grounder, and corrector simultaneously, completing long-horizon real-world tasks (e.g., making milk tea, sweeping garbage, stacking cups) without human intervention.
  • Efficient adaptation to action. Because embodied reasoning is internalized upstream, the model can be fine-tuned into Embodied-R1.5-VLA with only a small amount of action data, outperforming strong VLA baselines such as π0.5 across 4 popular manipulation benchmark suites (e.g., 92.4% on SimplerEnv Google Robot Visual Matching).
  • Fully open-source. We release model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks.
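
One way to picture the closed-loop idea in the highlights is a control loop that queries the same model in three roles: to plan, to ground each step, and to check and correct it. The sketch below is an illustration under stated assumptions, not the PGC implementation from the release. The callables `plan`, `ground`, `execute`, and `check`, their signatures, and the retry policy are all hypothetical.

```python
def run_closed_loop(task, plan, ground, execute, check, max_retries=2):
    """Run a plan / ground / execute / check loop and return (success, log).

    Hypothetical interfaces (assumptions, not from the release):
      plan(task)         -> list of step descriptions
      ground(step, hint) -> target for the step (e.g. a 2D point)
      execute(step, tgt) -> None (sends the action to the robot)
      check(step)        -> (ok, hint), where hint describes the detected error
    """
    log = []
    for step in plan(task):
        hint = None
        for attempt in range(max_retries + 1):
            target = ground(step, hint)   # grounder role
            execute(step, target)
            ok, hint = check(step)        # corrector role
            log.append((step, attempt, ok))
            if ok:
                break
        else:
            # Every attempt failed: stop rather than continue from a bad state.
            return False, log
    return True, log
```

In this sketch the hint returned by a failed check is passed back to the grounding call, which is one simple way to let a detected error influence the next attempt.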

Model Details

  • Architecture: Qwen3-VL (Qwen3VLForConditionalGeneration)
  • Parameters: ~8B
  • Modality: Image / Video + Text → Text
  • Output format: All outputs are plain-text token sequences. Coordinates are normalized to [0,1000], trajectories are ordered coordinate sequences, and reasoning is free-form text. The final decision is emitted within an <answer>...</answer> tag.
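
As a rough illustration of the coordinate convention above, the sketch below maps points from the normalized [0, 1000] range to pixel coordinates. It assumes each point is ordered (x, y) and scales linearly with image width and height. This card does not state the ordering or the rounding rule, and `to_pixels` and `trajectory_to_pixels` are hypothetical helper names.

```python
def to_pixels(point, width, height, scale=1000):
    """Map a normalized (x, y) point in [0, scale] to integer pixel coordinates.

    Assumption: x scales with image width and y with image height.
    """
    x_norm, y_norm = point
    x = int(x_norm * width / scale)
    y = int(y_norm * height / scale)
    # Clamp so that a value of exactly `scale` still indexes a valid pixel.
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


def trajectory_to_pixels(points, width, height, scale=1000):
    """Convert an ordered sequence of normalized points, preserving order."""
    return [to_pixels(p, width, height, scale) for p in points]
```

Normalized coordinates keep the output independent of input resolution, so the conversion needs the size of the image that will be annotated or passed to a controller.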

Unified Capabilities

  1. Embodied Cognition & Spatial Reasoning: comprehends the semantic and spatial structure of the physical world, including static geometric relations and dynamic interaction possibilities.
  2. Embodied Planning & Correction: covers the full task life cycle: long-horizon task decomposition, next-step planning, process detection, error localization, and error correction.
  3. Embodied Pointing & Location: grounds high-level reasoning in coordinates and trajectories, covering referring expression grounding, region-level localization, functional (affordance) grounding, and visual trace generation.

Quick Start

python

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_id = "IffYuan/Embodied-R1.5"

# Load the weights and the matching processor (chat template + image preprocessing).
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")

# A single user turn: an image placeholder followed by the task instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": (
                    "You are a robot performing manipulation tasks. "
                    "The task instruction is: move the blue cube on top of the yellow cube. "
                    "Use 2D points to mark the target location."
                ),
            },
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it together with the image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode the full sequence (the prompt followed by the model's reply).
out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(out, skip_special_tokens=True)[0])

The model reasons over the visual observation and emits its final decision within an <answer> tag, e.g. <answer>[{"point_2d": [750, 748]}]</answer>.
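
To use the final decision programmatically, the reply has to be split into reasoning and answer. The sketch below shows one possible way to do that with the standard library. It assumes the answer body is JSON shaped like the example above (a list of objects with a `point_2d` key) and that the last `<answer>` tag is the one to trust. Other task types may emit a different answer format, and `extract_answer` and `extract_points` are hypothetical helper names.

```python
import json
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(text):
    """Return the content of the last <answer>...</answer> tag, or None."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


def extract_points(text):
    """Parse `point_2d` entries from the answer; return [] if nothing parses."""
    answer = extract_answer(text)
    if answer is None:
        return []
    try:
        items = json.loads(answer)
    except json.JSONDecodeError:
        return []  # the answer was free-form text rather than JSON
    if not isinstance(items, list):
        return []
    return [
        tuple(item["point_2d"])
        for item in items
        if isinstance(item, dict) and "point_2d" in item
    ]


reply = 'The yellow cube is on the right. <answer>[{"point_2d": [750, 748]}]</answer>'
print(extract_points(reply))  # prints [(750, 748)]
```

Returning an empty list on a missing tag or malformed JSON lets the caller decide whether to retry the query or fall back, instead of failing inside the parser.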

Citation

bibtex

@article{yuan2026embodiedr15,
  title  = {Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models},
  author = {Yuan, Yifu and others},
  year   = {2026}
}

License

Released under the Apache 2.0 license.

Model provider

IffYuan

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text
