
License: mit

Model Description

SOLE-R1 predicts robot task progress from visual observations. Given a video and a task description, the model outputs a reasoning trace and a scalar progress estimate.

Expected output format:

markdown

<think>reasoning about task progress</think><answer>progress%</answer>

The progress estimate is intended to serve as a dense reward signal for robotic reinforcement learning, especially when manually engineered rewards are unavailable.
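As a rough illustration of how per-timestep progress percentages might feed an RL loop, the sketch below rescales them to the range 0 to 1 and optionally converts them to per-step differences. The function name `progress_to_rewards` and the delta variant are assumptions made for illustration; the source only states that the progress estimate is intended as a dense reward.

```python
from typing import List, Sequence


def progress_to_rewards(
    progress_percent: Sequence[float], use_deltas: bool = False
) -> List[float]:
    """Map per-timestep progress percentages to per-timestep rewards.

    Hypothetical helper, not part of RewardGen.
    """
    # Rescale percentages (0 to 100) to fractions (0 to 1).
    scaled = [p / 100.0 for p in progress_percent]
    if not use_deltas:
        return scaled
    # Delta variant (an assumption of this sketch): reward the change in
    # progress since the previous timestep, measuring the first step from 0.
    previous = [0.0] + scaled[:-1]
    return [cur - prev for cur, prev in zip(scaled, previous)]
```

With `use_deltas=True`, a step that loses progress yields a negative reward; whether that is desirable depends on the RL algorithm in use.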

Quick Start

The recommended interface for inference is RewardGen:

python

# pip install -U rewardgen
from rewardgen import generate, video_plot

# test_videos are provided in the GitHub repo: https://github.com/Philip-MIT/rewardgen
video_paths = [
    "test_videos/robosuite/lift/unsuccessful/robosuite_lift_episode_12_unsuccessful_max_reward_38.mp4"
]
task_description = "Pick up the cube from the table."

# Generate per-video rewards and reasoning traces
rewards, reasoning_traces = generate(
    model="SOLE-R1",
    task_description=task_description,
    video_paths=video_paths,
    view_type_per_video=["external and wrist"],
    verbose=False,
)
print(rewards)
print(reasoning_traces)

# Plot the first video's rewards with show_reasoning_traces=True
output_sole = {"model": "SOLE-R1", "rewards": rewards[0], "reasoning_traces": reasoning_traces[0]}
video_plot(
    outputs=[output_sole],
    plot_save_path="model_outputs/sole-r1/robosuite/lift/unsuccessful/robosuite_lift_episode_12_unsuccessful_max_reward_38.mp4",
    video_path=video_paths[0],
    show_reasoning_traces=True,
    task_description=task_description,
    verbose=False,
)

Optional pre-download:

python

from rewardgen.utils.model_utils import get_model_dir
get_model_dir("sole-r1")

Input Format

The model is trained to reason over robot task progress using prompts that include:

  • A robot task description
  • The progress at the first timestep, typically 0%
  • The progress at the previous timestep
  • Visual observations from the first, previous, and current timesteps
  • Multiple camera views when available, such as external and wrist cameras
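The list above implies that each estimate depends on the previous one, so progress over an episode is computed step by step. The sketch below shows that chaining under stated assumptions: `rollout_progress` and the `predict` callback signature are hypothetical, and the callback stands in for whatever actually queries the model. In practice RewardGen's `generate` handles this internally.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical callback signature (not the RewardGen API):
# (task, first_frame, prev_frame, cur_frame, first_progress, prev_progress) -> progress %
PredictFn = Callable[[str, Any, Any, Any, float, float], float]


def rollout_progress(
    task: str,
    frames: Sequence[Any],
    predict: PredictFn,
    first_progress: float = 0.0,
) -> List[float]:
    """Chain progress estimates over an episode, one timestep at a time."""
    if not frames:
        return []
    # The first timestep is assigned the initial progress, typically 0%.
    estimates = [first_progress]
    for t in range(1, len(frames)):
        # Each query sees the first, previous, and current observations,
        # plus the first and previous progress values.
        estimate = predict(
            task, frames[0], frames[t - 1], frames[t], first_progress, estimates[-1]
        )
        estimates.append(estimate)
    return estimates
```

Each element of `frames` could hold several camera views (for example, external and wrist images); the sketch treats it as opaque.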

Example task description:

markdown

Pick up the cube from the table.

Output Format

The expected output format is:

markdown

<think>[reasoning about visual task progress]</think><answer>[current task progress]%</answer>

Example:

markdown

<think>The gripper has moved closer to the cube but has not yet grasped or lifted it. This indicates incremental progress from the previous timestep.</think><answer>22%</answer>

Downstream systems should parse the numeric value inside <answer>...</answer> as the reward/progress estimate.
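A minimal parsing sketch for the format above, using only the standard library. The helper name `parse_progress`, the tolerance for whitespace and a missing `%`, and the clamping to the range 0 to 100 are assumptions of this sketch rather than documented behavior.

```python
import re
from typing import Optional, Tuple

# Pattern assumed from the documented format:
# <think>...</think><answer>NN%</answer>
_OUTPUT_RE = re.compile(
    r"<think>(?P<think>.*?)</think>\s*"
    r"<answer>\s*(?P<value>-?\d+(?:\.\d+)?)\s*%?\s*</answer>",
    re.DOTALL,
)


def parse_progress(text: str) -> Optional[Tuple[str, float]]:
    """Return (reasoning, progress_percent), or None if the text is malformed."""
    match = _OUTPUT_RE.search(text)
    if match is None:
        return None
    progress = float(match.group("value"))
    # Clamping is an assumption of this sketch, not documented model behavior.
    progress = max(0.0, min(100.0, progress))
    return match.group("think").strip(), progress
```

Returning `None` on malformed output lets the caller decide whether to retry, skip the timestep, or reuse the previous estimate.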

Training Data

The model was trained on the SOLE-R1-8B training dataset.

The dataset contains robot task progress examples with images, prompts, reasoning completions, and progress labels.

It also includes general spatial and multi-frame temporal reasoning data (e.g., from SSR-CoT, SpatialVLM, Spot-the-diff, Embodied CoT, RoboVQA, and Robo2VLM-Reasoning), which serves as the foundation of the training mixture.

The full dataset is approximately 2 TB.

Streaming example:

python

from datasets import load_dataset

# Stream the dataset instead of downloading all of it up front
ds = load_dataset(
    "Philip-MIT/sole_training_data",
    split="train",
    streaming=True,
)

# Print the first row, then stop
for row in ds:
    print(row)
    break
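To inspect a few rows without iterating over the whole stream, a small helper built on `itertools.islice` can be applied to any iterable, including a streaming dataset. The helper names `peek_rows` and `summarize_row` are hypothetical. Because the dataset's column names are not listed here, the summary reports only the Python type of each field.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek_rows(rows: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return up to the first n rows, leaving the rest of the stream unread."""
    return list(islice(rows, n))


def summarize_row(row: Dict[str, Any]) -> Dict[str, str]:
    """Map each column name to the type name of its value."""
    return {key: type(value).__name__ for key, value in row.items()}
```

For example, `[summarize_row(r) for r in peek_rows(ds, 2)]` would describe the first two streamed rows, where `ds` is the streaming dataset loaded above.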

Citation

BibTeX:

bibtex

@misc{schroeder2026soler1,
  title={SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL},
  author={Philip Schroeder and Thomas Weng and Karl Schmeckpeper and Eric Rosen and Stephen Hart and Ondrej Biza},
  year={2026},
  eprint={2603.28730},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}

License

This repository is released under the MIT License.

Model provider: Philip-MIT

Model tree: Base (this model)

Modalities: Text and Image input; Text output
