Philip-MIT

SOLE-R1-8B

Model Description

SOLE-R1 predicts robot task progress from visual observations. Given a video and a task description, the model outputs a reasoning trace and a scalar progress estimate.

Expected output format:

markdown
<think>reasoning about task progress</think><answer>progress%</answer>

The progress estimate is intended to serve as a dense reward signal for robotic reinforcement learning, especially when manually engineered rewards are unavailable.
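As a rough illustration of using progress as a dense reward, the sketch below maps per-timestep progress percentages to rewards. The function name `progress_to_rewards`, the scaling to [0, 1], and the optional step-to-step difference are all assumptions made for this example; they are not described in this README and may differ from how the authors compute rewards.

```python
from typing import List, Sequence


def progress_to_rewards(
    progress_pct: Sequence[float], use_deltas: bool = False
) -> List[float]:
    """Map per-timestep progress percentages to rewards (illustrative only).

    With use_deltas=False the reward is progress scaled to [0, 1].
    With use_deltas=True the reward is the change in scaled progress
    since the previous timestep, so regressions yield negative rewards.
    """
    scaled = [p / 100.0 for p in progress_pct]
    if not use_deltas:
        return scaled
    # reward_t = progress_t - progress_{t-1}; the first step is compared to 0.
    previous = [0.0] + scaled[:-1]
    return [cur - prev for cur, prev in zip(scaled, previous)]


print(progress_to_rewards([0, 50, 25]))
print(progress_to_rewards([0, 50, 25], use_deltas=True))
```

Whether absolute progress or its change works better as a reward depends on the RL algorithm and task; this sketch only shows the two options side by side.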

Quick Start

The recommended interface for inference is RewardGen:

markdown
# pip install -U rewardgen

from rewardgen import generate, video_plot

# Test videos are available in the GitHub repo: https://github.com/Philip-MIT/rewardgen
video_paths = [
    "test_videos/robosuite/lift/unsuccessful/robosuite_lift_episode_12_unsuccessful_max_reward_38.mp4"
]

task_description = "Pick up the cube from the table."

rewards, reasoning_traces = generate(
    model="SOLE-R1",
    task_description=task_description,
    video_paths=video_paths,
    view_type_per_video=["external and wrist"],
    verbose=False,
)
print(rewards)
print(reasoning_traces)

# Plotting with show_reasoning_traces=True
output_sole = {
    "model": "SOLE-R1",
    "rewards": rewards[0],
    "reasoning_traces": reasoning_traces[0],
}
video_plot(
    outputs=[output_sole],
    plot_save_path="model_outputs/sole-r1/robosuite/lift/unsuccessful/robosuite_lift_episode_12_unsuccessful_max_reward_38.mp4",
    video_path=video_paths[0],
    show_reasoning_traces=True,
    task_description=task_description,
    verbose=False,
)

Optional pre-download:

markdown
from rewardgen.utils.model_utils import get_model_dir

get_model_dir("sole-r1")

Input Format

The model is trained to reason over robot task progress using prompts that include:

A robot task description
The first timestep progress, typically 0%
The previous timestep progress
Visual observations from the first, previous, and current timesteps
Multiple camera views when available, such as external and wrist cameras

Example task description:

markdown
Pick up the cube from the table.
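The prompt ingredients listed in this section suggest a rolling evaluation, where each estimate becomes the "previous progress" for the next timestep. The sketch below illustrates that loop under stated assumptions: `ProgressQuery`, `rollout_progress`, and the `predict` callable are hypothetical names invented for this example and are not part of RewardGen, which handles prompt construction internally.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class ProgressQuery:
    """Context for one progress query (hypothetical container, not a RewardGen type)."""

    task_description: str
    first_progress: float  # typically 0.0
    previous_progress: float
    first_obs: Any  # may hold several camera views, e.g. external and wrist
    previous_obs: Any
    current_obs: Any


def rollout_progress(
    task_description: str,
    observations: Sequence[Any],
    predict: Callable[[ProgressQuery], float],
    first_progress: float = 0.0,
) -> List[float]:
    """Estimate progress per timestep, feeding each estimate into the next query."""
    if not observations:
        return []
    estimates = [first_progress]
    for t in range(1, len(observations)):
        query = ProgressQuery(
            task_description=task_description,
            first_progress=first_progress,
            previous_progress=estimates[-1],
            first_obs=observations[0],
            previous_obs=observations[t - 1],
            current_obs=observations[t],
        )
        # `predict` stands in for a model call that returns a progress percentage.
        estimates.append(predict(query))
    return estimates


# Stub predictor for demonstration: adds 10 points per step.
demo = rollout_progress(
    "Pick up the cube from the table.",
    ["frame0", "frame1", "frame2"],
    predict=lambda q: q.previous_progress + 10.0,
)
print(demo)
```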

Output Format

The expected output format is:

markdown
<think>[reasoning about visual task progress]</think><answer>[current task progress]%</answer>

Example:

markdown
<think>The gripper has moved closer to the cube but has not yet grasped or lifted it. This indicates incremental progress from the previous timestep.</think><answer>22%</answer>

Downstream systems should parse the numeric value inside <answer>...</answer> as the reward/progress estimate.
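A minimal parsing sketch for the documented format follows. The helper name `parse_progress`, the tolerance for whitespace and a missing `%`, and the clamping to [0, 100] are choices made for this example rather than documented behavior of the model or of RewardGen.

```python
import re
from typing import Optional, Tuple

# Mirrors the documented format: <think>...</think><answer>NN%</answer>
_OUTPUT_RE = re.compile(
    r"<think>(?P<think>.*?)</think>\s*"
    r"<answer>\s*(?P<pct>\d+(?:\.\d+)?)\s*%?\s*</answer>",
    re.DOTALL,
)


def parse_progress(text: str) -> Optional[Tuple[str, float]]:
    """Return (reasoning, progress_percent), or None if the text is malformed."""
    match = _OUTPUT_RE.search(text)
    if match is None:
        return None
    pct = float(match.group("pct"))
    # Clamping is a defensive choice in this sketch, not documented model behavior.
    pct = max(0.0, min(100.0, pct))
    return match.group("think").strip(), pct


example = (
    "<think>The gripper has moved closer to the cube but has not yet "
    "grasped or lifted it.</think><answer>22%</answer>"
)
print(parse_progress(example))
```

Returning `None` on malformed output lets the caller decide how to handle a failed parse, for example by reusing the previous timestep's progress.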

Training Data

The model was trained on the SOLE-R1-8B training dataset.

The dataset contains robot task progress examples with images, prompts, reasoning completions, and progress labels.

It also includes general spatial and multi-frame temporal reasoning data (e.g., from SSR-CoT, SpatialVLM, Spot-the-diff, Embodied CoT, RoboVQA, and Robo2VLM-Reasoning), which serves as the foundation of the training mixture.

The full dataset is approximately 2TB.

Streaming example:

markdown
from datasets import load_dataset

ds = load_dataset(
    "Philip-MIT/sole_training_data",
    split="train",
    streaming=True,
)

for row in ds:
    print(row)
    break
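Because the full dataset is large, it can help to sample a few streamed rows and inspect their structure before processing more. The helpers below are a sketch that works on any iterable of dict rows; `peek` and `summarize_columns` are names invented here, and no column names are assumed since the README does not list the schema.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(rows: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Take at most n rows from a (possibly streaming) iterable."""
    return list(islice(rows, n))


def summarize_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each column name to the sorted Python type names seen in the rows."""
    summary: Dict[str, set] = {}
    for row in rows:
        for key, value in row.items():
            summary.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(names) for key, names in summary.items()}


# With the streaming dataset from the example above, usage would look like:
#   sample = peek(ds, 3)
#   print(summarize_columns(sample))
# Demonstrated here on placeholder rows so the snippet runs offline.
placeholder_rows = [{"a": 1, "b": "x"}, {"a": 2.0, "b": "y"}]
print(summarize_columns(peek(placeholder_rows, 2)))
```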

Citation

BibTeX:

markdown
@misc{schroeder2026soler1,
  title={SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL},
  author={Philip Schroeder and Thomas Weng and Karl Schmeckpeper and Eric Rosen and Stephen Hart and Ondrej Biza},
  year={2026},
  eprint={2603.28730},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}

License

This repository is released under the MIT License.


Model Details

Model Provider: Philip-MIT
Model Tree: Base (this model)
Input Modalities: Text, Image
Output Modalities: Text
Supported Functionality: Dedicated Endpoints, Container


