Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Input format

Provide one chat message with 10 images sampled at 5 FPS. Each image should be preceded by a text label:

text

Frame F00: <image>
Frame F01: <image>
...
Frame F09: <image>

The frame labels are text anchors in the message, not labels rendered into the image pixels.

Output format

The model emits only a JSON array:

json

[
{"frame": "F02", "type": "MouseMove", "details": "120,45"},
{"frame": "F03", "type": "MouseClick", "details": "Left"},
{"frame": "F05", "type": "KeyPress", "details": "Cmd+S"},
{"frame": "F07", "type": "MouseScroll", "details": "-150"}
]

Action types:

  • KeyPress: key name with modifiers, e.g. Cmd+S, Enter, A
  • MouseClick: Left, Right, or Middle
  • MouseMove: normalized dx,dy, where 1000 is full screen width/height
  • MouseScroll: normalized signed scroll magnitude

Frame attribution: if an effect first appears between F_K and F_{K+1}, report the action on F_K, the last pre-action frame.

Training and evaluation

  • Base model: Qwen/Qwen3-VL-8B-Instruct
  • Data: macOS crowd-cast paired screencasts and OS input logs
  • Training: LoRA on language and vision modules, merged after 5,000 steps
  • Eval: 44 manually verified macOS productivity clips
  • Result: F1 0.86, MouseMove R² 0.66, MouseMove cosine 0.99

Limitations

The model was trained on macOS productivity recordings. It can confuse OS-specific shortcuts such as Cmd vs Ctrl, and it only predicts actions that are visible or inferable from screen pixels at 5 FPS.

Model provider

p-doom

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today