michalsr

toolmerge-planner-grpo

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Quick start

python

from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("michalsr/toolmerge-planner-grpo")
model = AutoModelForCausalLM.from_pretrained(
"michalsr/toolmerge-planner-grpo",
torch_dtype="bfloat16",
)

To use inside ToolMerge, override the planner checkpoint at the CLI:

bash

toolmerge config=configs/m2m/qwen3_8.yaml \
model.base=michalsr/toolmerge-planner-grpo

Training recipe

Table
SettingValue
Base modelQwen/Qwen3-VL-8B-Instruct
Rewardframes_in_gt=1.0, consistency=1.0
Training datatrain_correct_uniform_8f_clip_max1.json (filtered M2M train split, ~1500 items)
Optimizerpaged_adamw_8bit, lr=1e-6, bf16
Compute2 nodes × 4 GPUs
Stepglobal_step=50
FrameworkTRL 0.27.2, transformers 4.57.6, PyTorch 2.10.0

Full training config: training/configs/m2m_grpo.yaml in the ToolMerge repo.

Citation

bibtex

@misc{shlapentokhrothman2026decomposingqueriestoolcalls,
title = {Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval},
author = {Michal Shlapentokh-Rothman and Prachi Garg and Yu-Xiong Wang and Derek Hoiem},
year = {2026},
eprint = {2605.23826},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2605.23826},
}

Cite the GRPO method:

bibtex

@article{shao2024deepseekmath,
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
year = 2024,
eprint = {arXiv:2402.03300},
}

Code repo: https://github.com/michalsr/ToolMerge.

Model provider

michalsr

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today