michalsr
toolmerge-planner-grpo
License: apache-2.0
Quick start
```python
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("michalsr/toolmerge-planner-grpo")
model = AutoModelForCausalLM.from_pretrained(
    "michalsr/toolmerge-planner-grpo",
    torch_dtype="bfloat16",
)
```
To use inside ToolMerge, override the planner checkpoint at the CLI:
```bash
toolmerge config=configs/m2m/qwen3_8.yaml \
    model.base=michalsr/toolmerge-planner-grpo
```
Training recipe
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Reward | frames_in_gt=1.0, consistency=1.0 |
| Training data | train_correct_uniform_8f_clip_max1.json (filtered M2M train split, ~1500 items) |
| Optimizer | paged_adamw_8bit, lr=1e-6, bf16 |
| Compute | 2 nodes × 4 GPUs |
| Step | global_step=50 |
| Framework | TRL 0.27.2, transformers 4.57.6, PyTorch 2.10.0 |
Full training config: training/configs/m2m_grpo.yaml in the ToolMerge repo.
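The recipe above lists two reward terms with equal weight and GRPO as the training method. As a rough illustration of how such terms could feed a GRPO update, the sketch below sums the weighted components per completion and then normalizes rewards within a group of completions sampled for the same prompt, following the group-relative advantage described in the DeepSeekMath paper. This card does not define `frames_in_gt` or `consistency`, so the component scores, the weighted-sum aggregation, the `eps` constant, and the function names are all assumptions made for illustration, not the ToolMerge implementation.

```python
from statistics import mean, pstdev

# Weights copied from the "Reward" row of the training recipe table.
REWARD_WEIGHTS = {"frames_in_gt": 1.0, "consistency": 1.0}


def combine_rewards(components: dict[str, float]) -> float:
    """Weighted sum of reward components (assumed aggregation, not confirmed by the card)."""
    return sum(REWARD_WEIGHTS[name] * value for name, value in components.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Center and scale rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group; eps guards against sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical component scores for four sampled plans on a single query.
group = [
    {"frames_in_gt": 1.0, "consistency": 1.0},
    {"frames_in_gt": 0.0, "consistency": 1.0},
    {"frames_in_gt": 1.0, "consistency": 0.0},
    {"frames_in_gt": 0.0, "consistency": 0.0},
]
rewards = [combine_rewards(c) for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so the policy is pushed toward plans that beat their siblings rather than toward an absolute reward target.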
Citation
```bibtex
@misc{shlapentokhrothman2026decomposingqueriestoolcalls,
  title = {Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval},
  author = {Michal Shlapentokh-Rothman and Prachi Garg and Yu-Xiong Wang and Derek Hoiem},
  year = {2026},
  eprint = {2605.23826},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2605.23826},
}
```
Cite the GRPO method:
```bibtex
@article{shao2024deepseekmath,
  title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year = 2024,
  eprint = {arXiv:2402.03300},
}
```
Code repo: https://github.com/michalsr/ToolMerge.
Model provider: michalsr
Model tree: Qwen/Qwen3-VL-8B-Instruct (base), this model (fine-tuned)
Modalities: text and image input, text output