News
Model Details
Table with columns: Field, Value| Field | Value |
|---|
| Base model | Qwen/Qwen3-4B-Thinking-2507 |
| Training method | GRPO with outcome-based rewards |
| Environment | Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service |
| Training data | Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks |
| Evaluation | SpreadsheetBench and Domain-Spreadsheet |
| License | Apache-2.0, following the base model license |
Training Configuration
For full details, please see the paper. The released 4B run uses:
Table with columns: Hyperparameter, Value| Hyperparameter | Value |
|---|
| Algorithm | GRPO; KL-regularized against a frozen reference model |
| Training steps | 60 |
| Prompt/response limits | 4,096 / 27,648 tokens |
| Rollout sampling | temperature 0.6; top-p 0.95; top-k 20 |
| Batching | 64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step |
| Multi-turn caps | max assistant turns 20; max user turns 20; max tool-response length 8,192 |
| Optimizer | AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0 |
| KL loss | low-var KL; coefficient 0.001 |
Results
Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.
Table with columns: Benchmark, Base, + Native Harness, + Full Tools, Spreadsheet-RL-4B| Benchmark | Base | + Native Harness | + Full Tools | Spreadsheet-RL-4B |
|---|
| SpreadsheetBench Pass@1 | 12.0 | 15.6 | 19.3 | 23.4 |
On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.
Usage
Install the standard Transformers stack and load the checkpoint:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype="auto",
device_map="auto",
trust_remote_code=True,
)
For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:
hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git
The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.
Citation
@misc{chi2026spreadsheetrl,
title = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
author = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
year = {2026},
eprint = {2605.22642},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
doi = {10.48550/arXiv.2605.22642},
url = {https://arxiv.org/abs/2605.22642}
}