Spreadsheet-RL

Spreadsheet-RL-4B

README

License: apache-2.0

News

2026-05-23: Released the Spreadsheet-RL-4B model checkpoint on Hugging Face at Spreadsheet-RL/Spreadsheet-RL-4B.

Model Details

Table with columns: Field, Value
Field	Value
Base model	`Qwen/Qwen3-4B-Thinking-2507`
Training method	GRPO with outcome-based rewards
Environment	Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service
Training data	Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks
Evaluation	SpreadsheetBench and Domain-Spreadsheet
License	Apache-2.0, following the base model license

Training Configuration

For full details, please see the paper. The released 4B run uses:

Table with columns: Hyperparameter, Value
Hyperparameter	Value
Algorithm	GRPO; KL-regularized against a frozen reference model
Training steps	60
Prompt/response limits	4,096 / 27,648 tokens
Rollout sampling	temperature 0.6; top-p 0.95; top-k 20
Batching	64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step
Multi-turn caps	max assistant turns 20; max user turns 20; max tool-response length 8,192
Optimizer	AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0
KL loss	low-var KL; coefficient 0.001

Results

Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.

Table with columns: Benchmark, Base, + Native Harness, + Full Tools, Spreadsheet-RL-4B
Benchmark	Base	+ Native Harness	+ Full Tools	Spreadsheet-RL-4B
SpreadsheetBench Pass@1	12.0	15.6	19.3	23.4

On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.

Usage

Install the standard Transformers stack and load the checkpoint:

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:

bash
hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git

The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.

Citation

bibtex
@misc{chi2026spreadsheetrl,
  title         = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
  author        = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
  year          = {2026},
  eprint        = {2605.22642},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.22642},
  url           = {https://arxiv.org/abs/2605.22642}
}

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

Spreadsheet-RL

Model Tree

Base

Qwen/Qwen3-4B-Thinking-2507

Fine-tuned

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated EndpointsContainer

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

News

2026-05-23: Released the Spreadsheet-RL-4B model checkpoint on Hugging Face at Spreadsheet-RL/Spreadsheet-RL-4B.

Model Details

Table with columns: Field, Value
Field	Value
Base model	`Qwen/Qwen3-4B-Thinking-2507`
Training method	GRPO with outcome-based rewards
Environment	Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service
Training data	Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks
Evaluation	SpreadsheetBench and Domain-Spreadsheet
License	Apache-2.0, following the base model license

Training Configuration

For full details, please see the paper. The released 4B run uses:

Table with columns: Hyperparameter, Value
Hyperparameter	Value
Algorithm	GRPO; KL-regularized against a frozen reference model
Training steps	60
Prompt/response limits	4,096 / 27,648 tokens
Rollout sampling	temperature 0.6; top-p 0.95; top-k 20
Batching	64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step
Multi-turn caps	max assistant turns 20; max user turns 20; max tool-response length 8,192
Optimizer	AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0
KL loss	low-var KL; coefficient 0.001

Results

Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.

Table with columns: Benchmark, Base, + Native Harness, + Full Tools, Spreadsheet-RL-4B
Benchmark	Base	+ Native Harness	+ Full Tools	Spreadsheet-RL-4B
SpreadsheetBench Pass@1	12.0	15.6	19.3	23.4

On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.

Usage

Install the standard Transformers stack and load the checkpoint:

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:

bash
hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git

The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.

Citation

bibtex
@misc{chi2026spreadsheetrl,
  title         = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
  author        = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
  year          = {2026},
  eprint        = {2605.22642},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.22642},
  url           = {https://arxiv.org/abs/2605.22642}
}