Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

News

Model Details

FieldValue
Base modelQwen/Qwen3-4B-Thinking-2507
Training methodGRPO with outcome-based rewards
EnvironmentSpreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service
Training dataSpreadsheet-RL training split: 5,928 filtered ExcelForum tasks
EvaluationSpreadsheetBench and Domain-Spreadsheet
LicenseApache-2.0, following the base model license

Training Configuration

For full details, please see the paper. The released 4B run uses:

HyperparameterValue
AlgorithmGRPO; KL-regularized against a frozen reference model
Training steps60
Prompt/response limits4,096 / 27,648 tokens
Rollout samplingtemperature 0.6; top-p 0.95; top-k 20
Batching64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step
Multi-turn capsmax assistant turns 20; max user turns 20; max tool-response length 8,192
OptimizerAdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0
KL losslow-var KL; coefficient 0.001
Actor update batchingmini-batch 32; dynamic batch sizing enabled
Hardware1 node x 4 NVIDIA H100 GPUs
Training timeabout 40 hours wall-clock for the 4B run

Results

Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.

BenchmarkBase+ Native Harness+ Full ToolsSpreadsheet-RL-4B
SpreadsheetBench Pass@112.015.619.323.4

On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.

Usage

Install the standard Transformers stack and load the checkpoint:

python

from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype="auto",
device_map="auto",
trust_remote_code=True,
)

For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:

bash

hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git

The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.

Citation

bibtex

@misc{chi2026spreadsheetrl,
title = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
author = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
year = {2026},
eprint = {2605.22642},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
doi = {10.48550/arXiv.2605.22642},
url = {https://arxiv.org/abs/2605.22642}
}

Model provider

Spreadsheet-RL

Model tree

Base

Qwen/Qwen3-4B-Thinking-2507

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today