Dedicated Endpoints

Run inference on this model on single-tenant GPUs with unmatched speed and reliability at scale.

Container

Run inference on this model in your own environment with full control and performance.

Get help setting up a custom Dedicated Endpoint.

Talk with our engineers to get a quote for discounted reserved GPU instances.

README

License: apache-2.0

About

SOD-GRPO_teacher-4B is a 4B agentic reasoning model trained with GRPO (Group Relative Policy Optimization), serving as the teacher model in the SOD distillation framework.
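
GRPO gets its name from the way it scores sampled responses: several responses are drawn for the same prompt, and each one's advantage is measured relative to the rest of its group rather than by a learned value network. The function below is a minimal sketch of that normalization, assuming scalar rewards (for example, 1.0 for a correct final answer and 0.0 otherwise) and the population standard deviation; implementations differ on details such as the standard-deviation estimator and `eps`, and this is not the training code used for this model. The name `group_relative_advantages` is hypothetical.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against the other responses for the same prompt.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    eps keeps the division finite when every response in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt: two correct, two incorrect.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct responses receive positive advantages and incorrect ones negative advantages, and a group where every response scores the same contributes no learning signal.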

This model is used to distill smaller student models (SOD-0.6B and SOD-1.7B) via the SOD method, which introduces adaptive step-level weighting to handle cascading error propagation in tool-integrated reasoning.
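
This card does not spell out how SOD computes its adaptive step-level weights, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, how a step-weighted on-policy distillation objective could be assembled: the student samples a trajectory, the teacher scores the same tokens, the per-token difference of log-probabilities (a single-sample estimate of reverse KL) is averaged within each reasoning step, and the step averages are combined with weights supplied by the caller. The function name and input layout are hypothetical.

```python
def step_weighted_distill_loss(
    student_logps: list[list[float]],
    teacher_logps: list[list[float]],
    step_weights: list[float],
) -> float:
    """Hypothetical step-weighted on-policy distillation loss (illustrative only).

    student_logps[i][j] and teacher_logps[i][j] are the log-probabilities the
    student and teacher assign to token j of reasoning step i, where the tokens
    were sampled from the student (on-policy). For a sampled token,
    log p_student - log p_teacher is a single-sample estimate of the reverse
    KL divergence. The step weights are inputs here; this is NOT SOD's
    adaptive weighting rule.
    """
    if not (len(student_logps) == len(teacher_logps) == len(step_weights)):
        raise ValueError("need one log-prob list per model and one weight per step")
    total_weight = sum(step_weights)
    if total_weight <= 0:
        raise ValueError("step weights must sum to a positive value")

    loss = 0.0
    for s_step, t_step, weight in zip(student_logps, teacher_logps, step_weights):
        if len(s_step) != len(t_step) or not s_step:
            raise ValueError("each step needs the same, non-zero number of tokens")
        per_token = [s - t for s, t in zip(s_step, t_step)]
        loss += weight * sum(per_token) / len(per_token)  # weighted mean over the step
    return loss / total_weight


# Two steps; the second step is up-weighted 3x relative to the first.
print(step_weighted_distill_loss([[-1.0, -2.0], [-0.5]], [[-1.5, -2.5], [-1.5]], [1.0, 3.0]))
```

With uniform weights this reduces to an unweighted average over steps; any adaptive scheme (such as down-weighting steps after an early error) would only change how `step_weights` is produced.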

Model Information

| Attribute | Value |
|---|---|
| Base Model | Qwen3-4B |
| Training Pipeline | Cold-Start SFT → GRPO |
| Parameters | 4B |

Related Models

| Model | Description |
|---|---|
| SOD-0.6B | SOD-distilled 0.6B student |
| SOD-1.7B | SOD-distilled 1.7B student |
| SOD-GRPO_teacher-4B | GRPO-trained 4B teacher model (this model) |

Performance

We report average@32 scores, averaged over 5 runs, on challenging math (AIME 2024, AIME 2025), science (GPQA-Diamond), and code (LiveCodeBench-v6) benchmarks.

| Method | AIME 2024 | AIME 2025 | GPQA-Diamond | LiveCodeBench-v6 | Average |
|---|---|---|---|---|---|
| GRPO (This Model) | 67.60 | 60.42 | 55.19 | 63.13 | 61.59 |
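
The card does not define average@32 precisely. A common reading, assumed here, is: sample 32 responses per problem, compute the fraction judged correct, average that fraction over all problems, and then take the plain mean of that score across the 5 independent runs. The helper names below are hypothetical, and the toy example uses k = 4 only to stay short.

```python
def average_at_k(correct: list[list[bool]]) -> float:
    """avg@k in percent: per-problem accuracy over k samples, averaged over problems.

    correct[p][i] is True if the i-th of the k sampled responses to problem p
    is judged correct.
    """
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)


def mean_over_runs(run_scores: list[float]) -> float:
    """Average the avg@k score across independent evaluation runs (assumed plain mean)."""
    return sum(run_scores) / len(run_scores)


# Toy example with k = 4 samples for 2 problems (the card uses k = 32).
toy = [[True, False, True, True], [False, False, True, False]]
print(average_at_k(toy))  # 50.0
```

Averaging over many samples per problem reduces the variance that a single greedy or sampled answer would have on small benchmarks such as AIME.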

Distilled Students

| Model | AIME 2024 | AIME 2025 | GPQA-Diamond | LiveCodeBench-v6 | Average |
|---|---|---|---|---|---|
| SOD-0.6B | 20.84 | 26.13 | 22.19 | 27.72 | 24.22 |
| SOD-1.7B | 50.83 | 41.72 | 38.72 | 40.63 | 42.98 |
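
Assuming the Average column is the unweighted mean of the four benchmark scores, the reported averages in both tables can be checked directly. The dictionary names are illustrative; the numbers are copied from the tables.

```python
# Per-benchmark scores from the tables
# (AIME 2024, AIME 2025, GPQA-Diamond, LiveCodeBench-v6).
SCORES = {
    "GRPO (this model)": [67.60, 60.42, 55.19, 63.13],
    "SOD-0.6B": [20.84, 26.13, 22.19, 27.72],
    "SOD-1.7B": [50.83, 41.72, 38.72, 40.63],
}
REPORTED_AVERAGE = {"GRPO (this model)": 61.59, "SOD-0.6B": 24.22, "SOD-1.7B": 42.98}


def unweighted_average(values: list[float]) -> float:
    """Plain mean of the benchmark scores, each benchmark counted once."""
    return sum(values) / len(values)


for model, values in SCORES.items():
    computed = unweighted_average(values)
    # Reported values are rounded to two decimals, so compare loosely.
    print(f"{model}: computed {computed:.3f}, reported {REPORTED_AVERAGE[model]:.2f}")
```

The computed means are 61.585, 24.22, and 42.975, which match the reported values after rounding to two decimals.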

Acknowledgement

We sincerely thank the authors of DemyAgent-4B and the paper "Demystifying Reinforcement Learning in Agentic Reasoning" (arXiv:2510.11701) for their contribution.

Citation

```bibtex
@article{zhong2026sod,
  title={SOD: Step-wise On-policy Distillation for Small Language Model Agents},
  author={Zhong, Qiyong and Zheng, Mao and Song, Mingyang and Lin, Xin and Sun, Jie and Jiang, Houcheng and Wang, Xiang and Fang, Junfeng},
  journal={arXiv preprint arXiv:2605.07725},
  year={2026}
}
```

Model provider

youngzhong

Model tree

Base: Qwen/Qwen3-4B

Fine-tuned: this model

Modalities

Input: Text

Output: Text

Pricing

Dedicated Endpoints


Supported Functionality

Model APIs

Dedicated Endpoints

Container
