rwlinno/topoprm-ckpts API & Inference Endpoint

Available Checkpoints

| Subdirectory | Base Model | Method | Steps | Key Results |
|---|---|---|---|---|
| `sft-dr1-7b-final` | DeepSeek-R1-Distill-Qwen-7B | SFT | 3651 | GSM8K 83.5% baseline |
| `grpo-topoprm-dr1-7b` | DeepSeek-R1-Distill-Qwen-7B | GRPO+TopoPRM | 100 | Hierarchical reward |
| `grpo-topoprm-qwen35-9b` | Qwen3.5-9B | GRPO+TopoPRM | 50 | GSM8K 93.5%, MATH500 49.8% |
| `opd-topoprm-dr1-7b-v2` | DeepSeek-R1-Distill-Qwen-7B | OPD Stage3 | 200 | MATH500 60.8%, Omni-MATH 56.9% |
| `opd-topoprm-qwen35-9b-v2` | Qwen3.5-9B | OPD Stage3 | 50 | Distillation |
| `grpo-scae-qwen35-9b` | Qwen3.5-9B | GRPO+SCAE | 949 | SCAE variant |

Common LoRA Config

All adapters share:

r: 64
alpha: 128
dropout: 0.05
target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
task_type: CAUSAL_LM
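
To make the shared settings concrete, here is a minimal sketch that writes them as a plain Python dict and derives two quantities from them. The dict name `COMMON_LORA` and both helper functions are hypothetical; the key names are chosen to match PEFT's `LoraConfig` fields, and the scaling formula assumes standard LoRA (alpha / r), not a rank-stabilized variant.

```python
# Hypothetical dict mirroring the shared adapter settings listed above.
# Key names follow PEFT's LoraConfig fields (r, lora_alpha, lora_dropout, ...).
COMMON_LORA = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}


def lora_scaling(cfg):
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_linear(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so the total is
    r * (d_in + d_out).
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(COMMON_LORA))
    # Illustrative layer size only; the real projection shapes depend on the base model.
    print("params for a 4096x4096 layer:", lora_params_per_linear(4096, 4096, COMMON_LORA["r"]))
```

With r = 64 and alpha = 128, the adapter update is scaled by 2.0 under this assumption. The same dict could be unpacked into `peft.LoraConfig(**COMMON_LORA)` when training a new adapter with matching settings.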

Usage

python
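from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model that the chosen adapter was trained on.
base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Attach the LoRA adapter stored in the matching subfolder of the checkpoint repo.
model = PeftModel.from_pretrained(base_model, "rwlinno/topoprm-ckpts", subfolder="grpo-topoprm-dr1-7b")
# The tokenizer is loaded from the base model.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")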
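
Because each adapter must be paired with the base model it was trained on, a small lookup can help avoid mismatches. The sketch below is an illustration only: `CHECKPOINT_BASES`, `HUB_IDS`, `resolve_base_id`, and `load_adapter` are hypothetical names. The only Hub ID filled in is the one that appears in the usage snippet; the Hub ID for Qwen3.5-9B is not stated in this README, so it is deliberately left for the reader to supply.

```python
# Subfolder -> base model name, copied from the checkpoint table.
CHECKPOINT_BASES = {
    "sft-dr1-7b-final": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-dr1-7b": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-qwen35-9b": "Qwen3.5-9B",
    "opd-topoprm-dr1-7b-v2": "DeepSeek-R1-Distill-Qwen-7B",
    "opd-topoprm-qwen35-9b-v2": "Qwen3.5-9B",
    "grpo-scae-qwen35-9b": "Qwen3.5-9B",
}

# Base model name -> Hub repo ID. Only the ID given in the usage snippet is
# included; add the Qwen3.5-9B ID yourself once you have confirmed it.
HUB_IDS = {
    "DeepSeek-R1-Distill-Qwen-7B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
}


def resolve_base_id(subfolder, hub_ids=HUB_IDS):
    """Return the Hub repo ID of the base model for a checkpoint subfolder."""
    base_name = CHECKPOINT_BASES[subfolder]  # KeyError for unknown subfolders
    if base_name not in hub_ids:
        raise LookupError(f"No Hub ID configured for base model {base_name!r}")
    return hub_ids[base_name]


def load_adapter(subfolder, repo="rwlinno/topoprm-ckpts"):
    """Load base model, adapter, and tokenizer for one checkpoint subfolder."""
    # Imported lazily so the lookup helpers work without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = resolve_base_id(subfolder)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, repo, subfolder=subfolder)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

Calling `load_adapter("opd-topoprm-dr1-7b-v2")` would then follow the same steps as the usage snippet, with the base model chosen from the table instead of being typed by hand.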

Citation

bibtex
@inproceedings{topoprm2026,
  title={Topology-Aware Process Rewards for Verifiable Mathematical Reasoning},
  author={Weilin Ruan},
  booktitle={Proceedings of EMNLP 2026},
  year={2026}
}
