License: mit
Available Checkpoints
| Subdirectory | Base Model | Method | Steps | Key Results |
|---|---|---|---|---|
| sft-dr1-7b-final | DeepSeek-R1-Distill-Qwen-7B | SFT | 3651 | GSM8K 83.5% baseline |
| grpo-topoprm-dr1-7b | DeepSeek-R1-Distill-Qwen-7B | GRPO+TopoPRM | 100 | Hierarchical reward |
| grpo-topoprm-qwen35-9b | Qwen3.5-9B | GRPO+TopoPRM | 50 | GSM8K 93.5%, MATH500 49.8% |
| opd-topoprm-dr1-7b-v2 | DeepSeek-R1-Distill-Qwen-7B | OPD Stage3 | 200 | MATH500 60.8%, Omni-MATH 56.9% |
| opd-topoprm-qwen35-9b-v2 | Qwen3.5-9B | OPD Stage3 | 50 | Distillation |
| grpo-scae-qwen35-9b | Qwen3.5-9B | GRPO+SCAE | 949 | SCAE variant |
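Each adapter has to be loaded on top of the base model it was trained from, so it can help to keep the table above as a lookup. The following is a minimal sketch: the names `CHECKPOINT_BASES`, `HUB_IDS`, and `base_model_id` are hypothetical helpers, not part of this repository. Only the DeepSeek Hub ID appears in this README; the Hub ID for Qwen3.5-9B is not stated, so the sketch leaves it unset instead of guessing.

```python
# Subdirectory -> base model name, copied from the checkpoint table.
CHECKPOINT_BASES = {
    "sft-dr1-7b-final": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-dr1-7b": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-qwen35-9b": "Qwen3.5-9B",
    "opd-topoprm-dr1-7b-v2": "DeepSeek-R1-Distill-Qwen-7B",
    "opd-topoprm-qwen35-9b-v2": "Qwen3.5-9B",
    "grpo-scae-qwen35-9b": "Qwen3.5-9B",
}

# Base model name -> Hub repo ID. The Qwen3.5-9B ID is not given in this
# README, so it is left as None here and must be filled in by the user.
HUB_IDS = {
    "DeepSeek-R1-Distill-Qwen-7B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "Qwen3.5-9B": None,
}


def base_model_id(subfolder: str) -> str:
    """Return the Hub ID of the base model for a checkpoint subdirectory."""
    if subfolder not in CHECKPOINT_BASES:
        raise KeyError(f"Unknown checkpoint subdirectory: {subfolder}")
    name = CHECKPOINT_BASES[subfolder]
    hub_id = HUB_IDS.get(name)
    if hub_id is None:
        raise ValueError(f"No Hub ID configured for base model {name!r}")
    return hub_id
```

Calling `base_model_id("grpo-topoprm-dr1-7b")` returns the DeepSeek ID used in the Usage section, while a Qwen-based subdirectory raises `ValueError` until its Hub ID is filled in.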
Common LoRA Config
All adapters share:
- r: 64
- alpha: 128
- dropout: 0.05
- target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- task_type: CAUSAL_LM
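For reference, the shared settings above could be expressed as a `peft` configuration roughly as follows. This is a sketch, not the authors' training script: `LORA_SETTINGS`, `LORA_SCALING`, and `build_lora_config` are hypothetical names, and the scaling value assumes the standard LoRA rule (`alpha / r`), not rank-stabilized LoRA.

```python
# Values copied from the list above; key names follow peft.LoraConfig arguments.
LORA_SETTINGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Under standard LoRA, the low-rank update is scaled by alpha / r.
LORA_SCALING = LORA_SETTINGS["lora_alpha"] / LORA_SETTINGS["r"]


def build_lora_config():
    """Build a peft LoraConfig from the shared settings.

    peft is imported lazily so the settings can be inspected without it.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_SETTINGS)
```

With `r = 64` and `alpha = 128`, the scaling factor is 2.0, and the target modules cover every attention and MLP projection in the Qwen-style blocks.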
Usage
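```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model that the chosen adapter was trained on.
base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# Attach the LoRA adapter stored in the given subdirectory of this repo.
model = PeftModel.from_pretrained(base_model, "rwlinno/topoprm-ckpts", subfolder="grpo-topoprm-dr1-7b")

# The tokenizer comes from the base model.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
```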
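Once `model` and `tokenizer` are loaded as above, generation might look like the following sketch. The helper name `generate_answer` and the default of 512 new tokens are assumptions for illustration; the README does not specify a prompt format or decoding settings.

```python
def generate_answer(model, tokenizer, prompt, max_new_tokens=512):
    """Generate a completion for `prompt` and return only the new text.

    `max_new_tokens=512` is an illustrative default, not a recommended value.
    """
    import torch  # imported lazily; only needed when generating

    # Tokenize and move the inputs to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Inference only, so gradients are not needed.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the generated continuation is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call such as `generate_answer(model, tokenizer, "What is 12 * 7?")` would return the model's continuation as a string.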
Citation
```bibtex
@inproceedings{topoprm2026,
  title={Topology-Aware Process Rewards for Verifiable Mathematical Reasoning},
  author={Weilin Ruan},
  booktitle={Proceedings of EMNLP 2026},
  year={2026}
}
```
Model provider: rwlinno
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Adapter: this model
Modalities: text input, text output