yinita
ps4mas-grpo-9b-sonnet-large-step200
README
Checkpoint note
The best in-run eval was at step 175 (reward -0.875, composite ~2.13), but checkpoints were written only every 50 steps (save_steps=50) and only the most recent one was retained (keep_checkpoints=1), so step 175 was never saved. This repo contains step 200, the only surviving numbered checkpoint.
| Step | reward | ~composite | saved? |
|---|---|---|---|
| 175 | -0.875 | 2.13 | no (eval peak) |
| 200 | -1.009 | 1.99 | yes (this repo) |
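The interaction between the two checkpoint settings can be sketched in a few lines of Python. This is an illustrative model of the retention rule, not the trainer's actual code: `surviving_checkpoints` is a hypothetical helper, and it assumes the run ended at step 200 and that `keep_checkpoints` retains the most recent saves.

```python
def surviving_checkpoints(last_step, save_steps, keep_checkpoints):
    """Return the checkpoint steps still on disk when the run ends.

    Assumes keep_checkpoints >= 1 and that older saves are deleted first.
    """
    # A checkpoint is written at every multiple of save_steps.
    saved = list(range(save_steps, last_step + 1, save_steps))
    # Only the newest keep_checkpoints saves are retained.
    return saved[-keep_checkpoints:]


survivors = surviving_checkpoints(last_step=200, save_steps=50, keep_checkpoints=1)
print(survivors)         # [200]
print(175 in survivors)  # False: 175 is not a multiple of 50, so it was never written
```

Under these assumptions, keeping the eval peak would have required either a save interval that divides 175 (for example 25) or a separate "best checkpoint" save triggered by the eval.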
Training config
- Topologies: PS-cold-single, PS-cold-central, PS-cold-hier, PS-cold-debate
- LoRA r=16, alpha=32, target_modules: q/k/v/o_proj, gate/up/down_proj
- group_size=8, groups_per_step=8, temperature=0.8
- Judge: us.anthropic.claude-sonnet-4-6 (Bedrock)
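For readers who want to reproduce a similar adapter setup, the settings above might be written out as follows. This is a sketch, not the original training script: the dict layout is illustrative, `task_type` is an assumption not stated in the config, and any PEFT field not listed above (dropout, bias, and so on) is left at its library default, which may differ from the real run.

```python
# Values are copied from the training config list; the structure is illustrative.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the source
}

sampling = {"group_size": 8, "groups_per_step": 8, "temperature": 0.8}

# Assumption: each group holds group_size completions sampled for one prompt,
# so one training step scores group_size * groups_per_step completions.
rollouts_per_step = sampling["group_size"] * sampling["groups_per_step"]
print(rollouts_per_step)  # 64

# Build a PEFT config only if the library is available.
try:
    from peft import LoraConfig
    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    lora_config = None
```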
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-9B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "yinita/ps4mas-grpo-9b-sonnet-large-step200")
tokenizer = AutoTokenizer.from_pretrained("yinita/ps4mas-grpo-9b-sonnet-large-step200")
```
To serve with vLLM, load the base model Qwen/Qwen3.5-9B with LoRA enabled and attach this adapter (see the PS4MAS eval scripts).
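A minimal sketch of the vLLM route follows. It is not taken from the PS4MAS eval scripts: `generate_with_adapter` and the adapter name `ps4mas_step200` are hypothetical, `max_tokens` is an arbitrary choice, and whether a given vLLM release supports this base model with LoRA has not been verified here. `max_lora_rank=16` matches the adapter's r=16, and the temperature reuses the training value of 0.8.

```python
BASE_MODEL = "Qwen/Qwen3.5-9B"
ADAPTER_REPO = "yinita/ps4mas-grpo-9b-sonnet-large-step200"


def generate_with_adapter(prompts, max_tokens=256):
    """Generate completions from the base model with the LoRA adapter applied."""
    # Heavy imports stay local so this file can be imported without vLLM installed.
    from huggingface_hub import snapshot_download
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Download the adapter files and get their local directory.
    adapter_path = snapshot_download(repo_id=ADAPTER_REPO)

    # enable_lora turns on adapter support; max_lora_rank must be >= the adapter rank.
    llm = LLM(model=BASE_MODEL, enable_lora=True, max_lora_rank=16)
    params = SamplingParams(temperature=0.8, max_tokens=max_tokens)

    # LoRARequest takes a name, a unique integer id, and the local adapter path.
    lora = LoRARequest("ps4mas_step200", 1, adapter_path)
    outputs = llm.generate(prompts, params, lora_request=lora)
    return [out.outputs[0].text for out in outputs]
```

On a machine with a suitable GPU, `generate_with_adapter(["Hello"])` would return one completion string per prompt.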
W&B
Model provider
yinita
Model tree
Base
Qwen/Qwen3.5-9B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text