yinita

ps4mas-grpo-9b-sonnet-large-step200

README

Checkpoint note

The best in-run eval was at step 175 (reward -0.875, composite ~2.13), but checkpoints were written only every 50 steps (save_steps=50) and only one was retained (keep_checkpoints=1), so step 175 was never saved. This repo contains step 200, the only surviving numbered checkpoint.

Step   Reward   Composite (approx.)   Saved?
175    -0.875   2.13                  no (eval peak)
200    -1.009   1.99                  yes (this repo)
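
To illustrate why the eval peak is missing, here is a minimal sketch of the retention behaviour described above. It assumes a checkpoint is written every save_steps steps and that only the keep_checkpoints most recent ones stay on disk; the function name and the run length of 200 steps are illustrative assumptions, not taken from the training code.

```python
def surviving_checkpoints(total_steps, save_steps, keep_checkpoints):
    """Return the checkpoint steps left on disk after training.

    Assumption: a checkpoint is written every `save_steps` steps and only
    the `keep_checkpoints` most recent ones are retained.
    """
    saved = list(range(save_steps, total_steps + 1, save_steps))
    return saved[-keep_checkpoints:] if keep_checkpoints > 0 else []


# Values from the checkpoint note; total_steps=200 is an assumed run length.
remaining = surviving_checkpoints(total_steps=200, save_steps=50, keep_checkpoints=1)
print(remaining)  # [200]

# Step 175 is not a multiple of save_steps, so it is never written at all.
print(175 % 50 == 0)  # False
```

Under these assumptions, a smaller save_steps that divides the eval interval, or a larger keep_checkpoints, would be needed to retain a mid-run peak.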

Training config

  • Topologies: PS-cold-single, PS-cold-central, PS-cold-hier, PS-cold-debate
  • LoRA r=16, alpha=32, target_modules: q/k/v/o_proj, gate/up/down_proj
  • group_size=8, groups_per_step=8, temperature=0.8
  • Judge: us.anthropic.claude-sonnet-4-6 (Bedrock)
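
As a rough illustration of what group_size and groups_per_step imply, the sketch below computes the rollouts per step and a group-relative advantage in the usual GRPO style (reward minus group mean, divided by group standard deviation). The exact normalization used in this run is not stated in the card, so the population standard deviation, the eps value, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev

GROUP_SIZE = 8        # completions sampled per group (from the config above)
GROUPS_PER_STEP = 8   # groups processed per optimizer step (from the config above)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps).

    Assumption: population standard deviation; the actual run may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed interpretation: every group contributes GROUP_SIZE rollouts per step.
rollouts_per_step = GROUP_SIZE * GROUPS_PER_STEP

# Hypothetical judge rewards for one group of 8 completions.
example_rewards = [-1.0, -0.5, -1.5, -1.0, -0.75, -1.25, -1.0, -1.0]
advantages = group_relative_advantages(example_rewards)
print(rollouts_per_step, [round(a, 2) for a in advantages])
```

Because advantages are centred within each group, a completion is rewarded only for beating its siblings on the same prompt, not for the absolute judge score.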

Usage

python

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-9B"
adapter = "yinita/ps4mas-grpo-9b-sonnet-large-step200"

# Load the base model, then attach the LoRA adapter from this repo.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
# The tokenizer is loaded from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter)

For vLLM LoRA serving, load the base model Qwen/Qwen3.5-9B and apply this repo as the adapter (see the PS4MAS eval scripts).
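
The PS4MAS eval scripts are not reproduced here, so the following is only a sketch of how the base model and adapter might be combined with vLLM's offline LoRA API. It assumes the installed vLLM version supports this base architecture with LoRA; the helper names, the adapter label "ps4mas_adapter", and the sampling settings are illustrative choices, not the authors' configuration.

```python
BASE_MODEL = "Qwen/Qwen3.5-9B"
ADAPTER_REPO = "yinita/ps4mas-grpo-9b-sonnet-large-step200"
LORA_RANK = 16  # matches r=16 in the training config


def engine_kwargs(base_model=BASE_MODEL, lora_rank=LORA_RANK):
    """Keyword arguments for vllm.LLM with LoRA support enabled."""
    return {"model": base_model, "enable_lora": True, "max_lora_rank": lora_rank}


def generate_with_adapter(prompts, max_tokens=256):
    """Generate completions for `prompts` with the adapter applied (needs a GPU)."""
    # Heavy imports stay local so this module can be imported without vLLM.
    from huggingface_hub import snapshot_download
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Download the adapter files and point vLLM at the local copy.
    adapter_path = snapshot_download(repo_id=ADAPTER_REPO)
    llm = LLM(**engine_kwargs())
    # temperature=0.8 mirrors the training config; an assumption for inference.
    params = SamplingParams(temperature=0.8, max_tokens=max_tokens)
    lora = LoRARequest("ps4mas_adapter", 1, adapter_path)
    outputs = llm.generate(prompts, params, lora_request=lora)
    return [out.outputs[0].text for out in outputs]
```

Calling generate_with_adapter(["Hello"]) would download the adapter and start the engine; max_lora_rank is set to the adapter's rank so vLLM accepts it.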

W&B

https://wandb.ai/yinita/ps4mas-drmas/runs/5915a52fba046f09

Model provider

yinita

Model tree

Base

Qwen/Qwen3.5-9B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Supported Functionality

Model APIs

Dedicated Endpoints

Container
