banyaaiofficial

Qwen3.5-122B-A10B-Banya-Tuned-v21-grpo

README

License: apache-2.0

Qwen3.5-122B-A10B-Banya-Tuned-v21-grpo

Option D3 with v10 (Masked SFT) init and a dense reward: GRPO with a multi-stage preflight reward.

  • init: v10 LoRA (Masked SFT, assistant-only loss)
  • trainer: TRL GRPOTrainer
  • rollout: HF model.generate (k=8 per task, T=1.0)
  • reward: dense [0,1.0] = parse 0.05 + grep 0.05 + file 0.10 + func 0.10 + harness 0.30/0.70
  • MoE safeguards: output_router_logits + aux loss + explicit router freeze (from v19)
  • corpus: 270-task train pool from SWE-bench-Lite (no overlap with the stratified 30-task eval set)
  • hyperparams: β=0.1, ε=0.2, lr=1e-6, 80 steps, k=8
  • train stats: REAL PASS 24/80 = 30%, train_loss 0.0855, train_runtime 18h 16m
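
The reward and rollout bullets above can be sketched roughly as follows. This is an illustrative reading, not the training code: the weights come from the card, but the `Preflight` fields, the `"partial"`/`"full"` harness labels, and the interpretation of `harness 0.30/0.70` as partial versus full credit are assumptions. The group normalization shown is the usual GRPO-style baseline over the k=8 rollouts of one task; TRL's `GRPOTrainer` may differ in detail (for example in how the standard deviation is computed).

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Stage weights as listed in the model card.
WEIGHTS = {"parse": 0.05, "grep": 0.05, "file": 0.10, "func": 0.10}
# ASSUMPTION: "harness 0.30/0.70" is read as partial vs. full harness credit.
HARNESS = {"none": 0.0, "partial": 0.30, "full": 0.70}


@dataclass
class Preflight:
    """Hypothetical container for the per-rollout preflight checks."""
    parse: bool = False   # patch parses
    grep: bool = False    # target string is found
    file: bool = False    # correct file is touched
    func: bool = False    # correct function is touched
    harness: str = "none"  # "none" | "partial" | "full" (assumed labels)


def dense_reward(p: Preflight) -> float:
    """Sum the stage weights that passed, then add harness credit; cap at 1.0."""
    score = sum(w for name, w in WEIGHTS.items() if getattr(p, name))
    score += HARNESS[p.harness]
    return min(1.0, round(score, 6))


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one task's rollout group (GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One task, k=8 rollouts: most fail early, one passes the full harness.
    group = [Preflight()] * 5 + [
        Preflight(parse=True, grep=True),
        Preflight(parse=True, grep=True, file=True, func=True, harness="partial"),
        Preflight(parse=True, grep=True, file=True, func=True, harness="full"),
    ]
    rewards = [dense_reward(p) for p in group]
    print(rewards)
    print(group_advantages(rewards))
```

Under this reading, a rollout that clears every preflight stage but only earns partial harness credit still receives a nonzero reward. That is the point of a dense signal when full passes are rare: if all eight rollouts score the same, the group-normalized advantages collapse to zero and the step carries no gradient signal.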

Model provider

banyaaiofficial

Model tree

Base

Qwen/Qwen3.5-122B-A10B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Supported Functionality

Model APIs

Dedicated Endpoints

Container
