Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Model Details

FieldValue
Base modelQwen/Qwen3.5-9B
Adapter typeLoRA (PEFT)
Precisionbfloat16 (no quantization)
Fine-tuning frameworkUnsloth + TRL SFTTrainer
Training hardwareNVIDIA B200 (Blackwell) via Modal
Training time~41 min

Dataset

SWE-Gym/OpenHands-SFT-Trajectories

Split used: train.success.oss — successful OpenHands agent trajectories on open-source SWE-Bench tasks.

  • Total examples used: 491 (full dataset, MAX_SAMPLES=-1)
  • Format: JSONL with messages / trajectory fields serialized as text

Training Hyperparameters

HyperparameterValue
Epochs1
Learning rate2e-4
LR schedulerCosine with warmup
Warmup ratio0.03
Batch size (per device)32
Gradient accumulation1
Effective batch size32
Max sequence length8192
PackingTrue
Optimizeradamw_torch_fused
LoRA rank (r)16
LoRA alpha16
LoRA dropout0.05
LoRA biasnone
Target modulesq_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Gradient checkpointingFalse
torch.compileTrue
Mixed precisionbf16
Random seed3407

Training Metrics (Final Epoch)

MetricValue
Train loss0.3010
Final step loss~0.157
Grad norm (final steps)~0.11–0.15
Train runtime2459 s (~41 min)
Samples/sec0.2
Steps/sec0.05
Total steps123

Loss decreased from ~0.8 (early steps) to ~0.07–0.22 (final steps), with entropy tracking similarly — indicating the model learned lower-entropy, more confident distributions on SWE trajectory data.

Usage

python

from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
base = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3.5-9B",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Shreyansh327/qwen3.5-9b-swegym-lora-full")
model = PeftModel.from_pretrained(base, "Shreyansh327/qwen3.5-9b-swegym-lora-full")
model.eval()

Or with Unsloth:

python

from unsloth import FastVisionModel
model, tokenizer = FastVisionModel.from_pretrained(
"Shreyansh327/qwen3.5-9b-swegym-lora-full",
max_seq_length=8192,
load_in_16bit=True,
)

Intended Use

Agentic software engineering — the model is trained to follow OpenHands-style trajectories: reading files, running bash commands, editing code, and submitting patches to resolve GitHub issues. Pair with an agent scaffold (e.g., OpenHands) for best results.

Limitations

  • Trained for only 1 epoch on 491 trajectories — lightweight fine-tune, not a full RLVR run
  • No held-out evaluation benchmark numbers (SWE-Bench Verified / Lite) yet
  • May overfit to OpenHands action format; other scaffolds may need prompt adaptation

Model provider

Shreyansh327

Shreyansh327

Model tree

Base

Qwen/Qwen3.5-9B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today