Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Adapter type | LoRA (PEFT) |
| Precision | bfloat16 (no quantization) |
| Fine-tuning framework | Unsloth + TRL SFTTrainer |
| Training hardware | NVIDIA B200 (Blackwell) via Modal |
| Training time | ~41 min |
Dataset
SWE-Gym/OpenHands-SFT-Trajectories
Split used: train.success.oss — successful OpenHands agent trajectories on open-source SWE-Bench tasks.
- Total examples used: 491 (full dataset,
MAX_SAMPLES=-1) - Format: JSONL with
messages/trajectoryfields serialized as text
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.03 |
| Batch size (per device) | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Max sequence length | 8192 |
| Packing | True |
| Optimizer | adamw_torch_fused |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| LoRA bias | none |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | False |
| torch.compile | True |
| Mixed precision | bf16 |
| Random seed | 3407 |
Training Metrics (Final Epoch)
| Metric | Value |
|---|---|
| Train loss | 0.3010 |
| Final step loss | ~0.157 |
| Grad norm (final steps) | ~0.11–0.15 |
| Train runtime | 2459 s (~41 min) |
| Samples/sec | 0.2 |
| Steps/sec | 0.05 |
| Total steps | 123 |
Loss decreased from ~0.8 (early steps) to ~0.07–0.22 (final steps), with entropy tracking similarly — indicating the model learned lower-entropy, more confident distributions on SWE trajectory data.
Usage
python
from peft import PeftModelfrom transformers import AutoTokenizer, AutoModelForCausalLMimport torchbase = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B",torch_dtype=torch.bfloat16,device_map="auto",)tokenizer = AutoTokenizer.from_pretrained("Shreyansh327/qwen3.5-9b-swegym-lora-full")model = PeftModel.from_pretrained(base, "Shreyansh327/qwen3.5-9b-swegym-lora-full")model.eval()
Or with Unsloth:
python
from unsloth import FastVisionModelmodel, tokenizer = FastVisionModel.from_pretrained("Shreyansh327/qwen3.5-9b-swegym-lora-full",max_seq_length=8192,load_in_16bit=True,)
Intended Use
Agentic software engineering — the model is trained to follow OpenHands-style trajectories: reading files, running bash commands, editing code, and submitting patches to resolve GitHub issues. Pair with an agent scaffold (e.g., OpenHands) for best results.
Limitations
- Trained for only 1 epoch on 491 trajectories — lightweight fine-tune, not a full RLVR run
- No held-out evaluation benchmark numbers (SWE-Bench Verified / Lite) yet
- May overfit to OpenHands action format; other scaffolds may need prompt adaptation
Model provider
Shreyansh327
Model tree
Base
Qwen/Qwen3.5-9B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information