Shreyansh327

qwen3.5-9b-swegym-lora-full

README

License: apache-2.0

Model Details

Table with columns: Field, Value
Field	Value
Base model	`Qwen/Qwen3.5-9B`
Adapter type	LoRA (PEFT)
Precision	bfloat16 (no quantization)
Fine-tuning framework	Unsloth + TRL SFTTrainer
Training hardware	NVIDIA B200 (Blackwell) via Modal
Training time	~41 min

Dataset

SWE-Gym/OpenHands-SFT-Trajectories

Split used: train.success.oss — successful OpenHands agent trajectories on open-source SWE-Bench tasks.

Total examples used: 491 (full dataset, MAX_SAMPLES=-1)
Format: JSONL with messages / trajectory fields serialized as text

Training Hyperparameters

Table with columns: Hyperparameter, Value
Hyperparameter	Value
Epochs	1
Learning rate	2e-4
LR scheduler	Cosine with warmup
Warmup ratio	0.03
Batch size (per device)	32
Gradient accumulation	1
Effective batch size	32
Max sequence length	8192
Packing	True

Training Metrics (Final Epoch)

Table with columns: Metric, Value
Metric	Value
Train loss	0.3010
Final step loss	~0.157
Grad norm (final steps)	~0.11–0.15
Train runtime	2459 s (~41 min)
Samples/sec	0.2
Steps/sec	0.05
Total steps	123

Loss decreased from ~0.8 (early steps) to ~0.07–0.22 (final steps), with entropy tracking similarly — indicating the model learned lower-entropy, more confident distributions on SWE trajectory data.

Usage

python
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Shreyansh327/qwen3.5-9b-swegym-lora-full")
model = PeftModel.from_pretrained(base, "Shreyansh327/qwen3.5-9b-swegym-lora-full")
model.eval()

Or with Unsloth:

python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "Shreyansh327/qwen3.5-9b-swegym-lora-full",
    max_seq_length=8192,
    load_in_16bit=True,
)

Intended Use

Agentic software engineering — the model is trained to follow OpenHands-style trajectories: reading files, running bash commands, editing code, and submitting patches to resolve GitHub issues. Pair with an agent scaffold (e.g., OpenHands) for best results.

Limitations

Trained for only 1 epoch on 491 trajectories — lightweight fine-tune, not a full RLVR run
No held-out evaluation benchmark numbers (SWE-Bench Verified / Lite) yet
May overfit to OpenHands action format; other scaffolds may need prompt adaptation

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Model Details

Model Provider

Shreyansh327

Model Tree

Base

Qwen/Qwen3.5-9B

Adapter

this model

Input Modalities

Text

Image

Video

Output Modalities

Text

Supported Functionality

Dedicated Endpoints

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

Model Details

Table with columns: Field, Value
Field	Value
Base model	`Qwen/Qwen3.5-9B`
Adapter type	LoRA (PEFT)
Precision	bfloat16 (no quantization)
Fine-tuning framework	Unsloth + TRL SFTTrainer
Training hardware	NVIDIA B200 (Blackwell) via Modal
Training time	~41 min

Dataset

SWE-Gym/OpenHands-SFT-Trajectories

Split used: train.success.oss — successful OpenHands agent trajectories on open-source SWE-Bench tasks.

Total examples used: 491 (full dataset, MAX_SAMPLES=-1)
Format: JSONL with messages / trajectory fields serialized as text

Training Hyperparameters

Table with columns: Hyperparameter, Value
Hyperparameter	Value
Epochs	1
Learning rate	2e-4
LR scheduler	Cosine with warmup
Warmup ratio	0.03
Batch size (per device)	32
Gradient accumulation	1
Effective batch size	32
Max sequence length	8192
Packing	True

Training Metrics (Final Epoch)

Table with columns: Metric, Value
Metric	Value
Train loss	0.3010
Final step loss	~0.157
Grad norm (final steps)	~0.11–0.15
Train runtime	2459 s (~41 min)
Samples/sec	0.2
Steps/sec	0.05
Total steps	123

Usage

python
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Shreyansh327/qwen3.5-9b-swegym-lora-full")
model = PeftModel.from_pretrained(base, "Shreyansh327/qwen3.5-9b-swegym-lora-full")
model.eval()

Or with Unsloth:

python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "Shreyansh327/qwen3.5-9b-swegym-lora-full",
    max_seq_length=8192,
    load_in_16bit=True,
)

Intended Use

Limitations

Trained for only 1 epoch on 491 trajectories — lightweight fine-tune, not a full RLVR run
No held-out evaluation benchmark numbers (SWE-Bench Verified / Lite) yet
May overfit to OpenHands action format; other scaffolds may need prompt adaptation