imdatta0

qwen3-4b-swegym-moto-kl02-multitrain-grpo-tfpen10-b04-lr1e6-s12-adapter

README

Qwen3-4B SWE-Gym moto KL02 target-file penalty ablation adapter

This repository contains a PEFT LoRA adapter checkpoint from the Qwen3 SWE-Gym moto investigation.

Negative ablation: target-file missing penalty did not improve held-out greedy over the inherited KL02 adapter.

Local source checkpoint before upload:

/mnt/disks/unslothai/datta0/cache/qwen3-grpo-patch/20260604_235441_swegym_q4b-kl02-multitrain-grpo-tfpen10-b04-lr1e6-s12_2bc44f3/checkpoints/best_holdout

Use with the base model unsloth/Qwen3-4B-Instruct-2507.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

imdatta0

Model Tree

Base

unsloth/Qwen3-4B-Instruct-2507

Adapter

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated Endpoints

Container

Explore FriendliAI today

Get started Talk to an engineer