ubicloud

SWE-Eff-Hard-14B

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

Model Summary

SWE-Eff†-14B is a LoRA fine-tuned SWE agent model based on Qwen3-14B, trained on ~3K high-quality filtered trajectories from R2EGym with a 32K context window. It uses suggestive thinking with masked supervision (mask_think) to inject reasoning prompts into training while masking them from loss, preserving the model's autonomous reasoning while implicitly guiding efficient agent behaviors.

SWE-Eff† serves as the conservative complementary model — optimized for harder problems involving multi-file logic, unclear root causes, and complex API interactions. For structured tasks with clear error traces, see the default model SWE-Eff.

Suggestive Thinking

Unlike the default SWE-Eff model, SWE-Eff† injects reasoning prompts (wrapped in <think...</think blocks) into each assistant message during training to encourage deeper reasoning:

  • Let's think step by step ... (encourages deeper reasoning before actions)
  • Let's view, think, edit, test ... (standardizes workflows)
  • If I get stuck in a loop, I need to think of different solutions to break out of it. (mitigates repetitive action loops)

The mask_think mechanism excludes the content inside thinking blocks from loss computation while retaining supervision on the <think and </think tokens. This allows suggestive guidance to influence the model implicitly while preserving autonomous reasoning behavior.

Training Data

Fine-tuned on filtered-R2EGym-SFT-Trajectories — 3,218 high-quality trajectories filtered from R2EGym-SFT via a multi-stage pipeline:

  1. Basic Quality: exit_status = Submitted & resolved = True
  2. Behavioral Soundness: Redundant loop detection & excessive search ratio filtering
  3. Hallucination Control: Shortcut pattern & false reasoning detection
  4. Thought–Action Alignment: Intent vs. action consistency enforcement

Training Configuration

Table
ItemValue
Base ModelQwen3-14B
Precisionbfloat16
PEFT MethodLoRA
LoRA Rank (r)16
LoRA Alpha32
LoRA Dropout0.2
Target Modulesq/k/v/o/up/down/gate_proj
Adapter Size246 MB
Global Batch Size16
Gradient Accumulation8
Learning Rate2e-4
LR SchedulerCosine
Warmup Ratio0.05
Weight Decay0.1
Training Epochs3
Total Training Time~10.5 h
Hardware2 × H200
Maximum Context Length32,768 tokens
Key ModificationSuggestive thinking + mask_think

Evaluation

Evaluated on SWE-bench Verified using R2E-Gym scaffold with 32K context, 100-turn limit, temperature=0.6, top_p=0.95, and function calling disabled.

Table
MetricSWE-Eff (Default)SWE-Eff† (Complementary)SWE-Eff‡ (Union)
Resolved rate21.6%20.6%30.4%
Avg steps37.144.5 (+20%)
Submission success rate43.2%55.3%
Edit success rate54.2%63.2%
>80-step resolve rate2.0% (1/51)13.7% (7/51)

Usage

python

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")
model = PeftModel.from_pretrained(base_model, "ubicloud/SWE-Eff-Hard-14B")

When to Use

  • SWE-Eff† (this model): Multi-file logic, unclear root causes, complex API interactions, known hard projects (e.g., sympy, sphinx, psf)
  • SWE-Eff: Bugs with clear error traces, localized to a single file, structured repositories (e.g., django, scikit-learn, xarray)

Model provider

ubicloud

Model tree

Base

Qwen/Qwen3-14B

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today