ubicloud
SWE-Eff-Hard-14B
Model Summary
SWE-Eff†-14B (published here as SWE-Eff-Hard-14B) is a LoRA-fine-tuned SWE agent model based on Qwen3-14B. It was trained with a 32K context window on ~3K high-quality trajectories filtered from R2EGym. It uses suggestive thinking with masked supervision (mask_think): reasoning prompts are injected into the training data but masked from the loss, which implicitly guides the model toward efficient agent behaviors while preserving its autonomous reasoning.
SWE-Eff† is the conservative, complementary model, optimized for harder problems involving multi-file logic, unclear root causes, and complex API interactions. For structured tasks with clear error traces, use the default model, SWE-Eff.
Suggestive Thinking
Unlike the default SWE-Eff model, SWE-Eff† injects reasoning prompts, wrapped in `<think>...</think>` blocks, into each assistant message during training to encourage deeper reasoning:
- "Let's think step by step ..." (encourages deeper reasoning before actions)
- "Let's view, think, edit, test ..." (standardizes workflows)
- "If I get stuck in a loop, I need to think of different solutions to break out of it." (mitigates repetitive action loops)
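A minimal sketch of the injection step, assuming a chat-format trajectory (a list of `{"role", "content"}` dicts). The card does not say where in the message the prompt is placed or how a prompt is chosen, so `inject_suggestive_thinking`, the prefix placement, and the round-robin choice are all hypothetical.

```python
import itertools

# The three suggestive prompts listed above; the "..." are elided in the card and kept verbatim.
SUGGESTIVE_PROMPTS = (
    "Let's think step by step ...",
    "Let's view, think, edit, test ...",
    "If I get stuck in a loop, I need to think of different solutions to break out of it.",
)


def inject_suggestive_thinking(messages, prompts=SUGGESTIVE_PROMPTS):
    """Return a copy of `messages` in which every assistant message is prefixed
    with a <think>...</think> block holding one suggestive prompt.

    Hypothetical helper: prefix placement and round-robin prompt selection are
    assumptions, not the documented SWE-Eff recipe.
    """
    prompt_cycle = itertools.cycle(prompts)
    injected = []
    for msg in messages:
        if msg["role"] == "assistant":
            prompt = next(prompt_cycle)
            # Build a new dict so the caller's trajectory is left unchanged.
            msg = {**msg, "content": f"<think>{prompt}</think>\n{msg['content']}"}
        injected.append(msg)
    return injected
```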
The mask_think mechanism excludes the content inside thinking blocks from the loss while keeping supervision on the `<think>` and `</think>` tokens themselves. The suggestive prompts can therefore shape the model's behavior implicitly without the model being trained to reproduce them, which preserves its autonomous reasoning.
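The masking rule can be sketched as a label-construction step. This assumes `<think>` and `</think>` each map to a single token ID, passed in as the hypothetical parameters `think_open_id` and `think_close_id`, and uses `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. A real SFT pipeline would also mask system and user tokens; that is omitted here.

```python
IGNORE_INDEX = -100  # label value skipped by torch.nn.CrossEntropyLoss by default


def mask_think_labels(input_ids, think_open_id, think_close_id, ignore_index=IGNORE_INDEX):
    """Copy `input_ids` into labels, masking tokens strictly between <think> and </think>.

    The tag tokens themselves keep their labels, so the model is still trained to
    open and close thinking blocks, but not to reproduce their content.
    """
    labels = list(input_ids)
    inside = False
    for i, tok in enumerate(input_ids):
        if tok == think_open_id:
            inside = True              # <think> stays supervised
        elif tok == think_close_id:
            inside = False             # </think> stays supervised
        elif inside:
            labels[i] = ignore_index   # thinking content is excluded from the loss
    return labels
```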
Training Data
Fine-tuned on filtered-R2EGym-SFT-Trajectories, a set of 3,218 high-quality trajectories filtered from R2EGym-SFT with a multi-stage pipeline:
- Basic Quality: `exit_status = Submitted` and `resolved = True`
- Behavioral Soundness: redundant loop detection and excessive search ratio filtering
- Hallucination Control: shortcut pattern and false reasoning detection
- Thought–Action Alignment: intent vs. action consistency enforcement
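The first two stages lend themselves to simple rule-based checks. The sketch below is hypothetical: the trajectory schema (`exit_status`, `resolved`, `actions`), the repeat threshold, the search-command names, and the 0.5 search-ratio cutoff are assumptions, not the pipeline's published settings. The hallucination and thought–action alignment stages are not sketched because the card does not describe how they are detected.

```python
# Hypothetical thresholds; the card does not publish the real values.
MAX_CONSECUTIVE_REPEATS = 3
MAX_SEARCH_RATIO = 0.5
SEARCH_COMMANDS = ("search", "grep", "find")


def passes_basic_quality(traj):
    """Basic Quality: the agent submitted a patch and the patch resolved the issue."""
    return traj["exit_status"] == "Submitted" and traj["resolved"] is True


def has_redundant_loop(actions, max_repeats=MAX_CONSECUTIVE_REPEATS):
    """True if the same action string is issued more than `max_repeats` times in a row."""
    run = 1
    for prev, cur in zip(actions, actions[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


def _first_word(action):
    words = action.split()
    return words[0] if words else ""


def search_ratio(actions, search_commands=SEARCH_COMMANDS):
    """Fraction of actions whose first word is a search command."""
    if not actions:
        return 0.0
    hits = sum(1 for a in actions if _first_word(a) in search_commands)
    return hits / len(actions)


def keep_trajectory(traj, max_repeats=MAX_CONSECUTIVE_REPEATS, max_ratio=MAX_SEARCH_RATIO):
    """Apply the Basic Quality and Behavioral Soundness checks in order."""
    actions = traj["actions"]
    return (
        passes_basic_quality(traj)
        and not has_redundant_loop(actions, max_repeats)
        and search_ratio(actions) <= max_ratio
    )
```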
Training Configuration
| Item | Value |
|---|---|
| Base Model | Qwen3-14B |
| Precision | bfloat16 |
| PEFT Method | LoRA |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.2 |
| Target Modules | q/k/v/o/up/down/gate_proj |
| Adapter Size | 246 MB |
| Global Batch Size | 16 |
| Gradient Accumulation | 8 |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.1 |
| Training Epochs | 3 |
| Total Training Time | ~10.5 h |
| Hardware | 2 × H200 |
| Maximum Context Length | 32,768 tokens |
| Key Modification | Suggestive thinking + mask_think |
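A few table entries can be cross-checked with arithmetic. Assuming plain data parallelism across the 2 GPUs, a global batch of 16 with 8 accumulation steps implies a per-device micro-batch of 1, and standard LoRA (without rsLoRA) scales the adapter update by `lora_alpha / r`. The parameter estimate assumes Qwen3-14B shapes (hidden size 5120, intermediate size 17408, 40 layers, 40 query heads and 8 key/value heads of dimension 128); verify these against the base model's `config.json`.

```python
# Values copied from the training configuration table.
GLOBAL_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 2
LORA_R = 16
LORA_ALPHA = 32


def per_device_batch_size(global_bs, grad_accum, num_devices):
    """Assumes plain data parallelism: global = per_device * grad_accum * num_devices."""
    micro, rem = divmod(global_bs, grad_accum * num_devices)
    if rem:
        raise ValueError("global batch size is not divisible by grad_accum * num_devices")
    return micro


# Standard (non-rsLoRA) LoRA scales the low-rank update B @ A by alpha / r.
lora_scaling = LORA_ALPHA / LORA_R

# Assumed Qwen3-14B shapes; check them against config.json before relying on them.
HIDDEN, INTERMEDIATE, LAYERS, HEAD_DIM, N_Q, N_KV = 5120, 17408, 40, 128, 40, 8
TARGET_SHAPES = {  # (in_features, out_features) of each LoRA target module
    "q_proj": (HIDDEN, N_Q * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV * HEAD_DIM),
    "o_proj": (N_Q * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, shapes, layers):
    """Each adapted Linear(in, out) gains A (r x in) and B (out x r): r * (in + out) params."""
    per_layer = sum(r * (fin + fout) for fin, fout in shapes.values())
    return per_layer * layers


params = lora_param_count(LORA_R, TARGET_SHAPES, LAYERS)
fp32_mib = params * 4 / 2**20  # 4 bytes per fp32 weight
```

Under these assumptions the adapter has about 64.2M parameters, or 245 MiB in fp32, which is in line with the listed 246 MB and suggests the adapter weights are saved in fp32; in bf16 they would take roughly half that.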
Evaluation
Evaluated on SWE-bench Verified using the R2E-Gym scaffold with a 32K context, a 100-turn limit, temperature=0.6, top_p=0.95, and function calling disabled.
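The evaluation protocol amounts to a turn-capped agent loop. The sketch below only illustrates the 100-turn limit and the stated sampling settings: `policy`, `env.reset`, and `env.step` are hypothetical interfaces, not R2E-Gym's actual API, the keyword names in `GENERATION_KWARGS` follow Hugging Face `generate`, and `done` is assumed to mean the agent submitted. With function calling disabled, the policy is assumed to return plain-text actions that the scaffold parses.

```python
from dataclasses import dataclass

# Sampling settings from the evaluation setup, spelled as Hugging Face `generate` kwargs.
GENERATION_KWARGS = {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
MAX_TURNS = 100  # turn limit used in the evaluation


@dataclass
class EpisodeResult:
    submitted: bool  # True if the environment reported completion within the turn limit
    turns: int       # number of agent turns actually taken


def run_episode(policy, env, max_turns=MAX_TURNS):
    """Run one hypothetical agent episode.

    `policy(observation) -> str` returns a plain-text action, and
    `env.step(action) -> (observation, done)` executes it. Both are placeholder
    interfaces, not R2E-Gym's API.
    """
    observation = env.reset()
    for turn in range(1, max_turns + 1):
        action = policy(observation)
        observation, done = env.step(action)
        if done:
            return EpisodeResult(submitted=True, turns=turn)
    # Turn budget exhausted without a submission.
    return EpisodeResult(submitted=False, turns=max_turns)
```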
| Metric | SWE-Eff (Default) | SWE-Eff† (Complementary) | SWE-Eff‡ (Union) |
|---|---|---|---|
| Resolved rate | 21.6% | 20.6% | 30.4% |
| Avg steps | 37.1 | 44.5 (+20%) | — |
| Submission success rate | 43.2% | 55.3% | — |
| Edit success rate | 54.2% | 63.2% | — |
| >80-step resolve rate | 2.0% (1/51) | 13.7% (7/51) | — |
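The table's numbers can be related with inclusion-exclusion. SWE-bench Verified has 500 instances; assuming "Union" counts an instance as resolved when either model resolves it, the resolved rates imply the overlap computed below. The script also re-derives the +20% step increase and the >80-step percentages from the table.

```python
N_INSTANCES = 500  # size of SWE-bench Verified


def count_from_rate(rate_tenths_pct, n=N_INSTANCES):
    """Convert a rate given in tenths of a percent (21.6% -> 216) into an instance count."""
    count, rem = divmod(rate_tenths_pct * n, 1000)
    if rem:
        raise ValueError("rate does not correspond to a whole number of instances")
    return count


default_solved = count_from_rate(216)  # SWE-Eff
hard_solved = count_from_rate(206)     # SWE-Eff† (this model)
union_solved = count_from_rate(304)    # SWE-Eff‡ (Union), assumed to mean "either model"

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
both = default_solved + hard_solved - union_solved
only_default = default_solved - both
only_hard = hard_solved - both

# Relative increase in average steps: 44.5 vs 37.1.
step_increase_pct = round((44.5 / 37.1 - 1) * 100)

# >80-step resolve rates as percentages with one decimal.
long_rate_default = round(1 / 51 * 100, 1)
long_rate_hard = round(7 / 51 * 100, 1)
```

Under that reading, 59 instances are solved by both models, 49 only by SWE-Eff, and 44 only by this model, which is where the gain from combining them comes from.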
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen3-14B base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# Attach the SWE-Eff-Hard LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "ubicloud/SWE-Eff-Hard-14B")
```
When to Use
- SWE-Eff† (this model): multi-file logic, unclear root causes, complex API interactions, known hard projects (e.g., `sympy`, `sphinx`, `psf`)
- SWE-Eff: bugs with clear error traces, localized to a single file, structured repositories (e.g., `django`, `scikit-learn`, `xarray`)
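The guidance above can be turned into a simple routing rule. This is a hypothetical heuristic, not something the card prescribes: the inputs `has_clear_traceback` and `files_involved` are assumed to come from an upstream triage step, and `ubicloud/SWE-Eff-14B` is an assumed repository ID for the default model.

```python
# Example projects named in the card for each model.
HARD_PROJECTS = {"sympy", "sphinx", "psf"}
STRUCTURED_PROJECTS = {"django", "scikit-learn", "xarray"}

DEFAULT_MODEL = "ubicloud/SWE-Eff-14B"  # hypothetical repo ID for SWE-Eff
HARD_MODEL = "ubicloud/SWE-Eff-Hard-14B"  # this model


def pick_model(project, has_clear_traceback, files_involved):
    """Hypothetical router: prefer SWE-Eff for clear, localized bugs, else this model."""
    if project in HARD_PROJECTS:
        return HARD_MODEL
    if has_clear_traceback and (files_involved <= 1 or project in STRUCTURED_PROJECTS):
        return DEFAULT_MODEL
    # Multi-file logic or unclear root cause: use the conservative model.
    return HARD_MODEL
```

Because the union resolves more instances than either model alone, running both and keeping whichever patch passes the tests may be worth the extra compute when the budget allows.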
Model provider
ubicloud
Model tree
- Base: Qwen/Qwen3-14B
- Adapter: this model
Modalities
- Input: Text
- Output: Text
Supported Functionality
- Model APIs
- Dedicated Endpoints
- Container