README
Run Status
- Status: complete_skipped
- Adapter present: True
- Latest checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-260
- Best checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-260
- Best eval loss: 2.5278091430664062
- Trainer state: outputs/qwen-capability-light/stage2-capability-step-sft/trainer_state.json
- Global step: 260
- First Loss: 1.5216938257217407
- Final Loss: 1.655211091041565
- Min Loss: 0.6422638297080994
- Max Loss: 2.0936222076416016
- Loss Points: 260
- First Eval Loss: 2.6287848949432373
- Final Eval Loss: 2.5278091430664062
- Min Eval Loss: 2.5278091430664062
- Max Eval Loss: 2.6287848949432373
- Eval Loss Points: 14
- Best Eval Loss: 2.5278091430664062
- Best Global Step: 260
- Train Runtime S: 3794.8465
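The summary figures in this list can be recomputed from trainer_state.json. The following is a minimal sketch, assuming the file follows the usual Hugging Face Trainer layout, in which log_history is a list of dicts that carry a loss key for training logs and an eval_loss key for evaluation logs. The function names summarize_losses and load_log_history are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def summarize_losses(log_history):
    """Return first/final/min/max/points for train and eval loss entries."""
    summary = {}
    for key in ("loss", "eval_loss"):
        # Keep only the log entries that actually report this metric.
        values = [entry[key] for entry in log_history if key in entry]
        if not values:
            continue
        summary[key] = {
            "first": values[0],
            "final": values[-1],
            "min": min(values),
            "max": max(values),
            "points": len(values),
        }
    return summary


def load_log_history(path):
    """Read log_history from a trainer_state.json file (assumed Trainer layout)."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return state.get("log_history", [])


if __name__ == "__main__":
    # Path copied from the Run Status list; adjust if the outputs live elsewhere.
    state_path = "outputs/qwen-capability-light/stage2-capability-step-sft/trainer_state.json"
    if Path(state_path).exists():
        print(json.dumps(summarize_losses(load_log_history(state_path)), indent=2))
```

Read this way, the reported eval loss falls from 2.6288 at the first evaluation to 2.5278 at step 260, a drop of roughly 3.8%.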
Generated files:
- training_config.json
- stage_report.json
- loss_history.csv
- loss_curve.svg
- eval_loss_history.csv
- eval_loss_curve.svg
Context
- Purpose: Capability next-action SFT on coding-agent decisions plus verifier-style gap repairs.
- Previous adapter: armand0e/qwen3.5-capability-light-behavior-seed-lora
- Next stage: stage3-capability-dpo
- Base model: Qwen/Qwen3.5-2B
- Data file: data/assembled/sft_qwen_next_actions_capability_light.jsonl
- Eval file: data/eval/eval_next_actions_with_retention.jsonl
- LoRA r/alpha/dropout: 16/16/0.0
- Learning rate: 8e-07
- Epochs: 1.0
- Merged 16-bit model: armand0e/qwen3.5-capability-light-capability-step-merged-16bit
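As a reading aid for the LoRA r/alpha/dropout shorthand: in the standard LoRA formulation the low-rank update is multiplied by alpha / r, so 16/16 corresponds to a scale of 1.0. The sketch below only restates the listed hyperparameters in code. The LoraSettings class is hypothetical and is not the configuration object used by the training script; the exact settings are recorded in training_config.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the hyperparameters listed in Context."""
    r: int = 16
    alpha: int = 16
    dropout: float = 0.0
    learning_rate: float = 8e-07
    epochs: float = 1.0

    @property
    def scaling(self) -> float:
        # Standard LoRA scaling: the low-rank update is multiplied by alpha / r.
        return self.alpha / self.r


settings = LoraSettings()
print(settings.scaling)
```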
Upstream Data
- armand0e/qwen3.7-max-pi-traces
- armand0e/badlogicgames-pi-mono-opus-filtered
- armand0e/gpt-5.5-agent
- armand0e/gpt-5.5-chat
- TeichAI/claude-4.5-opus-high-reasoning-250x
- TeichAI/Claude-Opus-4.6-Reasoning-887x
Compact Local Sample
json
{"messages": [{"content": "User/task context:\nuser: Give only the answer: (98 + 459) - 34 = ?","role": "user"},{"content": "523","reasoning_content": "Expression evaluates to 523. The answer should return only the numeric answer, while making sure no extra words.","role": "assistant"}],"metadata": {"expected": 523,"family": "arithmetic","key": "gap/arithmetic/00000","source": "gap_capability_pack"},"source": "gap_capability_pack"}
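Records of this shape can be sanity-checked by comparing the assistant's final content with metadata.expected. The following is a minimal sketch, assuming one JSON object per line as in the .jsonl data file and assuming that metadata.expected is meant to equal the assistant's answer. The check_record helper is hypothetical, and the sample is trimmed to the fields the check reads.

```python
import json


def check_record(line):
    """Return True if the last assistant message equals metadata['expected']."""
    record = json.loads(line)
    answers = [m["content"] for m in record["messages"] if m["role"] == "assistant"]
    expected = record.get("metadata", {}).get("expected")
    if not answers or expected is None:
        return False
    # Compare as strings so numeric and textual expectations are handled alike.
    return answers[-1].strip() == str(expected)


# Trimmed copy of the sample record above (reasoning_content and other keys omitted).
sample = json.dumps({
    "messages": [
        {"role": "user", "content": "User/task context:\nuser: Give only the answer: (98 + 459) - 34 = ?"},
        {"role": "assistant", "content": "523"},
    ],
    "metadata": {"expected": 523, "family": "arithmetic"},
})

print(check_record(sample))
```

Applied over the whole file, a check like this flags gap-repair records whose stored answer disagrees with the expected value.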
Reproduction
The exact stage command and package versions are in training_config.json.
Model provider
armand0e
Model tree
Base
Qwen/Qwen3.5-2B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text