README
Run Status
- Status: complete
- Adapter present: True
- Latest checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-440
- Best checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-440
- Best eval loss: 2.1149797439575195
- Trainer state: outputs/qwen-capability-light/stage2-capability-step-sft/trainer_state.json
- Global step: 440
- First loss: 1.3416800498962402
- Final loss: 1.4746193885803223
- Min loss: 0.4818551540374756
- Max loss: 1.8697149753570557
- Loss points: 440
- First eval loss: 2.545384168624878
- Final eval loss: 2.1149797439575195
- Min eval loss: 2.1149797439575195
- Max eval loss: 2.545384168624878
- Eval loss points: 23
- Best global step: 440
- Train runtime (s): 8950.7498
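A few convenience figures can be derived directly from the run status. The sketch below uses only the values reported above; the function name `summarize` is illustrative. Note that "seconds per step" here is simply total runtime divided by global step, so it includes evaluation and checkpointing overhead rather than isolating pure training throughput.

```python
# Values copied from the run status above.
TRAIN_RUNTIME_S = 8950.7498
GLOBAL_STEP = 440
FIRST_EVAL_LOSS = 2.545384168624878
FINAL_EVAL_LOSS = 2.1149797439575195


def summarize() -> dict:
    """Derive convenience figures from the reported run status."""
    hours = TRAIN_RUNTIME_S / 3600
    # Wall-clock time per optimizer step; includes eval/checkpoint overhead.
    sec_per_step = TRAIN_RUNTIME_S / GLOBAL_STEP
    # Absolute and relative improvement from the first to the final evaluation.
    eval_drop = FIRST_EVAL_LOSS - FINAL_EVAL_LOSS
    rel_drop = eval_drop / FIRST_EVAL_LOSS
    return {
        "runtime_hours": round(hours, 2),
        "sec_per_step": round(sec_per_step, 2),
        "eval_loss_drop": round(eval_drop, 4),
        "eval_loss_rel_drop": round(rel_drop, 4),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Running it gives roughly 2.49 hours of training, about 20.34 s per step, and an eval-loss drop of about 0.43 (about 16.9% relative) between the first and final evaluations.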
Generated files:
- training_config.json
- stage_report.json
- loss_history.csv
- loss_curve.svg
- eval_loss_history.csv
- eval_loss_curve.svg
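The first/final/min/max loss figures in the run status can, in principle, be recomputed from loss_history.csv. The sketch below does this with the standard `csv` module. It assumes, as a labeled guess, that the file has a numeric column named `loss` (for eval_loss_history.csv the column might be named differently, so the key is a parameter); the real column names are not documented here, and `summarize_losses` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable


def summarize_losses(rows: Iterable[dict], loss_key: str = "loss") -> dict:
    """Recompute first/final/min/max loss from parsed CSV rows.

    ASSUMPTION: each row has a numeric column named `loss_key`; the actual
    column name in loss_history.csv may differ.
    """
    # Skip rows where the loss column is missing or empty.
    losses = [float(row[loss_key]) for row in rows if row.get(loss_key) not in (None, "")]
    if not losses:
        raise ValueError(f"no values found in column {loss_key!r}")
    return {
        "first": losses[0],
        "final": losses[-1],
        "min": min(losses),
        "max": max(losses),
        "points": len(losses),
    }


def summarize_csv(path: str, loss_key: str = "loss") -> dict:
    """Open a loss-history CSV file and summarize one of its columns."""
    with open(path, newline="") as handle:
        return summarize_losses(csv.DictReader(handle), loss_key)


if __name__ == "__main__":
    # Tiny in-memory stand-in for loss_history.csv (made-up values).
    demo = io.StringIO("step,loss\n1,1.30\n2,0.90\n3,1.10\n")
    print(summarize_losses(csv.DictReader(demo)))
```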
Context
- Purpose: Capability next-action SFT over the full 7k token-safe view, with richer stepwise reasoning.
- Previous adapter: armand0e/qwen3.5-capability-light-v2-behavior-seed-lora
- Next stage: stage3-capability-dpo
- Base model: Qwen/Qwen3.5-2B
- Data file: data/assembled/sft_qwen_next_actions_capability_light.jsonl
- Eval file: data/eval/eval_next_actions_with_retention.jsonl
- LoRA r/alpha/dropout: 16/16/0.0
- Learning rate: 2e-06
- Epochs: 1.0
- Merged 16-bit model: armand0e/qwen3.5-capability-light-v2-capability-step-merged-16bit
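The LoRA settings (r = 16, alpha = 16, dropout 0.0) fix the adapter's shape and update scale. In standard LoRA, a frozen weight W of shape d_out × d_in is augmented with a low-rank update (alpha / r) · B · A, where B is d_out × r and A is r × d_in, so each adapted matrix adds r · (d_in + d_out) trainable parameters. The sketch below computes both quantities. It assumes standard (not rsLoRA) scaling, and the 2048 × 2048 projection is a hypothetical shape for illustration, not a claim about Qwen3.5-2B's real dimensions or about which modules this run adapted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int
    alpha: float
    dropout: float

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the update B @ A by alpha / r.
        # (rsLoRA would use alpha / sqrt(r); assumed not used here.)
        return self.alpha / self.r

    def extra_params(self, d_in: int, d_out: int) -> int:
        # A is (r x d_in) and B is (d_out x r): r*d_in + d_out*r parameters.
        return self.r * (d_in + d_out)


STAGE2 = LoraSettings(r=16, alpha=16, dropout=0.0)

if __name__ == "__main__":
    print("scaling:", STAGE2.scaling)
    # Hypothetical 2048 x 2048 projection, for illustration only.
    print("extra params per matrix:", STAGE2.extra_params(2048, 2048))
```

With alpha equal to r the scaling factor is 1.0, so the learned update is applied unscaled. For the hypothetical 2048 × 2048 matrix, the adapter adds 65,536 parameters, about 1.6% of the 4,194,304 in the frozen weight.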
Upstream Data
- armand0e/qwen3.7-max-pi-traces
- armand0e/badlogicgames-pi-mono-opus-filtered
- armand0e/gpt-5.5-agent
- armand0e/gpt-5.5-chat
- TeichAI/claude-4.5-opus-high-reasoning-250x
- TeichAI/Claude-Opus-4.6-Reasoning-887x
Compact Local Sample
```json
{"messages": [{"content": "User/task context:\nuser: Give only the answer: (864 + 256) - 28 = ?","role": "user"},{"content": "1092","reasoning_content": "This hinges on: Expression evaluates to 1092. Respond by choosing to return only the numeric answer. Avoid drifting past: no extra words.","role": "assistant"}],"metadata": {"expected": 1092,"family": "arithmetic","key": "gap/arithmetic/00021","source": "gap_capability_pack"},"source": "gap_capability_pack"}
```
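Records shaped like the sample can be sanity-checked before training. The sketch below parses one JSONL line and checks that the conversation ends with a user turn followed by an assistant turn, and that, when `metadata.expected` is present, the assistant's answer matches it exactly. The function name `check_record` and the specific checks are illustrative assumptions, not the project's actual validation step.

```python
import json

# The compact sample record above, as a single JSONL line.
SAMPLE = (
    '{"messages": [{"content": "User/task context:\\nuser: Give only the answer: '
    '(864 + 256) - 28 = ?","role": "user"},'
    '{"content": "1092","reasoning_content": "This hinges on: Expression evaluates to 1092. '
    'Respond by choosing to return only the numeric answer. Avoid drifting past: no extra words.",'
    '"role": "assistant"}],'
    '"metadata": {"expected": 1092,"family": "arithmetic","key": "gap/arithmetic/00021",'
    '"source": "gap_capability_pack"},"source": "gap_capability_pack"}'
)


def check_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL record (empty if none)."""
    problems = []
    record = json.loads(line)
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    # The final two turns should be the prompt and the target response.
    if roles[-2:] != ["user", "assistant"]:
        problems.append(f"expected trailing user/assistant turns, got {roles}")
    expected = record.get("metadata", {}).get("expected")
    if expected is not None and messages:
        answer = str(messages[-1].get("content", "")).strip()
        if answer != str(expected):
            problems.append(f"answer {answer!r} != expected {expected!r}")
    return problems


if __name__ == "__main__":
    print(check_record(SAMPLE))
```

For the sample, the answer 1092 matches (864 + 256) - 28 = 1120 - 28 = 1092, so the check returns an empty list.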
Reproduction
The exact stage command and package versions are in training_config.json.
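The schema of training_config.json is not documented on this card, so a schema-agnostic way to inspect it is to flatten the nested JSON into dotted key paths and print them. The sketch below does this with the standard library; `flatten` and `load_flat` are hypothetical helpers, and the demo object is made up rather than taken from the real file.

```python
import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b.0": value} pairs."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        # Drop the trailing "." added by the parent container.
        items[prefix[:-1]] = obj
    return items


def load_flat(path: str) -> dict[str, Any]:
    """Load a JSON file (e.g. training_config.json) and flatten it."""
    with open(path) as handle:
        return flatten(json.load(handle))


if __name__ == "__main__":
    # Made-up stand-in for the real config file.
    demo = {"stage": "demo", "lora": {"r": 16, "alpha": 16}, "packages": ["a", "b"]}
    for key, value in sorted(flatten(demo).items()):
        print(f"{key} = {value}")
```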
Model provider: armand0e
Model tree: base model Qwen/Qwen3.5-2B; this model is an adapter on that base.
Modalities: input Video, Text, Image; output Text.