README

Run Status

  • Status: complete
  • Adapter present: True
  • Latest checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-440
  • Best checkpoint: outputs/qwen-capability-light/stage2-capability-step-sft/checkpoint-440
  • Best eval loss: 2.1149797439575195
  • Trainer state: outputs/qwen-capability-light/stage2-capability-step-sft/trainer_state.json
  • Global step: 440
  • First loss: 1.3416800498962402
  • Final loss: 1.4746193885803223
  • Min loss: 0.4818551540374756
  • Max loss: 1.8697149753570557
  • Loss points: 440
  • First eval loss: 2.545384168624878
  • Final eval loss: 2.1149797439575195
  • Min eval loss: 2.1149797439575195
  • Max eval loss: 2.545384168624878
  • Eval loss points: 23
  • Best global step: 440
  • Train runtime (s): 8950.7498
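
The loss summary fields above can be recomputed from the trainer state file. The following is a minimal sketch, assuming the file follows the Hugging Face `Trainer` layout, in which `log_history` is a list of dicts where training entries carry a `loss` key and evaluation entries carry an `eval_loss` key. The helper names `summarize_losses` and `load_log_history` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_losses(log_history, key="loss"):
    """Return first/final/min/max/count for one metric in a Trainer log history."""
    # Keep only the entries that actually log the requested metric.
    values = [entry[key] for entry in log_history if key in entry]
    if not values:
        return None
    return {
        "first": values[0],
        "final": values[-1],
        "min": min(values),
        "max": max(values),
        "points": len(values),
    }


def load_log_history(trainer_state_path):
    """Read `log_history` from a trainer_state.json file (assumed HF Trainer layout)."""
    state = json.loads(Path(trainer_state_path).read_text(encoding="utf-8"))
    return state.get("log_history", [])
```

Calling `summarize_losses(load_log_history(path), key="eval_loss")` with `path` set to the trainer state path listed above should yield the eval loss fields, provided the layout assumption holds.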

Generated files:

  • training_config.json
  • stage_report.json
  • loss_history.csv
  • loss_curve.svg
  • eval_loss_history.csv
  • eval_loss_curve.svg

Loss curve: see loss_curve.svg

Eval loss curve: see eval_loss_curve.svg

Context

  • Purpose: Capability next-action SFT over the full 7k token-safe view with richer stepwise reasoning.
  • Previous adapter: armand0e/qwen3.5-capability-light-v2-behavior-seed-lora
  • Next stage: stage3-capability-dpo
  • Base model: Qwen/Qwen3.5-2B
  • Data file: data/assembled/sft_qwen_next_actions_capability_light.jsonl
  • Eval file: data/eval/eval_next_actions_with_retention.jsonl
  • LoRA r/alpha/dropout: 16 / 16 / 0.0
  • Learning rate: 2e-06
  • Epochs: 1.0
  • Merged 16-bit model: armand0e/qwen3.5-capability-light-v2-capability-step-merged-16bit

Upstream Data

  • armand0e/qwen3.7-max-pi-traces
  • armand0e/badlogicgames-pi-mono-opus-filtered
  • armand0e/gpt-5.5-agent
  • armand0e/gpt-5.5-chat
  • TeichAI/claude-4.5-opus-high-reasoning-250x
  • TeichAI/Claude-Opus-4.6-Reasoning-887x

Compact Local Sample

json

{
  "messages": [
    {
      "content": "User/task context:\nuser: Give only the answer: (864 + 256) - 28 = ?",
      "role": "user"
    },
    {
      "content": "1092",
      "reasoning_content": "This hinges on: Expression evaluates to 1092. Respond by choosing to return only the numeric answer. Avoid drifting past: no extra words.",
      "role": "assistant"
    }
  ],
  "metadata": {
    "expected": 1092,
    "family": "arithmetic",
    "key": "gap/arithmetic/00021",
    "source": "gap_capability_pack"
  },
  "source": "gap_capability_pack"
}
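
Records in this shape can be sanity-checked before training. The sketch below is one possible check, assuming every record has a `messages` list ending in an assistant turn and, where present, a `metadata.expected` value that should match the assistant's answer. The function name `check_record` is hypothetical and not part of the training pipeline.

```python
def check_record(record):
    """Return a list of problems found in one chat-format training record."""
    problems = []
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    if not roles or roles[-1] != "assistant":
        problems.append("last message is not from the assistant")
    if "user" not in roles:
        problems.append("no user message")
    expected = record.get("metadata", {}).get("expected")
    if expected is not None and messages:
        # Compare the final answer text against the labeled expected value.
        answer = messages[-1].get("content", "").strip()
        if answer != str(expected):
            problems.append(f"answer {answer!r} does not match expected {expected!r}")
    return problems


sample = {
    "messages": [
        {"role": "user",
         "content": "User/task context:\nuser: Give only the answer: (864 + 256) - 28 = ?"},
        {"role": "assistant", "content": "1092"},
    ],
    "metadata": {"expected": 1092, "family": "arithmetic"},
}
print(check_record(sample))  # []
```

For the sample above, (864 + 256) - 28 = 1120 - 28 = 1092, so the answer and the `expected` label agree and no problems are reported.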

Reproduction

The exact stage command and package versions are in training_config.json.

Model provider

  • armand0e

Model tree

  • Base: Qwen/Qwen3.5-2B
  • Adapter: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text
