LorMolf
SPSD-RL-Qwen3-4B-Factory-MHTrue-2Ep-20260608
README
License: other
Training
- Base model: Qwen/Qwen3-4B-Base
- Dataset: LorMolf/SPSD-RL, data/*.jsonl
- Dataset revision: 76e62ee11f0b6b8e9a5511a7044a556f7c0c8e42
- Pipeline: src/training_eval/train_sft_factory.py
- Template: qwen
- Supervision: prompt/completion assistant-turn expansion
- LLaMA-Factory masking: train_on_prompt=false, mask_history=true
- Sequence length: 16384
- Epochs: 2
- Per-device train batch size: 1
- GPUs: 4
- Gradient accumulation steps: 16
- Effective train batch size: 64
- Learning rate: 2e-5
- Warmup ratio: 0.03
- Scheduler: linear
- Precision: bf16
- Packing: true, neat_packing=true
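The effective train batch size listed above follows from the per-device batch size, the GPU count, and the gradient accumulation steps. The sketch below reproduces that arithmetic, assuming plain data parallelism across the 4 GPUs. The dictionary keys are illustrative names chosen for this example and are not confirmed arguments of train_sft_factory.py.

```python
# Values copied from the training list above. The key names are
# illustrative assumptions, not confirmed arguments of the training script.
run_config = {
    "per_device_train_batch_size": 1,
    "num_gpus": 4,
    "gradient_accumulation_steps": 16,
    "sequence_length": 16384,
}


def effective_batch_size(cfg: dict) -> int:
    """Sequences contributing to one optimizer step under data parallelism."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["num_gpus"]
        * cfg["gradient_accumulation_steps"]
    )


def max_tokens_per_step(cfg: dict) -> int:
    """Upper bound on tokens per optimizer step if every sequence is full."""
    return effective_batch_size(cfg) * cfg["sequence_length"]


print(effective_batch_size(run_config))
print(max_tokens_per_step(run_config))
```

With packing enabled, sequences are likely to be close to the 16384-token limit, so the second number is a reasonable upper bound on tokens per optimizer step rather than an exact figure.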
Final training metrics from the local run:
- train_loss: 0.06013819321350911
- train_runtime: 22:30:13.22
- train_steps_per_second: 0.011
- Final epoch: 2.0
W&B run: https://wandb.ai/lorenzo-molfetta/olmo-spiral-sft/runs/qr44c5qn
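The reported runtime and step rate give a rough estimate of how many optimizer steps the run took. This is a back-of-the-envelope sketch, not a logged value: it assumes train_runtime is formatted as HH:MM:SS.ss and that train_steps_per_second was rounded to three decimals, so only a range can be recovered.

```python
def runtime_to_seconds(hms: str) -> float:
    """Convert an HH:MM:SS.ss string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


runtime_s = runtime_to_seconds("22:30:13.22")
steps_per_second = 0.011  # reported value, assumed rounded to 3 decimals
rounding_margin = 0.0005  # half of the last reported digit

estimate = runtime_s * steps_per_second
low = runtime_s * (steps_per_second - rounding_margin)
high = runtime_s * (steps_per_second + rounding_margin)

print(f"runtime: {runtime_s:.2f} s")
print(f"estimated optimizer steps: {estimate:.0f} (range {low:.0f} to {high:.0f})")
```

Because the step rate carries only two significant digits, the estimate is uncertain by several percent. The W&B run is the authoritative source for the exact step count.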
Notes
This run was launched before train-time validation was added to the factory pipeline, so it has no validation metrics. For downstream SPSD-RL benchmark results, use the repository's generation evaluation pipeline.
Model provider: LorMolf
Model tree: base Qwen/Qwen3-4B-Base, fine-tuned: this model
Modalities: text input, text output