AlexWortega
qwen35-4b-clawd-rift
License: apache-2.0

Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the bf16 base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map='cuda')
model = PeftModel.from_pretrained(base, 'AlexWortega/qwen35-4b-clawd-rift')
tok = AutoTokenizer.from_pretrained('AlexWortega/qwen35-4b-clawd-rift')
```
Or with sglang:
```bash
python -m sglang.launch_server --model-path Qwen/Qwen3.5-4B \
  --lora-paths clawd-rift=AlexWortega/qwen35-4b-clawd-rift \
  --tool-call-parser hermes
```
Evaluation results
tbench-2 (89 docker tasks via Pi-style runner)
7/89 (7.9%). Tasks unique to clawd-rift: fix-ocaml-gc, pytorch-model-recovery.
| Variant in pipeline | Pass on tbench-2 |
|---|---|
| ckpt600 (Soyuz SFT only) | 7 |
| clawd-100 (+ ClawGym 100 steps) | 7 |
| clawd-200 (+ ClawGym 200 steps) | 7 |
| clawd-rft (positive-only SFT on rollouts) | 6 |
| clawd-rift (true RIFT on rollouts) — this model | 7 |
ClawGym-Bench (200 tasks via openclaw scaffold)
| Stat | Value |
|---|---|
| mean | 0.371 |
| half+ (≥0.5) | 80/200 (40%) |
| perfect (=1.0) | 2 (tasks 78, 148) |
| zero | 40 |
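
For readers who want to recompute the four statistics above from per-task scores, here is a minimal sketch. It assumes each task yields one score in the range 0 to 1; the function name `summarize_scores` and the input list are illustrative and are not part of the released eval files.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Summarize per-task scores (assumed to lie in [0, 1])."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "n": len(scores),
        "mean": mean(scores),
        "half_plus": sum(s >= 0.5 for s in scores),  # tasks scoring >= 0.5
        "perfect": sum(s == 1.0 for s in scores),    # tasks scoring exactly 1.0
        "zero": sum(s == 0.0 for s in scores),       # tasks scoring exactly 0.0
    }
```

Multiplying `mean` by 100 gives a value on the same scale as the leaderboard averages (0.371 becomes 37.1).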
Comparison to RUC-AIBOX ClawGym leaderboard (compact open-weight models):
| Model | ClawGym avg |
|---|---|
| Qwen3-32B | 33.11 |
| Qwen3-8B | 35.02 |
| clawd-rift (this, 4B, QLoRA, 1 GPU) | 37.10 |
| Qwen3-30B-A3B (MoE) | 45.11 |
| ClawGym-4B (RUC-AIBOX full SFT) | 47.73 |
Optimal inference parameters
The best sampling settings depend on the scaffold.
| Scaffold | Task type | Optimal sampling |
|---|---|---|
| openclaw (ClawGym-style formal spec) | JSON/Markdown to schema | T=0.3-0.5, top_p=0.95, no min_p |
| pi-agent (terminus_runner shell explore) | trial-and-error commands | T=0.7-0.8, top_p=0.95, min_p=0.05 |
A universal default that loses only about 5% on each scaffold:
```markdown
temperature=0.5, top_p=0.95, top_k=40, repetition_penalty=1.05
```
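
One way to wire these settings into a client is a small lookup keyed by scaffold. The sketch below is only an illustration: the helper name `sampling_params` and the choice of the midpoint of each temperature range are assumptions, not something the evaluation prescribes.

```python
from typing import Dict, Optional

# Values copied from the table above; temperature is given as a (low, high) range.
SAMPLING_PRESETS = {
    "openclaw": {"temperature": (0.3, 0.5), "top_p": 0.95, "min_p": None},
    "pi-agent": {"temperature": (0.7, 0.8), "top_p": 0.95, "min_p": 0.05},
}

# The universal default listed above.
UNIVERSAL_DEFAULT = {
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 40,
    "repetition_penalty": 1.05,
}


def sampling_params(scaffold: Optional[str] = None) -> Dict[str, float]:
    """Return sampling settings for a scaffold, or the universal default."""
    preset = SAMPLING_PRESETS.get(scaffold)
    if preset is None:
        return dict(UNIVERSAL_DEFAULT)
    low, high = preset["temperature"]
    # Assumption: use the midpoint of the recommended temperature range.
    params = {"temperature": round((low + high) / 2, 2), "top_p": preset["top_p"]}
    if preset["min_p"] is not None:
        params["min_p"] = preset["min_p"]
    return params
```

The returned dictionary can be passed to whichever inference client is in use, provided it accepts these parameter names.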
Training methodology — pipeline of 3 stages
Stage 1: Soyuz SFT (ckpt600 — base agent format)
QLoRA r=64 alpha=128 on Qwen/Qwen3.5-4B.
- Datasets: `AlexWortega/Soyuz-sft` + `AlexWortega/AgentTrove`
- Format: Hermes-style JSON tool calls (`<tool_call>{"name":...,"arguments":...}</tool_call>`)
- 600 steps total, seq=8K, Muon optimizer for LoRA matrices
- Output: `ckpt-400`, `ckpt-600` (intermediate); `soup_sum = ckpt400 + ckpt600` (arithmetic merge)
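
When the model is run without a serving-side tool-call parser, the Hermes-style blocks it emits can be extracted from raw text. The following is a minimal sketch that assumes each call is a single JSON object wrapped in `<tool_call>` tags; the function name `parse_tool_calls` and the `command` argument in the usage line are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Matches one JSON object wrapped in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract Hermes-style tool calls from generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls
```

For example, `parse_tool_calls('<tool_call>{"name": "exec", "arguments": {"command": "ls"}}</tool_call>')` yields a single call named `exec`.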
Stage 2: ClawGym continue-train (clawd-100, clawd-200 — openclaw scaffold adaptation)
Continue-train ckpt600 on filtered RUC-AIBOX/ClawGym-Trajectory.
- 1937 trajectories (filtered ≤16K tokens out of 24.5K)
- 200 steps, seq=16K, LR=1e-4, AdamW
- Hermes chat template + openclaw native tools (read/write/exec/web_search/...)
- Output: `clawd-100` (mid), `clawd-200` (final)
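
The length filter used in this stage can be sketched as below. This is an illustration rather than the actual preprocessing script: `count_tokens` stands in for whatever tokenizer-based counter is used, and reading "16K" as 16,384 tokens is an assumption.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def filter_by_length(
    trajectories: List[T],
    count_tokens: Callable[[T], int],
    max_tokens: int = 16_384,  # assumption: "16K" means 16 * 1024
) -> List[T]:
    """Keep only trajectories whose token count fits the training context."""
    return [t for t in trajectories if count_tokens(t) <= max_tokens]
```

In practice `count_tokens` would apply the chat template and tokenizer to the full trajectory, so that the count matches what the trainer sees.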
Stage 3: RIFT — own rollouts + reward feedback
True RIFT loss on top of clawd-200:
```python
# positive (reward > 0): NLL × reward (weighted SFT)
# negative (reward = 0): exp(logp) × negative_scale (unlikelihood)
```
- 61 trajectories from soup_sum's own ClawGym rollouts (46 pos + 15 neg, reward 0-1)
- 5 epochs / 80 steps, LR=2e-5
- Implementation: `compare_offlinegpro/src/trainers/offline_losses.py`
- Output: `clawd-rift` ← this model
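
The two-branch loss described above can be illustrated for a single trajectory in plain Python. This is a sketch under stated assumptions, not the code in `offline_losses.py`: it assumes per-token log-probabilities averaged over the completion, and `negative_scale` is a required argument because its value is not given here.

```python
import math
from typing import Sequence


def rift_loss(token_logps: Sequence[float], reward: float, negative_scale: float) -> float:
    """Per-trajectory loss: reward-weighted NLL for positives, exp(logp) penalty for negatives."""
    if not token_logps:
        raise ValueError("token_logps must not be empty")
    n = len(token_logps)
    if reward > 0:
        # Positive sample: mean negative log-likelihood, weighted by the reward.
        nll = -sum(token_logps) / n
        return reward * nll
    # Negative sample: mean token probability, scaled; minimizing it pushes
    # probability mass away from the tokens of the failed rollout.
    mean_prob = sum(math.exp(lp) for lp in token_logps) / n
    return negative_scale * mean_prob
```

Because `exp(logp)` is bounded by 1, the negative term stays bounded, unlike a plain negated NLL, which can grow without limit as the model unlearns a sequence.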
Repos
| Asset | Link | Size |
|---|---|---|
| LoRA adapter | qwen35-4b-clawd-rift | 340 MB |
| Merged bf16 | qwen35-4b-clawd-rift-merged | 8.4 GB |
| GGUF (4 quants) | qwen35-4b-clawd-rift-gguf | 18.6 GB |
| Raw evals | qwen35-4b-clawd-rift-evals | <1 MB |
GGUF breakdown:
- `clawd-rift-f16.gguf` (7.9 GB, baseline)
- `clawd-rift-Q8_0.gguf` (4.2 GB, near-lossless)
- `clawd-rift-Q5_K_M.gguf` (2.9 GB, recommended)
- `clawd-rift-Q4_K_M.gguf` (2.6 GB, smallest)
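
As a rough aid for choosing among these files, the sketch below picks the largest quant that fits a memory budget. The helper name `pick_gguf` and the 1 GB headroom for KV cache and runtime overhead are assumptions; actual memory use depends on context length and backend.

```python
from typing import Optional

# File sizes in GB, taken from the list above, ordered largest first.
GGUF_FILES = [
    ("clawd-rift-f16.gguf", 7.9),
    ("clawd-rift-Q8_0.gguf", 4.2),
    ("clawd-rift-Q5_K_M.gguf", 2.9),
    ("clawd-rift-Q4_K_M.gguf", 2.6),
]


def pick_gguf(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest file that fits budget_gb with headroom, or None."""
    for name, size_gb in GGUF_FILES:
        if size_gb + headroom_gb <= budget_gb:
            return name
    return None
```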
W&B training logs: https://wandb.ai/alexwortega/vae-llm-agents
Related: Stage-1-only model (qwen35-4b-soyuz)
A cleaner, stronger reference for the Stage-1 base (Soyuz SFT only — no ClawGym, no RIFT) is now available, trained as full bf16 LoRA r=128 (vs QLoRA r=64 here):
| Asset | Link |
|---|---|
| LoRA r=128 bf16 | AlexWortega/qwen35-4b-soyuz |
| Merged bf16 | AlexWortega/qwen35-4b-soyuz-merged |
Final eval on Soyuz-clean held-out: loss=0.247, token_acc=0.936. Trained on the cleaned 11-stream subset of AlexWortega/Soyuz-sft at seq=16K, 1 epoch.
Useful if you want only the Hermes-tool-call SFT without the ClawGym/RIFT specialization.
- Model provider: AlexWortega
- Model tree: base Qwen/Qwen3.5-4B; adapter: this model
- Modalities: input Video, Text, Image; output Text