AlexWortega

qwen35-4b-clawd-rift

README

License: apache-2.0

Usage

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map='cuda')
model = PeftModel.from_pretrained(base, 'AlexWortega/qwen35-4b-clawd-rift')
tok = AutoTokenizer.from_pretrained('AlexWortega/qwen35-4b-clawd-rift')

Or with sglang:

bash
python -m sglang.launch_server --model-path Qwen/Qwen3.5-4B \
    --lora-paths clawd-rift=AlexWortega/qwen35-4b-clawd-rift \
    --tool-call-parser hermes

Evaluation results

tbench-2 (89 docker tasks via Pi-style runner)

7/89 (7.9%). Tasks unique to clawd-rift: fix-ocaml-gc, pytorch-model-recovery.

Table with columns: Variant in pipeline, Pass on tbench-2
Variant in pipeline	Pass on tbench-2
ckpt600 (Soyuz SFT only)	7
clawd-100 (+ ClawGym 100 steps)	7
clawd-200 (+ ClawGym 200 steps)	7
clawd-rft (positive-only SFT on rollouts)	6
clawd-rift (true RIFT on rollouts) — this model	7

ClawGym-Bench (200 tasks via openclaw scaffold)

Table with columns: Stat, Value
Stat	Value
mean	0.371
half+ (≥0.5)	80/200 (40%)
perfect (=1.0)	2 (tasks 78, 148)
zero	40

Comparison to RUC-AIBOX ClawGym leaderboard (compact open-weight models):

Table with columns: Model, ClawGym avg
Model	ClawGym avg
Qwen3-32B	33.11
Qwen3-8B	35.02
clawd-rift (this, 4B, QLoRA, 1 GPU)	37.10
Qwen3-30A3B (MoE)	45.11
ClawGym-4B (RUC-AIBOX full SFT)	47.73

Optimal inference parameters

Sampling sweet-spot is scaffold-dependent.

Table with columns: Scaffold, Task type, Optimal sampling
Scaffold	Task type	Optimal sampling
openclaw (ClawGym-style formal spec)	JSON/Markdown to schema	`T=0.3-0.5`, top_p=0.95, no min_p
pi-agent (terminus_runner shell explore)	trial-and-error commands	`T=0.7-0.8`, top_p=0.95, min_p=0.05

Universal default that loses only ~5% on each:

markdown
temperature=0.5, top_p=0.95, top_k=40, repetition_penalty=1.05

Training methodology — pipeline of 3 stages

Stage 1: Soyuz SFT (ckpt600 — base agent format)

QLoRA r=64 alpha=128 on Qwen/Qwen3.5-4B.

Datasets: AlexWortega/Soyuz-sft + AlexWortega/AgentTrove
Format: Hermes-style JSON tool calls (<tool_call>{"name":...,"arguments":...}</tool_call>)
600 steps total, seq=8K, Muon optimizer for LoRA matrices
Output: ckpt-400, ckpt-600 (intermediate); soup_sum = ckpt400 + ckpt600 (arithmetic merge)

Stage 2: ClawGym continue-train (clawd-100, clawd-200 — openclaw scaffold adaptation)

Continue-train ckpt600 on filtered RUC-AIBOX/ClawGym-Trajectory.

1937 trajectories (filtered ≤16K tokens out of 24.5K)
200 steps, seq=16K, LR=1e-4, AdamW
Hermes chat template + openclaw native tools (read/write/exec/web_search/...)
Output: clawd-100 (mid), clawd-200 (final)

Stage 3: RIFT — own rollouts + reward feedback

True RIFT loss on top of clawd-200:

python
# positive (reward > 0): NLL × reward — weighted SFT
# negative (reward = 0): exp(logp) × negative_scale — unlikelihood

61 trajectories from soup_sum's own ClawGym rollouts (46 pos + 15 neg, reward 0-1)
5 epochs / 80 steps, LR=2e-5
Implementation: compare_offlinegpro/src/trainers/offline_losses.py
Output: clawd-rift ← this model

Repos

Table with columns: Asset, Link, Size
Asset	Link	Size
LoRA adapter	qwen35-4b-clawd-rift	340 MB
Merged bf16	qwen35-4b-clawd-rift-merged	8.4 GB
GGUF (4 quants)	qwen35-4b-clawd-rift-gguf	18.6 GB
Raw evals	qwen35-4b-clawd-rift-evals

GGUF breakdown:

clawd-rift-f16.gguf (7.9 GB, baseline)
clawd-rift-Q8_0.gguf (4.2 GB, near-lossless)
clawd-rift-Q5_K_M.gguf (2.9 GB, recommended)
clawd-rift-Q4_K_M.gguf (2.6 GB, smallest)

W&B training logs: https://wandb.ai/alexwortega/vae-llm-agents

Related: Stage-1-only model (`qwen35-4b-soyuz`)

A cleaner, stronger reference for the Stage-1 base (Soyuz SFT only — no ClawGym, no RIFT) is now available, trained as full bf16 LoRA r=128 (vs QLoRA r=64 here):

Table with columns: Asset, Link
Asset	Link
LoRA r=128 bf16	AlexWortega/qwen35-4b-soyuz
Merged bf16	AlexWortega/qwen35-4b-soyuz-merged

Final eval on Soyuz-clean held-out: loss=0.247, token_acc=0.936. Trained on the cleaned 11-stream subset of AlexWortega/Soyuz-sft at seq=16K, 1 epoch.

Useful if you want only the Hermes-tool-call SFT without the ClawGym/RIFT specialization.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Model Details

Model Provider

AlexWortega

Model Tree

Base

Qwen/Qwen3.5-4B

Adapter

this model

Input Modalities

Text

Image

Video

Output Modalities

Text

Supported Functionality

Dedicated Endpoints

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

Usage

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map='cuda')
model = PeftModel.from_pretrained(base, 'AlexWortega/qwen35-4b-clawd-rift')
tok = AutoTokenizer.from_pretrained('AlexWortega/qwen35-4b-clawd-rift')

Or with sglang:

bash
python -m sglang.launch_server --model-path Qwen/Qwen3.5-4B \
    --lora-paths clawd-rift=AlexWortega/qwen35-4b-clawd-rift \
    --tool-call-parser hermes

Evaluation results

tbench-2 (89 docker tasks via Pi-style runner)

7/89 (7.9%). Tasks unique to clawd-rift: fix-ocaml-gc, pytorch-model-recovery.

Table with columns: Variant in pipeline, Pass on tbench-2
Variant in pipeline	Pass on tbench-2
ckpt600 (Soyuz SFT only)	7
clawd-100 (+ ClawGym 100 steps)	7
clawd-200 (+ ClawGym 200 steps)	7
clawd-rft (positive-only SFT on rollouts)	6
clawd-rift (true RIFT on rollouts) — this model	7

ClawGym-Bench (200 tasks via openclaw scaffold)

Table with columns: Stat, Value
Stat	Value
mean	0.371
half+ (≥0.5)	80/200 (40%)
perfect (=1.0)	2 (tasks 78, 148)
zero	40

Comparison to RUC-AIBOX ClawGym leaderboard (compact open-weight models):

Table with columns: Model, ClawGym avg
Model	ClawGym avg
Qwen3-32B	33.11
Qwen3-8B	35.02
clawd-rift (this, 4B, QLoRA, 1 GPU)	37.10
Qwen3-30A3B (MoE)	45.11
ClawGym-4B (RUC-AIBOX full SFT)	47.73

Optimal inference parameters

Sampling sweet-spot is scaffold-dependent.

Table with columns: Scaffold, Task type, Optimal sampling
Scaffold	Task type	Optimal sampling
openclaw (ClawGym-style formal spec)	JSON/Markdown to schema	`T=0.3-0.5`, top_p=0.95, no min_p
pi-agent (terminus_runner shell explore)	trial-and-error commands	`T=0.7-0.8`, top_p=0.95, min_p=0.05

Universal default that loses only ~5% on each:

markdown
temperature=0.5, top_p=0.95, top_k=40, repetition_penalty=1.05

Training methodology — pipeline of 3 stages

Stage 1: Soyuz SFT (ckpt600 — base agent format)

QLoRA r=64 alpha=128 on Qwen/Qwen3.5-4B.

Datasets: AlexWortega/Soyuz-sft + AlexWortega/AgentTrove
Format: Hermes-style JSON tool calls (<tool_call>{"name":...,"arguments":...}</tool_call>)
600 steps total, seq=8K, Muon optimizer for LoRA matrices
Output: ckpt-400, ckpt-600 (intermediate); soup_sum = ckpt400 + ckpt600 (arithmetic merge)

Stage 2: ClawGym continue-train (clawd-100, clawd-200 — openclaw scaffold adaptation)

Continue-train ckpt600 on filtered RUC-AIBOX/ClawGym-Trajectory.

1937 trajectories (filtered ≤16K tokens out of 24.5K)
200 steps, seq=16K, LR=1e-4, AdamW
Hermes chat template + openclaw native tools (read/write/exec/web_search/...)
Output: clawd-100 (mid), clawd-200 (final)

Stage 3: RIFT — own rollouts + reward feedback

True RIFT loss on top of clawd-200:

python
# positive (reward > 0): NLL × reward — weighted SFT
# negative (reward = 0): exp(logp) × negative_scale — unlikelihood

61 trajectories from soup_sum's own ClawGym rollouts (46 pos + 15 neg, reward 0-1)
5 epochs / 80 steps, LR=2e-5
Implementation: compare_offlinegpro/src/trainers/offline_losses.py
Output: clawd-rift ← this model

Repos

Table with columns: Asset, Link, Size
Asset	Link	Size
LoRA adapter	qwen35-4b-clawd-rift	340 MB
Merged bf16	qwen35-4b-clawd-rift-merged	8.4 GB
GGUF (4 quants)	qwen35-4b-clawd-rift-gguf	18.6 GB
Raw evals	qwen35-4b-clawd-rift-evals

GGUF breakdown:

clawd-rift-f16.gguf (7.9 GB, baseline)
clawd-rift-Q8_0.gguf (4.2 GB, near-lossless)
clawd-rift-Q5_K_M.gguf (2.9 GB, recommended)
clawd-rift-Q4_K_M.gguf (2.6 GB, smallest)

W&B training logs: https://wandb.ai/alexwortega/vae-llm-agents

Related: Stage-1-only model (`qwen35-4b-soyuz`)

A cleaner, stronger reference for the Stage-1 base (Soyuz SFT only — no ClawGym, no RIFT) is now available, trained as full bf16 LoRA r=128 (vs QLoRA r=64 here):

Table with columns: Asset, Link
Asset	Link
LoRA r=128 bf16	AlexWortega/qwen35-4b-soyuz
Merged bf16	AlexWortega/qwen35-4b-soyuz-merged

Final eval on Soyuz-clean held-out: loss=0.247, token_acc=0.936. Trained on the cleaned 11-stream subset of AlexWortega/Soyuz-sft at seq=16K, 1 epoch.

Useful if you want only the Hermes-tool-call SFT without the ClawGym/RIFT specialization.

qwen35-4b-clawd-rift

README

Usage

Evaluation results

tbench-2 (89 docker tasks via Pi-style runner)

ClawGym-Bench (200 tasks via openclaw scaffold)

Optimal inference parameters

Training methodology — pipeline of 3 stages

Stage 1: Soyuz SFT (ckpt600 — base agent format)

Stage 2: ClawGym continue-train (clawd-100, clawd-200 — openclaw scaffold adaptation)

Stage 3: RIFT — own rollouts + reward feedback

Repos

Related: Stage-1-only model (qwen35-4b-soyuz)

Explore FriendliAI today

README

Usage

Evaluation results

tbench-2 (89 docker tasks via Pi-style runner)

ClawGym-Bench (200 tasks via openclaw scaffold)

Optimal inference parameters

Training methodology — pipeline of 3 stages

Stage 1: Soyuz SFT (ckpt600 — base agent format)

Stage 2: ClawGym continue-train (clawd-100, clawd-200 — openclaw scaffold adaptation)

Stage 3: RIFT — own rollouts + reward feedback

Repos

Related: Stage-1-only model (qwen35-4b-soyuz)

Related: Stage-1-only model (`qwen35-4b-soyuz`)

Related: Stage-1-only model (`qwen35-4b-soyuz`)