qwen36-35b-soyuz-vibeapps-lora API & Inference Endpoint

Qwen3.6-35B-A3B Soyuz+vibeapps LoRA (final: step 1050)

Full-context (max_len=131072) LoRA SFT of Qwen/Qwen3.6-35B-A3B on the CLEANED corpus (AlexWortega/soyuz-vibeapps-fullctx-clean = Soyuz-sft[clean] + vibeapps-chat-fabric satisfied, minus 1,606 repeat-after-success / trivial-flail / redundant-rewrite traces), assistant-only loss mask.

Final artifact: step 1050 / 1266 (~83% of one epoch, cosine LR tail ~1.7e-5), train loss 0.55 -> ~0.25 (plateaued). Run ended here by choice; trainer_state.pt included so training can be resumed (--resume) for the remaining 216 steps if ever desired.

LoRA r=64 alpha=128 dropout=0.05 on 290 text-decoder Linears (attn qkvo + deltanet in_proj_qkv/z + out_proj + shared-expert MLP); fused 3D MoE experts frozen
corpus: 20,267 traces / ~500M tok (median 18k, max 131,072 tok) — agentic coding traces
AdamW lr 1e-4 cosine warmup 3%, micro_bsz 1 x grad_accum 16, bf16
stack: 2x RTX PRO 6000 96GB (moe_split device map: all MoE blocks on GPU1, attention/ deltanet/CE on GPU0), flash-attention 2.8.3, fla fast path (causal-conv1d), grad ckpt + CPU activation offload, Liger-style masked CE

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
m = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3.6-35B-A3B', dtype='bfloat16', device_map='auto')
m = PeftModel.from_pretrained(m, 'AlexWortega/qwen36-35b-soyuz-vibeapps-lora')
tok = AutoTokenizer.from_pretrained('Qwen/Qwen3.6-35B-A3B')

Qwen3.6-35B-A3B Soyuz+vibeapps LoRA (final: step 1050)

LoRA r=64 alpha=128 dropout=0.05 on 290 text-decoder Linears (attn qkvo + deltanet in_proj_qkv/z + out_proj + shared-expert MLP); fused 3D MoE experts frozen
corpus: 20,267 traces / ~500M tok (median 18k, max 131,072 tok) — agentic coding traces
AdamW lr 1e-4 cosine warmup 3%, micro_bsz 1 x grad_accum 16, bf16
stack: 2x RTX PRO 6000 96GB (moe_split device map: all MoE blocks on GPU1, attention/ deltanet/CE on GPU0), flash-attention 2.8.3, fla fast path (causal-conv1d), grad ckpt + CPU activation offload, Liger-style masked CE

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
m = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3.6-35B-A3B', dtype='bfloat16', device_map='auto')
m = PeftModel.from_pretrained(m, 'AlexWortega/qwen36-35b-soyuz-vibeapps-lora')
tok = AutoTokenizer.from_pretrained('Qwen/Qwen3.6-35B-A3B')

qwen36-35b-soyuz-vibeapps-lora

README

Qwen3.6-35B-A3B Soyuz+vibeapps LoRA (final: step 1050)

Explore FriendliAI today

README

Qwen3.6-35B-A3B Soyuz+vibeapps LoRA (final: step 1050)