tugot17

qwen3-30b-a3b-peft18-moe-lora-demo

README

License: apache-2.0

Training

Table with columns: Parameter, Value
Parameter	Value
Base model	`Qwen/Qwen3-30B-A3B-Instruct-2507` (bf16; 48 layers, 128 experts, hidden=2048, moe_intermediate=768)
Framework	PEFT 0.19.1 + `trl.SFTTrainer` (transformers ≥ 5.8, for `@use_experts_implementation` on Qwen3MoE)
Dataset	`HuggingFaceTB/smoltalk` (config `all`, `train[:1000]`)
Rank (r)	8
Alpha	16
Dropout	0 (required by PEFT `ParamWrapper`)
Optimizer / schedule	AdamW, lr 5e-5, linear schedule, 10% warmup
Batch	per-device 1 × grad-accum 8 (effective batch 8)
Steps	10 — bf16; deliberately under-trained, this is a loader exerciser, not a quality run
Hardware	single H100 80GB

LoRA targets

Attention (target_modules): q_proj, k_proj, v_proj, o_proj
MoE experts (batched 3D, via target_parameters): experts.down_proj, experts.gate_up_proj

The target_parameters argument is what's new in PEFT 0.18+ — it wraps individual nn.Parameter tensors instead of whole nn.Modules, so you can LoRA the batched 3D expert weights that HF transformers ≥ 5.8 introduced for MoE classes (Qwen3MoeExperts, Lfm2MoeExperts, Mixtral, DeepSeek-V2, etc. — anything decorated with @use_experts_implementation). The expert tensors here have shape (num_experts=128, 2·moe_intermediate, hidden) for gate_up_proj and (num_experts, hidden, moe_intermediate) for down_proj.

Reproducing

The training script is bundled with this repo:

bash
huggingface-cli download tugot17/qwen3-30b-a3b-peft18-moe-lora-demo train_qwen3_moe_demo.py --repo-type model --local-dir .
python train_qwen3_moe_demo.py --output-dir ./outputs/qwen3-30b-a3b-peft18-demo --max-steps 10

Or read it inline: train_qwen3_moe_demo.py. Self-contained — only depends on peft>=0.18, transformers>=5.8, trl, datasets, torch.

Shapes (per layer × 48 layers)

Table with columns: Key, Shape, Leaf
Key	Shape	Leaf
`...mlp.experts.base_layer.lora_A.weight`	`(1024, 2048)` = `(E·r, hidden)`	`gate_up_proj`
`...mlp.experts.base_layer.lora_B.weight`	`(1536, 1024)` = `(2·moe_inter, r·E)`	`gate_up_proj`
`...mlp.experts.lora_A.weight`

num_experts=128, r=8, hidden=2048, moe_intermediate=768. The inner (base_layer) slot holds gate_up_proj regardless of target_parameters order — PEFT wraps in HF-attribute-definition order, not user-config order.

Usage

SGLang

Requires a build with native PEFT 0.18+ batched-MoE LoRA loading.

Launch:

bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --port 30000 \
    --mem-fraction-static 0.85 \
    --dtype bfloat16 \
    --enable-lora --max-lora-rank 8 \
    --lora-paths "qwen3demo=tugot17/qwen3-30b-a3b-peft18-moe-lora-demo" \
    --lora-target-modules q_proj k_proj v_proj o_proj gate_up_proj down_proj

Generate with LoRA:

bash
curl -sS http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{
      "text": "Write a short story set in an unusual location.",
      "sampling_params": {"temperature": 0.0, "max_new_tokens": 128},
      "lora_path": "qwen3demo"
    }'

Generate without LoRA (omit lora_path):

bash
curl -sS http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{
      "text": "Write a short story set in an unusual location.",
      "sampling_params": {"temperature": 0.0, "max_new_tokens": 128}
    }'

Sample outputs

Prompt: "Write a short story set in an unusual location." (temperature 0, max_new_tokens 128).

Base (Qwen/Qwen3-30B-A3B-Instruct-2507):

Title: The Library Beneath the Clock

Beneath the cobblestone square of the forgotten town of Hollowwick, where the clock tower had stopped ticking a century ago, there existed a library no map would ever show.

It wasn't built—it was grown. Its shelves spiraled upward like the roots of an ancient tree, carved from petrified wood and glowing faintly with bioluminescent moss. The air hummed with the quiet rustle of pages turning on their own, as if the books themselves were dreaming.

With this LoRA (tugot17/qwen3-30b-a3b-peft18-moe-lora-demo):

In the heart of the Amazon rainforest, where the canopy blots out the sun and the air hangs thick with humidity, there exists a hidden grove known only to the forest's oldest spirits. It is not marked on any map, nor does it appear on satellite images. The grove is a place of silence—no birds sing, no insects buzz, and even the wind seems to hold its breath.

This is where the Library of Forgotten Things is kept.

Greedy decoding diverges immediately and produces a visibly different story — confirming the adapter's MoE expert + attention weights are actually loaded and active, not silently dropped. This is the only quality claim — at 10 training steps, neither answer should be treated as "better" than the base.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

tugot17

Model Tree

Base

Qwen/Qwen3-30B-A3B-Instruct-2507

Adapter

this model

Input Modalities

Text

Output Modalities