srallaba
tarak-qwen3.6-35b-a3b-rsi-v1
README
License: apache-2.0
Method (novel)
The 256 routed experts are stored as fused tensors rather than per-expert nn.Linear modules, so standard
per-expert PEFT cannot be applied. We instead freeze all 256 routed experts and the router,
leaving Chinese capacity and routing untouched, and LoRA-adapt only the shared-expert and attention
projections (7 module types, ~8.4M params, 0.024% of all parameters). A routing profile over hi/te vs zh/en
confirmed that Indic specialization concentrates in the final and early script layers. The adapter was trained
with continued pretraining on IndicCorpV2/Sangraha Hindi+Telugu (CC0/CC-BY) plus English+Chinese replay.
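The routing profile mentioned above could be computed in several ways. The card does not say how it was done, so the following is only a minimal sketch under assumptions: it compares, layer by layer, how often each expert is selected for two language groups and reports the total variation distance between the two usage distributions. The function names, the input layout (layer index mapped to a list of selected expert ids), and the choice of distance are all hypothetical, and the toy selections are made up rather than measured.

```python
from collections import Counter


def expert_usage(expert_ids, num_experts):
    """Turn a non-empty list of selected expert ids into a usage distribution."""
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def total_variation(p, q):
    """Total variation distance: 0 means identical usage, 1 means disjoint usage."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def routing_profile(selections_a, selections_b, num_experts):
    """Per-layer distance between expert usage for two language groups.

    selections_a / selections_b: dict mapping layer index -> list of expert ids
    chosen by the router for tokens of that language group (assumed layout).
    """
    return {
        layer: total_variation(
            expert_usage(selections_a[layer], num_experts),
            expert_usage(selections_b[layer], num_experts),
        )
        for layer in sorted(selections_a)
    }


# Toy data with 4 experts and 2 layers (illustrative only, not measured).
indic = {0: [0, 0, 1, 1], 1: [2, 2, 2, 3]}
other = {0: [0, 1, 0, 1], 1: [0, 0, 1, 1]}
profile = routing_profile(indic, other, num_experts=4)
```

In this toy case layer 0 routes both groups identically while layer 1 uses disjoint experts, so a layer with a high distance is one where routing is language-specific. On a real model the selections would have to be collected from the router outputs, which this sketch does not cover.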
- Base: Qwen/Qwen3.6-35B-A3B
- Trainable: shared-expert (gate/up/down_proj) + attention (q/k/v/o_proj), rsLoRA r=16
- Compute: 1× NVIDIA GB10 (DGX Spark), bf16
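The selection of trainable modules listed above can be illustrated with a name filter. This is a sketch under assumptions, not the training script: the module paths below are hypothetical (the real names depend on the model implementation), `lora_alpha` is not stated in this card, and the 35B total is the nominal size taken from the model name. The scaling helper reflects the usual rsLoRA convention of dividing alpha by the square root of the rank instead of by the rank.

```python
import math
import re

# Hypothetical module paths for one decoder layer; real names may differ.
MODULE_NAMES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate",                      # router, stays frozen
    "model.layers.0.mlp.experts",                   # fused routed experts, stay frozen
    "model.layers.0.mlp.shared_expert.gate_proj",
    "model.layers.0.mlp.shared_expert.up_proj",
    "model.layers.0.mlp.shared_expert.down_proj",
]

# Match only attention projections and shared-expert projections.
TARGET_PATTERN = re.compile(
    r"(self_attn\.(q|k|v|o)_proj|shared_expert\.(gate|up|down)_proj)$"
)


def select_targets(names):
    """Return the module names that would receive LoRA adapters."""
    return [n for n in names if TARGET_PATTERN.search(n)]


def lora_scaling(alpha, r, use_rslora):
    """LoRA scales the update by alpha / r; rsLoRA uses alpha / sqrt(r)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


targets = select_targets(MODULE_NAMES)

# Sanity check of the stated fraction: ~8.4M trainable out of a nominal 35B.
trainable_fraction_pct = 8.4e6 / 35e9 * 100
```

The filter keeps the seven projection types and skips the router and the fused expert tensor. With PEFT, the same intent would typically be expressed through `LoraConfig` with `r=16`, `use_rslora=True`, and a `target_modules` setting, though the exact module names to pass depend on the model.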
Results (Cognitive-Lab Indic LLM Leaderboard tasks, our eval)
Net-neutral vs the strong base on MCQ (base is already near-ceiling): Hindi macro ~82%, Telugu macro ~70%, Chinese retention preserved. The contribution is the method (routing-steered, Chinese-frozen Indic adaptation of a fused-expert MoE), not a large MCQ delta.
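For readers unfamiliar with the term, the sketch below assumes "macro" means the unweighted mean of per-task accuracies, in contrast to a micro average that pools all questions. The card does not define the term, the task names are placeholders, and the counts are invented for illustration; they are not results from this model.

```python
def macro_average(per_task):
    """Unweighted mean of per-task accuracies.

    per_task: dict mapping task name -> (num_correct, num_questions).
    """
    if not per_task:
        raise ValueError("need at least one task")
    accuracies = [correct / total for correct, total in per_task.values()]
    return sum(accuracies) / len(accuracies)


def micro_average(per_task):
    """Accuracy over all questions pooled together, so large tasks dominate."""
    correct = sum(c for c, _ in per_task.values())
    total = sum(t for _, t in per_task.values())
    return correct / total


# Placeholder counts, chosen only to show how the two averages diverge.
example = {"task_a": (90, 100), "task_b": (450, 900)}
macro = macro_average(example)
micro = micro_average(example)
```

Here the macro average weights the small and the large task equally, while the micro average is pulled toward the larger task.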
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-35B-A3B", torch_dtype="bfloat16", device_map="auto"
)
model = PeftModel.from_pretrained(base, "srallaba/tarak-qwen3.6-35b-a3b-rsi-v1")
```
Project Tarak — Indic-language LLM exploration.
Model provider: srallaba
Model tree: Base Qwen/Qwen3.6-35B-A3B, Adapter this model
Modalities: Input Video, Text, Image; Output Text