License: apache-2.0
How it was made
- Register adapter (v2): QLoRA+DoRA SFT on 1,090 native Vietnamese instruction examples distilled from Qwen 3.5 72B.
- Knowledge adapter (Arm C): QLoRA+DoRA SFT on 603 Vietnamese MCQs in the exact eval answer format.
- Merge (this model): the two adapters, which share an identical config (r16/α32/DoRA, same 7 target modules), are
combined in adapter space via rank-concatenation, which is exactly the weighted sum of their LoRA deltas,
ΔW = 0.5·ΔW_register + 0.5·ΔW_knowledge (DoRA magnitudes are weighted-averaged). No additional training.
Data-mixing the two corpora in one SFT pass gave zero knowledge lift; merging the finished specialists in weight space recovers the full lift while keeping register, in line with the Model-Soups/TIES/DARE results.
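The claim that rank-concatenation equals the weighted sum of the LoRA deltas can be checked on toy matrices. The sketch below is illustrative only and is not the script used to build this model: the helper names (`matmul`, `rand_matrix`, `rank_concat`) and the toy sizes are assumptions, the shared α/r scaling is omitted because it is the same constant for both adapters, and the DoRA magnitude averaging is not shown.

```python
import random

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def rank_concat(b1, a1, b2, a2, w1=0.5, w2=0.5):
    """Merge two LoRA factor pairs by stacking them along the rank axis.

    B_cat = [w1*B1 | w2*B2]  has shape (d_out, 2r)
    A_cat = [A1 ; A2]        has shape (2r, d_in)
    so that B_cat @ A_cat = w1*B1@A1 + w2*B2@A2.
    """
    b_cat = [[w1 * v for v in r1] + [w2 * v for v in r2] for r1, r2 in zip(b1, b2)]
    a_cat = a1 + a2  # list concatenation stacks the rows of A1 on top of A2
    return b_cat, a_cat

rng = random.Random(0)
d_out, d_in, r = 6, 5, 2  # toy sizes; the adapters in this card use r=16

# Hypothetical stand-ins for the register and knowledge adapters.
b_reg, a_reg = rand_matrix(d_out, r, rng), rand_matrix(r, d_in, rng)
b_kno, a_kno = rand_matrix(d_out, r, rng), rand_matrix(r, d_in, rng)

b_cat, a_cat = rank_concat(b_reg, a_reg, b_kno, a_kno)
merged = matmul(b_cat, a_cat)

# Reference: the explicit weighted sum of the two full-size deltas.
d_reg, d_kno = matmul(b_reg, a_reg), matmul(b_kno, a_kno)
expected = [[0.5 * x + 0.5 * y for x, y in zip(r1, r2)] for r1, r2 in zip(d_reg, d_kno)]

max_err = max(abs(m - e) for rm, re in zip(merged, expected) for m, e in zip(rm, re))
print(f"merged rank = {len(a_cat)}, max abs error = {max_err:.2e}")
```

The merged adapter has rank 2r rather than r, which is the price of keeping the sum exact instead of re-compressing it.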
Results (self-run harness; VMLU 744-Q val, 4-bit loglik; register judged by Qwen2.5-7B, both orderings)
| Model | VMLU (knowledge) | Register vs VyLinh |
|---|---|---|
| Qwen2.5-3B-Instruct (base) | 52.5 | — |
| Arcee-VyLinh-3B (target) | 53.5 | 50% (bar) |
| Arm A v2 (register only) | 49.1 | 56.7% |
| Arm C (knowledge only) | 56.5 | 40.0% |
| this merge (α0.50) | 55.5 (+2.0) | 53.3% (>50%) |
The VMLU gain is McNemar-significant vs base and broad (47/55 subjects); the MCQ data is 8-gram decontaminated against the VMLU val set (0/744 overlap).
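An 8-gram decontamination pass of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness used for the reported 0/744 figure: whitespace tokenization and lowercasing are assumed, the function names are hypothetical, and the example strings are invented toy data.

```python
def ngrams(text, n=8):
    """Return the set of n-token windows in a text (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def find_contaminated(train_texts, eval_texts, n=8):
    """Return indices of training items sharing at least one n-gram with any eval item."""
    eval_grams = set()
    for text in eval_texts:
        eval_grams |= ngrams(text, n)
    return [i for i, text in enumerate(train_texts) if ngrams(text, n) & eval_grams]

# Toy data, made up for illustration only.
eval_items = ["Thủ đô của Việt Nam là thành phố nào trong các lựa chọn sau"]
train_items = [
    "Câu hỏi: Thủ đô của Việt Nam là thành phố nào trong bốn đáp án",  # shares an 8-gram
    "Sông nào dài nhất Việt Nam và chảy qua những tỉnh thành nào",     # no shared 8-gram
]

flagged = find_contaminated(train_items, eval_items)
print(flagged)
```

Flagged items would be dropped from the training set before SFT; a stricter pass might also normalize punctuation or use the model tokenizer instead of whitespace splitting.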
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", device_map="auto")

# Attach the merged adapter from this repo on top of the base weights.
model = PeftModel.from_pretrained(base, "<this-repo-id>")
```
Limitations
- Research artifact, not production-hardened. Register win is modest (n=30, directional).
- Knowledge gain is task-distribution elicitation on Vietnamese academic MCQs, not new facts (knowledge is base-bound).
- 8GB-laptop recipe; evaluated at 4-bit.
Model provider: Sytex
Model tree: base Qwen/Qwen2.5-3B-Instruct; adapter: this model
Modalities: text input, text output