Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

How it was made

  1. Register adapter (v2): QLoRA+DoRA SFT on 1090 Qwen 3.5 72B distilled native Vietnamese instruction examples.
  2. Knowledge adapter (Arm C): QLoRA+DoRA SFT on 603 Vietnamese MCQs in the exact eval answer format.
  3. Merge (this model): the two adapters — identical config (r16/α32/DoRA, same 7 target modules) — are combined in adapter space via rank-concatenation, the exact weighted sum of their LoRA deltas ΔW = 0.5·ΔW_register + 0.5·ΔW_knowledge (DoRA magnitudes weighted-averaged). No additional training.

Data-mixing the two corpora in one SFT pass gave zero knowledge lift; merging the finished Specialists in weight space recover the full lift while keeping register — the Model-Soups/TIES/DARE result.

Results (self-run harness; VMLU 744-Q val, 4-bit loglik; register judged by Qwen2.5-7B, both orderings)

ModelVMLU (knowledge)Register vs VyLinh
Qwen2.5-3B-Instruct (base)52.5
Arcee-VyLinh-3B (target)53.550% (bar)
Arm A v2 (register only)49.156.7%
Arm C (knowledge only)56.540.0%
this merge (α0.50)55.5 (+2.0)53.3% (>50%)

The VMLU gain is McNemar-significant vs base and broad (47/55 subjects); the MCQ data is 8-gram decontaminated against the VMLU val set (0/744 overlap).

Usage

python

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", device_map="auto")
model = PeftModel.from_pretrained(base, "<this-repo-id>")

Limitations

  • Research artifact, not production-hardened. Register win is modest (n=30, directional).
  • Knowledge gain is task-distribution elicitation on Vietnamese academic MCQs, not new facts (knowledge is base-bound).
  • 8GB-laptop recipe; evaluated at 4-bit.

Model provider

Sytex

Model tree

Base

Qwen/Qwen2.5-3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today