Jnx03

kanitakorn-r1d-qwen14b-v11-dare-edu

Benchmark Scores

| Benchmark | Score | Correct / Total | Target |
|---|---|---|---|
| ThaiExam (cons@8) | 56.99% | 322/565 | 85% |
| xCOPA-TH (cons@8) | 79.00% | 79/100 | 80% |
| Belebele-TH (cons@4) | 84.33% ✅ | 759/900 | 80% |
| MATH500 (cons@4) | 64.80% ✅ | 324/500 | 60% |
| AIME24 (cons@8) | 13.33% | 4/30 | 25% (route to base R1) |
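Each score is the count of correct questions divided by the total (for example, 322/565 = 56.99%). Below is a minimal sketch of cons@k scoring, assuming it means a majority vote over k sampled answers per question and that answers have already been extracted as labels. The function names are hypothetical, and the tie-breaking rule (first answer seen wins) is an assumption, not something this card specifies.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cons_at_k(samples: Sequence[str]) -> str:
    """Majority answer among k sampled answers; ties go to the answer seen first."""
    # Counter.most_common orders equal counts by first occurrence
    return Counter(samples).most_common(1)[0][0]


def score(per_question: List[Sequence[str]], gold: List[str]) -> Tuple[int, int, float]:
    """Return (correct, total, percent rounded to 2 decimals) under majority-vote scoring."""
    correct = sum(cons_at_k(s) == g for s, g in zip(per_question, gold))
    total = len(gold)
    return correct, total, round(100.0 * correct / total, 2)
```

For example, `score([["A", "A", "B"], ["C", "D", "D"]], ["A", "C"])` counts only the first question as correct, because the second question's majority answer is D.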

Two-model routing is recommended. Use this model for Thai-language tasks (Belebele, ThaiExam, OpenThaiEval, IFEval-TH). Use the base DeepSeek-R1-Distill-Qwen-14B for AIME24-style hard math, where the base model scores 30% on AIME24 (cons@8).
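Here is a minimal sketch of the two-model routing rule, assuming each request carries a benchmark-style task label. The label strings, the `route` function, and the default of this model for unlisted tasks are all hypothetical. Only the two model IDs come from this card.

```python
# Hypothetical task labels; the card names these benchmarks but defines no routing API.
THAI_TASKS = {"belebele", "thaiexam", "openthaieval", "ifeval-th"}
HARD_MATH_TASKS = {"aime24"}

THIS_MODEL = "Jnx03/kanitakorn-r1d-qwen14b-v11-dare-edu"
BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"


def route(task: str, default: str = THIS_MODEL) -> str:
    """Return the model ID to serve a task label (hypothetical routing rule)."""
    key = task.strip().lower()
    if key in HARD_MATH_TASKS:
        return BASE_MODEL  # base R1-Distill scores higher on AIME24-style math
    if key in THAI_TASKS:
        return THIS_MODEL  # Thai-tuned merge for Thai-language tasks
    return default  # assumption: caller chooses the fallback for other tasks
```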

Recipe

```yaml
merge_method: dare_linear
dtype: bfloat16
tokenizer_source: base
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
slices:
  - sources:
      - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
        layer_range: [0, 32]
        parameters: {weight: 0.75, density: 0.7}
      - model: Jnx03/kanitakorn-r1d-qwen14b-v10-merged-fp16-edu
        layer_range: [0, 32]
        parameters: {weight: 0.25, density: 0.7}
  - sources:
      - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
        layer_range: [32, 48]
        parameters: {weight: 0.4, density: 0.7}
      - model: Jnx03/kanitakorn-r1d-qwen14b-v10-merged-fp16-edu
        layer_range: [32, 48]
        parameters: {weight: 0.6, density: 0.7}
```
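The sketch below shows roughly what one DARE-linear step does to a single flattened tensor. First it takes the v10 model's delta from the base. It keeps each entry with probability `density` (0.7 in this recipe) and rescales kept entries by 1/density. Finally it adds the weighted delta back to the base. The per-layer weight on the v10 model follows the two slices: 0.25 for layers 0-31 and 0.6 for layers 32-47, reading `layer_range` as half-open.

This is a simplified illustration with hypothetical function names, not mergekit's implementation. In particular, it ignores mergekit's weight-normalization options, so it cannot reproduce the released weights.

```python
import random
from typing import List


def v10_weight(layer: int) -> float:
    """Weight on the v10 model per the recipe's two slices (48 layers total)."""
    if 0 <= layer < 32:
        return 0.25
    if 32 <= layer < 48:
        return 0.6
    raise ValueError(f"layer {layer} outside [0, 48)")


def dare_linear_tensor(base: List[float], tuned: List[float],
                       weight: float, density: float, seed: int = 0) -> List[float]:
    """Simplified DARE-linear merge of one flattened tensor with one fine-tuned model."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    merged = []
    for b, t in zip(base, tuned):
        delta = t - b                 # task-vector entry
        if rng.random() < density:    # keep with probability `density`
            delta /= density          # rescale kept entries so the expected delta is unchanged
        else:
            delta = 0.0               # drop this entry
        merged.append(b + weight * delta)
    return merged
```

Rescaling by 1/density is what lets DARE drop 30% of the delta entries while keeping the expected update the same.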

The v10-merged-fp16-edu source is a dense merge of R1-Distill-14B and the v10 LoRA. The v10 LoRA is a Thai SFT adapter trained on 6,209 records drawn from P1-M6 curriculum material, ThaiExam-style questions, and OpenThaiEval.
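A dense merge folds the LoRA update into the base weights, so the result is an ordinary full-weight checkpoint. Under the standard LoRA formulation, each adapted weight becomes W0 + (alpha / r) * B @ A, where B is d_out x r and A is r x d_in. The sketch below applies that formula to one small matrix in plain Python.

The card does not state the v10 adapter's rank, its alpha, or which tool performed the merge. PEFT's `merge_and_unload()` is one common option. The function name and the values in the usage example are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def merge_lora(w0: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W0 + (alpha / r) * B @ A for a single weight matrix."""
    scale = alpha / r
    d_out, d_in = len(w0), len(w0[0])
    merged = [row[:] for row in w0]  # copy so the base weights are untouched
    for i in range(d_out):
        for j in range(d_in):
            # (B @ A)[i][j] = sum over the rank dimension k
            merged[i][j] += scale * sum(b[i][k] * a[k][j] for k in range(r))
    return merged
```

For example, with W0 = [[1, 0], [0, 1]], A = [[1, 2]], B = [[1], [0]], alpha = 2, and r = 1, the merged matrix is [[3, 4], [0, 1]].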

Use case

Primary audience: Thai students (P1-M6, grades 1-12) and teachers.

Capabilities:

Multiple-choice exam reasoning in Thai (ThaiExam, O-NET, TGAT, A-Level styles)
Reading comprehension (Belebele-TH 84%)
Math reasoning (MATH500 65%)
Causal commonsense reasoning in Thai (xCOPA-TH 79%)
Curriculum-aligned Q&A for Thai education topics

Inference

```python
from vllm import LLM, SamplingParams

# Load the merged model in bf16, the dtype used by the merge recipe
llm = LLM(
    model="Jnx03/kanitakorn-r1d-qwen14b-v11-dare-edu",
    dtype="bfloat16", max_model_len=4096, gpu_memory_utilization=0.90,
)
# 8 samples per prompt for cons@8-style majority voting
sampling = SamplingParams(n=8, temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "..."  # Thai prompt + a suffix asking for the answer as "คำตอบคือ (X)" ("The answer is (X)")
outputs = llm.generate([prompt], sampling)
# For ThaiExam-style MCQ, take a majority vote over the n=8 samples in outputs[0].outputs
```
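The majority vote can be sketched as follows, assuming each completion states its final choice as "คำตอบคือ (X)". Several details are assumptions rather than part of the card's evaluation code:

- the regex;
- the accepted label characters (Latin letters, digits, or Thai consonants such as ข);
- taking the last match in each completion;
- the helper names.

```python
import re
from collections import Counter
from typing import List, Optional

# Assumed answer format: "คำตอบคือ (X)" ("The answer is (X)") with a one-character label
ANSWER_RE = re.compile(r"คำตอบคือ\s*\(([A-Za-z0-9ก-ฮ])\)")


def extract_choice(text: str) -> Optional[str]:
    """Return the last answer label in a completion, or None if absent."""
    matches = ANSWER_RE.findall(text)
    return matches[-1] if matches else None  # last match treated as the final answer


def majority_vote(texts: List[str]) -> Optional[str]:
    """Most common extracted label across samples; unparseable samples are skipped."""
    votes = [c for c in (extract_choice(t) for t in texts) if c is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]
```

With the vLLM call from the example above, this would be applied as `majority_vote([c.text for c in outputs[0].outputs])`.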

Limitations

ThaiExam at 56.99% remains below the campaign target of 85% and clearly trails the published Thai LLM state of the art (~73%)
AIME24 dropped from 30% (base R1-Distill-14B) to 13.33% after the DARE-linear merge; for the hardest math problems, use the base model
IFEval-TH has not yet been measured
The model may produce English-mixed chain-of-thought on some inputs, even with Thai-prefix prompting

Citation

If you use this model, please cite the SCB10X recipe:

Pipatanakul, K., et al. Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging. arXiv:2502.09056, 2025.

Please also reference the campaign log: https://github.com/JNX03/kanitakorn-research

License: MIT (inherited from DeepSeek-R1-Distill-Qwen-14B)


Model Details

Model provider: Jnx03
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B (this model is listed as fine-tuned from it)
Input modalities: Text
Output modalities: Text

