Benchmark Results
| Bench | Method | Score | Target (%) | Status (points vs. target) |
|---|---|---|---|---|
| ThaiExam | cons@8 | 64.42% | 75 | ⚠️ -10.58 |
| MATH500 | cons@4 | 71.60% | 60 | ✅ +11.60 |
| AIME24 | cons@8 (base R1 route) | 30.00% | 25 | ✅ +5.00 |
| xCOPA-TH | cons@8 (v17 route) | 81.00% | 80 | ✅ +1.00 |
| Belebele-TH | cons@8 | 87.56% | 80 | ✅ +7.56 |
| OpenThaiEval | cons@16 | 81.06% | 80 | ✅ +1.06 |
| IFEval-TH | proper verifier | 75.37% | 80 | 🟡 -4.63 |
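5 of the 7 hard targets are met directly. IFEval-TH is close, 4.63 points below its target, and ThaiExam, 10.58 points below, is capped at the architecture ceiling. The AIME24 and xCOPA-TH scores come from the routed models named in the Method column.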
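The Status column is the score minus the target, in percentage points. A minimal sketch that recomputes those margins and the met-target count from the table values; the names `RESULTS`, `margins`, and `targets_met` are illustrative and not part of any evaluation harness:

```python
# Scores and targets copied from the table above (all in percent).
RESULTS = {
    "ThaiExam": (64.42, 75.0),
    "MATH500": (71.60, 60.0),
    "AIME24": (30.00, 25.0),
    "xCOPA-TH": (81.00, 80.0),
    "Belebele-TH": (87.56, 80.0),
    "OpenThaiEval": (81.06, 80.0),
    "IFEval-TH": (75.37, 80.0),
}


def margins(results):
    """Return {benchmark: score - target}, rounded to two decimals."""
    return {
        name: round(score - target, 2)
        for name, (score, target) in results.items()
    }


def targets_met(results):
    """Count benchmarks whose score is at or above the target."""
    return sum(1 for score, target in results.values() if score >= target)


if __name__ == "__main__":
    for name, delta in margins(RESULTS).items():
        print(f"{name:<13} {delta:+.2f}")
    print(f"{targets_met(RESULTS)}/{len(RESULTS)} targets met")
```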
Merge Recipe
merge_method: dare_linear
dtype: bfloat16
tokenizer_source: base
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
slices:
  - sources:
      - model: Jnx03/kanitakorn-r1d-qwen14b-v11-dare-edu
        layer_range: [0, 32]
        parameters: {weight: 0.6, density: 0.7}
      - model: Qwen/Qwen2.5-14B-Instruct
        layer_range: [0, 32]
        parameters: {weight: 0.4, density: 0.7}
  - sources:
      - model: Jnx03/kanitakorn-r1d-qwen14b-v11-dare-edu
        layer_range: [32, 48]
        parameters: {weight: 0.3, density: 0.7}
      - model: Qwen/Qwen2.5-14B-Instruct
        layer_range: [32, 48]
        parameters: {weight: 0.7, density: 0.7}
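To make the `weight` and `density` parameters concrete, here is a toy sketch of the general DARE-linear idea: each model's difference from the base is randomly kept with probability `density`, rescaled by `1/density`, and added to the base scaled by `weight`. This is not mergekit's implementation. It works on flat Python lists with made-up numbers, ignores details such as normalization and per-tensor handling, and the function name `dare_linear` is only illustrative.

```python
import random


def dare_linear(base, models, weights, density, seed=0):
    """Toy DARE-linear merge over flat lists of floats.

    For each model, the delta from `base` is kept with probability
    `density` and rescaled by 1/density; kept deltas are added to the
    base, scaled by that model's weight.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    merged = list(base)
    for model, weight in zip(models, weights):
        for i, (b, w) in enumerate(zip(base, model)):
            if rng.random() < density:  # keep this delta element
                merged[i] += weight * (w - b) / density
    return merged


if __name__ == "__main__":
    base = [0.0, 1.0, -1.0, 0.5]     # made-up base weights
    model_a = [0.2, 1.1, -0.8, 0.5]  # made-up stand-in for the v11 model
    model_b = [0.1, 0.9, -1.2, 0.7]  # made-up stand-in for Qwen2.5-14B-Instruct
    # Weight and density values as in the first slice of the recipe.
    print(dare_linear(base, [model_a, model_b], [0.6, 0.4], density=0.7))
```

With `density=1.0` nothing is dropped and the result reduces to a plain weighted sum of deltas on top of the base, which is a convenient sanity check.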
Companion Models
- Jnx03/kanitakorn-r1d-qwen14b-v17-dpo-r32-edu: DPO r=32 specialist for xCOPA-TH (81%)
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B: base R1 for AIME24 (30%)
Recommended routing for production across the three models:
- Thai language, knowledge, instruction-following → v18
- xCOPA causal reasoning → v17
- Hardest math / AIME-style → base R1
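The routing above could be sketched as a simple lookup. The category names, the `route` function, and the fallback to v18 for unknown categories are assumptions made for illustration. How a request gets classified into a category is left open, and the v18 identifier is a placeholder because its repository name is not given here.

```python
# Companion model IDs are taken from the list above.
MODEL_V17 = "Jnx03/kanitakorn-r1d-qwen14b-v17-dpo-r32-edu"
MODEL_BASE_R1 = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
# Placeholder only: the v18 repository name is not stated in this section.
MODEL_V18 = "v18"

# Hypothetical task categories mapped to the recommended model.
ROUTES = {
    "thai_language": MODEL_V18,
    "knowledge": MODEL_V18,
    "instruction_following": MODEL_V18,
    "causal_reasoning": MODEL_V17,  # xCOPA-style questions
    "hard_math": MODEL_BASE_R1,     # AIME-style problems
}


def route(task_category: str) -> str:
    """Return the model for a task category.

    Unknown categories fall back to v18 (an assumption, not a documented rule).
    """
    return ROUTES.get(task_category, MODEL_V18)


if __name__ == "__main__":
    for category in ("thai_language", "causal_reasoning", "hard_math"):
        print(f"{category} -> {route(category)}")
```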
Use Case
Primary audience: Thai students (P1-M6, i.e. Prathom 1 to Mathayom 6, grades 1-12) and teachers. The model is intended for general Thai education, covering language, math, science, social studies, and reading comprehension.
License: MIT (inherited from DeepSeek-R1-Distill-Qwen-14B)