ThaiLLM-Dev
openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.5
License: apache-2.0
Evaluation
All scores use greedy decoding (temperature=0) on EvalScope's general_mcq task, served by a local vLLM instance.
"Harness" is the strict-parser score. "Corrected" additionally credits answers that the
model clearly gave in a non-standard form (a Thai letter, or "the answer is X"), as judged by a
transparent, Thai-aware re-scorer.
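The difference between the two scores can be sketched as a strict parser plus narrow fallbacks. This is a hypothetical illustration, not the re-scorer behind the reported numbers. The regular expressions and the function names `strict_parse` and `corrected_parse` are assumptions. So is the mapping of the Thai option labels ก/ข/ค/ง to A/B/C/D, which is a common Thai MCQ convention.

```python
import re
from typing import Optional

# Assumption: Thai MCQ options are labeled ก, ข, ค, ง, corresponding to A, B, C, D
THAI_TO_LATIN = {"ก": "A", "ข": "B", "ค": "C", "ง": "D"}

# Strict form the harness expects, e.g. "ANSWER: B"
STRICT = re.compile(r"ANSWER:\s*([ABCD])(?![A-Za-z])")
# Fallback 1: English phrasing such as "The answer is (C)"
ANSWER_IS = re.compile(r"(?i:the answer is)\s*:?\s*\(?([ABCD])(?![A-Za-z])")
# Fallback 2: Thai phrasing such as "คำตอบคือ ข" or "ตอบ ข้อ ง"; the lookahead
# rejects a Thai letter that is merely the start of a longer Thai word
THAI_ANSWER = re.compile(r"(?:คำตอบ|ตอบ)\s*(?:คือ)?\s*:?\s*(?:ข้อ)?\s*\(?([กขคง])(?![\u0E00-\u0E7F])")


def strict_parse(text: str) -> Optional[str]:
    """Harness-style parse: accept only an explicit 'ANSWER: X' (the last one wins)."""
    hits = STRICT.findall(text)
    return hits[-1] if hits else None


def corrected_parse(text: str) -> Optional[str]:
    """Strict parse first, then narrow fallbacks for clearly stated answers."""
    letter = strict_parse(text)
    if letter is not None:
        return letter
    m = ANSWER_IS.search(text)
    if m:
        return m.group(1)
    m = THAI_ANSWER.search(text)
    if m:
        return THAI_TO_LATIN[m.group(1)]
    return None


print(strict_parse("The answer is (C)."), corrected_parse("The answer is (C)."))
print(corrected_parse("คำตอบคือ ข"))
```

A response counts toward "harness" only if `strict_parse` finds a letter, while "corrected" also accepts the fallbacks. Keeping each fallback an explicit, narrow pattern is what makes a re-scorer like this auditable.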
Thai Travel QA v2 (135 hand-curated MCQs on broad tourism knowledge)
| Model | Accuracy |
|---|---|
| thaitravel-v0.0.5 (this model) | 86.67% |
| qwen3.6-35b (reference, 35B) | 83.70% |
| thaitravel-v0.0.1 | 82.22% |
| thaitravel-v0.0.2 | 80.74% |
| thaitravel-v0.0.4 | 78.52% |
| thaitravel-v0.0.3 | 72.59% |
Thai Travel QA v3 (483 balanced MCQs synthesized from Wikipedia articles)
| Model | Accuracy |
|---|---|
| thaitravel-v0.0.5 (this model) | 57.35% |
| thaitravel-v0.0.3 | 54.24% |
| thaitravel-v0.0.4 | 50.10% |
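Because both sets are small, it helps to read the percentages as question counts. The sketch below assumes each reported accuracy is exactly correct/total, rounded to two decimals. It recovers the implied number of correct answers and checks that this count rounds back to the reported figure.

```python
# Reported accuracies (%) from the two tables, and the evaluation-set sizes
V2_N, V3_N = 135, 483
V2 = {
    "thaitravel-v0.0.5": 86.67, "qwen3.6-35b": 83.70, "thaitravel-v0.0.1": 82.22,
    "thaitravel-v0.0.2": 80.74, "thaitravel-v0.0.4": 78.52, "thaitravel-v0.0.3": 72.59,
}
V3 = {"thaitravel-v0.0.5": 57.35, "thaitravel-v0.0.3": 54.24, "thaitravel-v0.0.4": 50.10}


def implied_correct(pct: float, n: int) -> int:
    """Correct-answer count implied by a two-decimal accuracy on n questions."""
    k = round(pct / 100 * n)
    # The recovered count must reproduce the reported figure when re-rounded
    if round(100 * k / n, 2) != pct:
        raise ValueError(f"{pct}% does not match {k}/{n}")
    return k


for name, pct in V2.items():
    print(f"v2  {name:<20} {implied_correct(pct, V2_N):>3}/{V2_N}")
for name, pct in V3.items():
    print(f"v3  {name:<20} {implied_correct(pct, V3_N):>3}/{V3_N}")
```

Under that assumption, one v2 question is worth about 0.74 points (100/135) and one v3 question about 0.21 points (100/483). The 2.97-point lead of v0.0.5 over the 35B reference on v2 therefore amounts to 4 questions (117 vs 113).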
Detailed breakdown (v0.0.5)
- v2: harness 86.67%, corrected 86.67% · format compliance 135/135 · accuracy by gold answer letter: A 85% / B 93% / C 84% / D 80%
- v3: harness 57.35%, corrected 57.35% · format compliance 483/483 · accuracy by gold answer letter: A 55% / B 57% / C 54% / D 63%
- v2 by category: tourist attractions (แหล่งท่องเที่ยว) 90.6% (n=53) · culture and traditions (วัฒนธรรมและประเพณี) 93.5% (n=46) · food and beverages (อาหารและเครื่องดื่ม) 72.2% (n=36)
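The per-category and per-letter figures are ordinary grouped accuracies. The sketch below assumes per-question records of (group, is_correct), where the group can be a category or the gold letter. The records here are illustrative, not the actual evaluation logs. They are rebuilt from the correct counts implied by the rounded category percentages (for example, 90.6% of 53 is about 48).

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, is_correct) pairs -> {group: accuracy in %}."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        hits[group] += int(ok)
    return {g: 100 * hits[g] / totals[g] for g in totals}


# Illustrative records, not real logs: (correct, n) per v2 category, with the
# correct counts implied by the rounded percentages (90.6% of 53 ≈ 48, etc.)
implied = {"attractions": (48, 53), "culture": (43, 46), "food": (26, 36)}
records = [(cat, i < k) for cat, (k, n) in implied.items() for i in range(n)]

per_category = accuracy_by_group(records)
overall = 100 * sum(ok for _, ok in records) / len(records)  # micro average
macro = sum(per_category.values()) / len(per_category)       # unweighted mean

print({cat: round(acc, 1) for cat, acc in per_category.items()})
print(f"overall (n-weighted): {overall:.2f}%")  # 117/135, the reported 86.67%
print(f"macro (unweighted mean): {macro:.2f}%")
```

The overall 86.67% is the n-weighted (micro) average, 117/135. This is why it differs from the unweighted mean of the three category scores.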
Honest note on the ceiling: 90% on both sets is not attainable for an 8B model here. Even the 35B reference scores only 83.7% on v2. v3 is a held-out generalization test whose questions come from Wikipedia articles deliberately excluded from training. v0.0.5 instead improves both scores without any training on the test data. It recovers the points lost to answer-parsing failures and answer-position bias, and it broadens knowledge coverage.
Training
- Base model: OpenThaiGPT-ThaiLLM-8B-ThaiKnowledge-v7.2
- Method: LoRA (rank 64, alpha 128, dropout 0.05, target_modules=all-linear)
- Optimizer: AdamW (fused), lr 1e-4, cosine schedule, 5% warmup, weight decay 0.1
- Schedule: 3 epochs, max_length 4096, effective batch size 8
- Hardware: 4× H100 80 GB (DDP)
- Framework: ms-swift
- Training data: 43,817 instruction pairs, built from three sources:
  - the v0.0.4 corpus;
  - new landmark/food/geography Q&A (from TAT, ChillPaiNai, and PaiDuayKan);
  - a 3,000-example MCQ slice for format compliance and position debiasing (gold answers balanced across A/B/C/D, every response ending in ANSWER: X).
  
  The data was deduplicated and leak-checked against both evaluation sets (0 leaks found).
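The schedule above implies a rough step budget. This back-of-the-envelope sketch rests on three assumptions: no sequence packing, a kept partial final batch, and a hypothetical split of the effective batch of 8 into 2 sequences per GPU on 4 GPUs with no gradient accumulation. ms-swift may count steps differently.

```python
import math

# Values stated in the Training list
NUM_EXAMPLES = 43_817
EPOCHS = 3
NUM_GPUS = 4
EFFECTIVE_BATCH = 8
WARMUP_RATIO = 0.05
LORA_RANK, LORA_ALPHA = 64, 128

# Hypothetical split of the effective batch: 2 per GPU x 4 GPUs x 1 accumulation step
PER_DEVICE_BATCH, GRAD_ACCUM = 2, 1
assert PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM == EFFECTIVE_BATCH

# Rough optimizer-step budget, assuming no packing and a kept partial last batch
steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)

# Standard LoRA multiplies the low-rank update B @ A by alpha / rank
lora_scale = LORA_ALPHA / LORA_RANK

print(f"{steps_per_epoch=} {total_steps=} {warmup_steps=} {lora_scale=}")
```

Under these assumptions, the run is about 5,478 steps per epoch and 16,434 steps in total, with roughly 822 warmup steps. The LoRA update is scaled by 2.0.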
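One way to build a position-debiased slice like the one described is to cycle the gold answer through A, B, C, D and shuffle the distractors into the remaining slots. This is a hypothetical sketch, not the recipe used for this model. The prompt layout and the function names are assumptions, as is the choice to make the response only the final ANSWER: X line.

```python
import random

LETTERS = "ABCD"


def make_debiased_mcq(question, correct, distractors, gold_idx, rng):
    """Put the correct option at LETTERS[gold_idx]; shuffle distractors into the rest."""
    if len(distractors) != len(LETTERS) - 1:
        raise ValueError("expected exactly three distractors")
    others = list(distractors)
    rng.shuffle(others)
    options = others[:gold_idx] + [correct] + others[gold_idx:]
    prompt = "\n".join([question] + [f"{ltr}. {opt}" for ltr, opt in zip(LETTERS, options)])
    # Every target ends with the strict answer line the harness parses
    return {"prompt": prompt, "response": f"ANSWER: {LETTERS[gold_idx]}"}


def build_debias_slice(items, seed=0):
    """Cycle the gold position A, B, C, D, A, ... so each letter is equally frequent."""
    rng = random.Random(seed)
    return [
        make_debiased_mcq(q, correct, distractors, i % len(LETTERS), rng)
        for i, (q, correct, distractors) in enumerate(items)
    ]


# Hypothetical example item (question, correct option, three distractors)
items = [("Which province is Doi Inthanon in?", "Chiang Mai", ["Phuket", "Krabi", "Chonburi"])] * 8
for ex in build_debias_slice(items)[:2]:
    print(ex["prompt"], ex["response"], sep="\n", end="\n\n")
```

Cycling the gold index guarantees an even A/B/C/D distribution regardless of how the source questions were written. Shuffling only the distractors keeps the correct option's position deterministic.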
Usage
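```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Prompt: "Recommend tourist attractions in Chiang Mai province"
messages = [{"role": "user", "content": "แนะนำสถานที่ท่องเที่ยวในจังหวัดเชียงใหม่"}]
# return_dict=True yields input_ids plus attention_mask for generate()
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)

# Greedy decoding, matching the evaluation setup
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```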
Model provider: ThaiLLM-Dev
Modalities: text input, text output