ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.4

Evaluation

All scores use greedy decoding (temperature=0) and were computed with the general_mcq task in EvalScope against a local vLLM server.

Thai Travel QA v2 (135 hand-curated MCQ — broad tourism knowledge)

Model	Accuracy
qwen3.6-35b (reference, 35B)	83.70%
thaitravel-v0.0.1	82.22%
thaitravel-v0.0.2	80.74%
thaitravel-v0.0.4 (this model)	78.52%
thaitravel-v0.0.3	72.59%

Thai Travel QA v3 (483 Wikipedia-synthetic balanced MCQ)

Model	Accuracy
thaitravel-v0.0.3	54.24%
thaitravel-v0.0.4 (this model)	50.10%

Summary: Relative to v0.0.3, merging the broad corpus back in recovers +5.93 pp on v2 (the meaningful broad-knowledge benchmark) at a −4.14 pp cost on v3 (the narrower Wikipedia-synthetic set). v0.0.4 is the stronger general Thai travel model.
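The deltas quoted in the summary follow directly from the two tables above. A minimal sketch of that arithmetic, with the accuracies copied from the tables (the dictionary and variable names are illustrative only):

```python
# Accuracies (%) copied from the two result tables above.
v2 = {"v0.0.3": 72.59, "v0.0.4": 78.52}
v3 = {"v0.0.3": 54.24, "v0.0.4": 50.10}

# Differences in percentage points (pp): v0.0.4 minus v0.0.3.
delta_v2 = round(v2["v0.0.4"] - v2["v0.0.3"], 2)
delta_v3 = round(v3["v0.0.4"] - v3["v0.0.3"], 2)

print(f"v2: {delta_v2:+.2f} pp, v3: {delta_v3:+.2f} pp")
# prints "v2: +5.93 pp, v3: -4.14 pp"
```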

Detailed breakdown (v0.0.4)

An independent clean re-run reproduced these scores within vLLM greedy non-determinism: v2 77.04% (−1.48 pp, 2 questions) and v3 49.90% (−0.20 pp, 1 question), i.e. 3 of 618 questions in total. The headline numbers above are therefore confirmed.
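To see how the percentage gaps map to whole questions, the reported accuracies can be converted back to correct-answer counts. This is a small sketch using only the figures in this card; the helper name `correct_count` is made up for illustration:

```python
# Benchmark sizes and reported accuracies (%) from this card.
sizes = {"v2": 135, "v3": 483}
headline = {"v2": 78.52, "v3": 50.10}
rerun = {"v2": 77.04, "v3": 49.90}

def correct_count(acc_pct, n):
    """Recover the number of correct answers implied by a rounded accuracy."""
    return round(acc_pct / 100 * n)

# Per-benchmark difference in correct answers between the two runs.
diff = {
    name: correct_count(headline[name], n) - correct_count(rerun[name], n)
    for name, n in sizes.items()
}
total_diff = sum(diff.values())

print(diff, total_diff, sum(sizes.values()))
# prints "{'v2': 2, 'v3': 1} 3 618"
```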

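v2 by category: attractions 81.1% (n=53) · culture 78.3% (n=46) · food & drink 69.4% (n=36, weakest). These figures imply 43 + 36 + 25 = 104 correct answers (77.04%), which matches the re-run score rather than the 78.52% headline.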
v2 by answer letter: A 70% · B 81% · C 84% · D 70%
v3 by answer letter: A 50% · B 56% · C 50% · D 44% (answer-balanced set)
Format compliance: 127/135 v2 outputs and 450/483 v3 outputs contained a parseable ANSWER: X. Unparseable outputs are scored incorrect, so the 33 unparseable v3 outputs (~7%) are an upper bound on what tighter answer extraction could recover on v3.
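The format-compliance point can be illustrated with a small answer extractor. This is only a sketch: the regular expression, the function names, and the sample outputs are assumptions for illustration, and the extraction logic actually used by EvalScope is not shown in this card.

```python
import re

# Assumed convention: the reply contains "ANSWER: X" with X in A-D.
# The trailing \b stops "ANSWER: About ..." from being read as "A".
ANSWER_RE = re.compile(r"ANSWER\s*:\s*\(?([A-D])\b", re.IGNORECASE)

def extract_answer(text):
    """Return the last letter announced as 'ANSWER: X', or None if unparseable."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].upper() if matches else None

def accuracy(outputs, gold):
    """Score outputs against gold letters; unparseable outputs count as incorrect."""
    correct = sum(extract_answer(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)

# Hypothetical model outputs: two parseable, one not.
outputs = ["Reasoning... ANSWER: B", "answer: (c)", "I think it is D."]
gold = ["B", "C", "D"]
print(accuracy(outputs, gold))  # 2 of 3 correct; the third is unparseable

# Share of v3 outputs without a parseable answer, from the counts above.
unparseable_v3 = 483 - 450
print(f"{unparseable_v3 / 483:.1%}")  # prints "6.8%"
```

A more lenient pattern raises the parse rate but also the risk of reading a letter the model did not intend as its answer, so any change to extraction should be checked against the raw outputs.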

Training

Base model: OpenThaiGPT-ThaiLLM-8B-ThaiKnowledge-v7.2
Method: LoRA — rank 64, alpha 128, dropout 0.05, target_modules=all-linear
Optimizer: AdamW (fused), lr 1e-4, cosine schedule, warmup 5%, weight decay 0.1
Schedule: 3 epochs, max_length 4096, effective batch size 8
Hardware: 4× H100 80 GB (DDP)
Framework: ms-swift
Training data: 19,847 instruction pairs — a broad curated Thai travel corpus merged with Wikipedia-tourism synthetic Q/A. Deduplicated by normalized question and leak-checked against both evaluation sets (0 leaks).
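The deduplication and leak check described for the training data could look roughly like the sketch below. The normalization steps (NFKC, lowercasing, punctuation removal, whitespace collapsing), the `question` field name, and the sample records are all assumptions; the card does not specify the actual procedure.

```python
import unicodedata

def normalize(question):
    """Assumed normalization: NFKC, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", question).lower()
    # Drop punctuation by Unicode category so Thai vowel and tone marks are kept.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())

def dedupe(pairs):
    """Keep the first pair seen for each normalized question."""
    seen, kept = set(), []
    for pair in pairs:
        key = normalize(pair["question"])
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept

def find_leaks(train_pairs, eval_questions):
    """Return training pairs whose normalized question also appears in an eval set."""
    eval_keys = {normalize(q) for q in eval_questions}
    return [p for p in train_pairs if normalize(p["question"]) in eval_keys]

# Hypothetical records for illustration only.
train = [
    {"question": "Where is Wat Arun?", "answer": "Bangkok"},
    {"question": "where is  wat arun", "answer": "Bangkok"},
    {"question": "ดอยอินทนนท์อยู่จังหวัดอะไร?", "answer": "เชียงใหม่"},
]
eval_questions = ["What is the tallest mountain in Thailand?"]

train = dedupe(train)
leaks = find_leaks(train, eval_questions)
print(len(train), len(leaks))  # prints "2 0"
```

Exact matching on normalized text only catches verbatim or near-verbatim overlap; paraphrased leaks would need a fuzzy or embedding-based comparison.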

Usage

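```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.4"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# Thai prompt: "Recommend tourist attractions in Chiang Mai province"
messages = [{"role": "user", "content": "แนะนำสถานที่ท่องเที่ยวในจังหวัดเชียงใหม่"}]
# return_dict=True returns input_ids and attention_mask together, regardless of library defaults.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
prompt_len = inputs["input_ids"].shape[-1]

# Greedy decoding, matching the evaluation setting.
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```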
