License: apache-2.0
Evaluation
Greedy decoding (temperature=0), general_mcq via EvalScope on local vLLM.
Thai Travel QA v2 (135 hand-curated MCQ — broad tourism knowledge)
| Model | Accuracy |
|---|---|
| qwen3.6-35b (reference, 35B) | 83.70% |
| thaitravel-v0.0.1 | 82.22% |
| thaitravel-v0.0.2 | 80.74% |
| thaitravel-v0.0.4 (this model) | 78.52% |
| thaitravel-v0.0.3 | 72.59% |
Thai Travel QA v3 (483 Wikipedia-synthetic balanced MCQ)
| Model | Accuracy |
|---|---|
| thaitravel-v0.0.3 | 54.24% |
| thaitravel-v0.0.4 (this model) | 50.10% |
Summary: Relative to v0.0.3, merging the broad corpus back in recovers 5.93 pp on v2 (the hand-curated broad-knowledge benchmark) at a cost of 4.14 pp on v3 (the narrower Wikipedia-synthetic set). Of the two, v0.0.4 is the stronger general Thai travel model, though v0.0.1 and v0.0.2 still score higher on v2.
Detailed breakdown (v0.0.4)
An independent clean re-run scored 77.04% on v2 and 49.90% on v3. Both are within 1.5 pp of the headline numbers (3 questions out of 618 in total), a gap consistent with run-to-run non-determinism in vLLM greedy decoding, so the headline numbers above are confirmed.
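The question counts behind these percentages can be recovered with a little arithmetic. The sketch below assumes each reported accuracy is a correct-answer count divided by the set size and rounded to two decimals; the helper name `correct_count` is illustrative and not part of EvalScope.

```python
def correct_count(accuracy_pct: float, n_questions: int) -> int:
    """Recover the number of correct answers from a rounded accuracy percentage."""
    return round(accuracy_pct / 100 * n_questions)


# Set sizes and scores as reported in this model card.
SET_SIZES = {"v2": 135, "v3": 483}
HEADLINE = {"v2": 78.52, "v3": 50.10}
RERUN = {"v2": 77.04, "v3": 49.90}

# Per-benchmark gap between the headline run and the re-run, in questions.
gaps = {
    name: correct_count(HEADLINE[name], n) - correct_count(RERUN[name], n)
    for name, n in SET_SIZES.items()
}

print(gaps)                     # {'v2': 2, 'v3': 1}
print(sum(gaps.values()))       # 3
print(sum(SET_SIZES.values()))  # 618
```

Under that assumption the two runs differ by two questions on v2 and one on v3.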
- v2 by category: attractions 81.1% (n=53) · culture 78.3% (n=46) · food & drink 69.4% (n=36, weakest)
- v2 by answer letter: A 70% · B 81% · C 84% · D 70%
- v3 by answer letter: A 50% · B 56% · C 50% · D 44% (answer-balanced set)
- Format compliance: v2 127/135 and v3 450/483 outputs emitted a parseable `ANSWER: X`. Unparseable outputs are scored incorrect, so up to ~7% of v3 (33/483) is lost to formatting, part of which tighter answer extraction could recover.
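Because unparseable outputs count as wrong, the extraction step affects the score. A minimal sketch of a more lenient parser follows. It is not EvalScope's own extraction logic; the function name `extract_answer` and the fallback rule are assumptions made for illustration.

```python
import re
from typing import Optional

# Strict form: "ANSWER: B", case-insensitive, with optional spaces and an
# optional opening parenthesis. The lookahead rejects words such as "because".
STRICT = re.compile(r"ANSWER\s*:\s*\(?([ABCD])(?![A-Za-z])", re.IGNORECASE)

# Fallback: an uppercase option letter that is not part of a Latin word,
# for example "(C)" or "C." inside otherwise Thai text.
LOOSE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_answer(text: str) -> Optional[str]:
    """Return the chosen option letter, or None if the output is ambiguous."""
    strict = STRICT.findall(text)
    if strict:
        # If the model revises itself, trust the last stated answer.
        return strict[-1].upper()
    loose = set(LOOSE.findall(text))
    if len(loose) == 1:
        # Accept the fallback only when exactly one distinct letter appears.
        return loose.pop()
    return None


print(extract_answer("ANSWER: B"))
print(extract_answer("คำตอบคือ (D)"))
print(extract_answer("A or B"))
```

The fallback is a heuristic and can misfire on English text where a capital "A" is an article, so any gain should be checked against the gold answers rather than assumed.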
Training
- Base model: OpenThaiGPT-ThaiLLM-8B-ThaiKnowledge-v7.2
- Method: LoRA, rank 64, alpha 128, dropout 0.05, `target_modules=all-linear`
- Optimizer: AdamW (fused), lr 1e-4, cosine schedule, warmup 5%, weight decay 0.1
- Schedule: 3 epochs, max_length 4096, effective batch size 8
- Hardware: 4× H100 80 GB (DDP)
- Framework: ms-swift
- Training data: 19,847 instruction pairs — a broad curated Thai travel corpus merged with Wikipedia-tourism synthetic Q/A. Deduplicated by normalized question and leak-checked against both evaluation sets (0 leaks).
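The deduplication and leak check described in the last bullet can be sketched as follows. The exact normalization used for this model is not documented here, so the rule below (Unicode NFKC, case folding, whitespace and punctuation removed) and the names `normalize_question`, `dedupe`, and `find_leaks` are assumptions.

```python
import unicodedata


def normalize_question(question: str) -> str:
    """Assumed normalization: NFKC, casefold, keep only letters, digits, marks."""
    folded = unicodedata.normalize("NFKC", question).casefold()
    # Category "M" keeps Thai vowel signs and tone marks attached to their base.
    return "".join(
        ch for ch in folded if unicodedata.category(ch)[0] in ("L", "N", "M")
    )


def dedupe(pairs: list) -> list:
    """Keep the first instruction pair seen for each normalized question."""
    seen = set()
    kept = []
    for pair in pairs:
        key = normalize_question(pair["question"])
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept


def find_leaks(train_pairs: list, eval_questions: list) -> list:
    """Return training pairs whose normalized question appears in an eval set."""
    eval_keys = {normalize_question(q) for q in eval_questions}
    return [p for p in train_pairs if normalize_question(p["question"]) in eval_keys]


# Toy data for illustration only, not taken from the real corpus.
train = [
    {"question": "วัดพระแก้วอยู่ที่ไหน?", "answer": "กรุงเทพมหานคร"},
    {"question": " วัดพระแก้ว อยู่ที่ไหน ", "answer": "กรุงเทพฯ"},
    {"question": "What is Doi Suthep?", "answer": "A mountain near Chiang Mai"},
]
eval_questions = ["what is doi suthep"]

print(len(dedupe(train)))                     # 2
print(len(find_leaks(train, eval_questions)))  # 1
```

Exact matching on a normalized key catches only verbatim or near-verbatim overlap; paraphrased leaks would need a fuzzy or embedding-based check, which this sketch does not attempt.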
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.4"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# Prompt: "Recommend tourist attractions in Chiang Mai province"
messages = [{"role": "user", "content": "แนะนำสถานที่ท่องเที่ยวในจังหวัดเชียงใหม่"}]
# return_dict=True yields input_ids and attention_mask together
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Greedy decoding, as in the evaluation setup
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
Model provider: ThaiLLM-Dev
Modalities: text input, text output