
README

License: apache-2.0

Evaluation

Setup: greedy decoding (temperature=0), scored with general_mcq in EvalScope against a local vLLM server.

Thai Travel QA v2 (135 hand-curated MCQ — broad tourism knowledge)

| Model | Accuracy |
|---|---|
| qwen3.6-35b (reference, 35B) | 83.70% |
| thaitravel-v0.0.1 | 82.22% |
| thaitravel-v0.0.2 | 80.74% |
| thaitravel-v0.0.4 (this model) | 78.52% |
| thaitravel-v0.0.3 | 72.59% |

Thai Travel QA v3 (483 Wikipedia-synthetic balanced MCQ)

| Model | Accuracy |
|---|---|
| thaitravel-v0.0.3 | 54.24% |
| thaitravel-v0.0.4 (this model) | 50.10% |

Summary: Relative to v0.0.3, merging the broad corpus back in recovers +5.93 pp on v2 (the meaningful broad-knowledge benchmark) at a cost of −4.14 pp on v3 (the narrower Wikipedia-synthetic set). Of the two, v0.0.4 is the stronger general Thai travel model.

Detailed breakdown (v0.0.4)

An independent clean re-run scored 77.04% on v2 and 49.90% on v3. Both figures agree with the headline numbers above to within vLLM greedy non-determinism (≤1.5 pp per benchmark, i.e. ≤3 questions out of 618 in total), so the headline numbers are confirmed.
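The agreement between the two runs can be checked by converting the reported percentages back into question counts. The sketch below assumes accuracy is simply correct answers divided by the benchmark size (135 for v2, 483 for v3); the helper `n_correct` and the variable names are illustrative, not part of the evaluation code.

```python
# Benchmark sizes and accuracies (percent) as reported in this section.
N = {"v2": 135, "v3": 483}
headline = {"v2": 78.52, "v3": 50.10}
rerun = {"v2": 77.04, "v3": 49.90}


def n_correct(pct: float, n: int) -> int:
    """Recover the number of correct answers from a rounded percentage."""
    return round(pct / 100 * n)


# Per-benchmark difference between the headline run and the re-run, in questions.
diffs = {b: n_correct(headline[b], N[b]) - n_correct(rerun[b], N[b]) for b in N}

# Largest per-benchmark gap, in percentage points.
max_gap_pp = max(round(headline[b] - rerun[b], 2) for b in N)

print(diffs, sum(diffs.values()), sum(N.values()), max_gap_pp)
```

Under that assumption the runs differ by 2 questions on v2 and 1 on v3, which is 3 of the 618 questions overall, with a largest gap of 1.48 pp.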

  • v2 by category: attractions 81.1% (n=53) · culture 78.3% (n=46) · food & drink 69.4% (n=36, weakest)
  • v2 by answer letter: A 70% · B 81% · C 84% · D 70%
  • v3 by answer letter: A 50% · B 56% · C 50% · D 44% (answer-balanced set)
  • Format compliance: v2 127/135 and v3 450/483 outputs emitted a parseable ANSWER: X. Unparseable outputs are scored incorrect, so the 33 unparseable v3 outputs are a ~7% drag on v3 that tighter answer extraction could help recover.
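The format-compliance figures depend on how the ANSWER: X line is pulled out of each output. As a rough illustration of what a more tolerant extractor might look like, the sketch below accepts mixed case, optional parentheses, and flexible spacing. This is not EvalScope's parser: the regular expression and the `extract_answer` and `score` helpers are hypothetical, and the real outputs may fail in ways this pattern does not cover.

```python
import re
from typing import List, Optional, Tuple

# Assumed convention: the model states its choice as "ANSWER: <letter>".
# The \b stops "ANSWER: Apple" from being read as option A.
ANSWER_RE = re.compile(r"ANSWER\s*:\s*\(?([A-D])\b\)?", re.IGNORECASE)


def extract_answer(text: str) -> Optional[str]:
    """Return the last A-D letter that follows 'ANSWER:', or None if absent."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].upper() if matches else None


def score(outputs: List[str], gold: List[str]) -> Tuple[int, int]:
    """Return (correct, parseable); unparseable outputs count as incorrect."""
    correct = parseable = 0
    for text, answer in zip(outputs, gold):
        predicted = extract_answer(text)
        if predicted is not None:
            parseable += 1
            correct += predicted == answer
    return correct, parseable


outputs = ["Reasoning... ANSWER: A", "no idea", "answer: (d)"]
print(score(outputs, ["A", "B", "C"]))
```

For scale, the counts reported above leave 8 of 135 v2 outputs (5.9%) and 33 of 483 v3 outputs (6.8%) unparseable, so that is the upper bound on what better extraction alone could change.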

Training

  • Base model: OpenThaiGPT-ThaiLLM-8B-ThaiKnowledge-v7.2
  • Method: LoRA — rank 64, alpha 128, dropout 0.05, target_modules=all-linear
  • Optimizer: AdamW (fused), lr 1e-4, cosine schedule, warmup 5%, weight decay 0.1
  • Schedule: 3 epochs, max_length 4096, effective batch size 8
  • Hardware: 4× H100 80 GB (DDP)
  • Framework: ms-swift
  • Training data: 19,847 instruction pairs — a broad curated Thai travel corpus merged with Wikipedia-tourism synthetic Q/A. Deduplicated by normalized question and leak-checked against both evaluation sets (0 leaks).
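The deduplication and leak check on the training data can be sketched as follows. The exact normalization used for this model is not described, so the `normalize` function here (NFKC folding, lowercasing, punctuation removal, whitespace collapsing) is an assumption, and `dedupe` and `find_leaks` are hypothetical helper names.

```python
import unicodedata
from typing import Iterable, List, Tuple


def normalize(question: str) -> str:
    """NFKC-fold, lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", question).lower()
    # Remove only punctuation (Unicode category P*), so Thai vowel and
    # tone marks (category Mn) are left intact.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


def dedupe(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first pair seen for each normalized question."""
    seen, kept = set(), []
    for question, answer in pairs:
        key = normalize(question)
        if key not in seen:
            seen.add(key)
            kept.append((question, answer))
    return kept


def find_leaks(train_questions: Iterable[str], eval_questions: Iterable[str]) -> List[str]:
    """Return training questions whose normalized form appears in the eval set."""
    eval_keys = {normalize(q) for q in eval_questions}
    return [q for q in train_questions if normalize(q) in eval_keys]


pairs = [
    ("วัดพระแก้วอยู่ที่ไหน?", "กรุงเทพฯ"),
    ("วัดพระแก้วอยู่ที่ไหน", "กรุงเทพมหานคร"),
    ("What is  Doi Suthep?", "A temple"),
]
print(len(dedupe(pairs)))
print(find_leaks(["what is doi suthep"], ["What is Doi Suthep?"]))
```

Exact matching on a normalized key only catches near-verbatim duplicates. Paraphrased questions would need a fuzzier comparison, which this sketch does not attempt.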

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.4"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# Thai prompt: "Recommend tourist attractions in Chiang Mai province"
messages = [{"role": "user", "content": "แนะนำสถานที่ท่องเที่ยวในจังหวัดเชียงใหม่"}]
# return_dict=True yields input_ids plus attention_mask for generate()
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)  # greedy decoding
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Model provider

ThaiLLM-Dev


Modalities

Input: Text
Output: Text
