ThaiLLM-Dev

openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.5

README

License: apache-2.0

Evaluation

All scores use greedy decoding (temperature=0) and are computed with general_mcq via EvalScope on local vLLM. "Harness" is the strict-parser score; "corrected" additionally credits answers the model clearly emitted in a non-standard form (a Thai letter, or "the answer is X"), using a transparent Thai-aware re-scorer.
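The difference between the "harness" and "corrected" scores can be sketched as two parsers applied to the same outputs. This is a minimal illustration, not the re-scorer used for the numbers here: the function names, the regular expressions, the Thai phrase คำตอบคือ ("the answer is"), and the mapping of the Thai letters ก/ข/ค/ง to A/B/C/D are all assumptions.

```python
import re
from typing import List, Optional, Tuple

# Assumption: Thai option letters map to A-D in this order.
THAI_TO_LATIN = {"ก": "A", "ข": "B", "ค": "C", "ง": "D"}

# One option letter that is not the first character of a longer word.
_LETTER = r"([A-Dกขคง])(?![\w\u0E00-\u0E7F])"

# Strict form: the output must end with "ANSWER: X".
STRICT = re.compile(r"ANSWER:\s*([A-D])\s*$")

# Lenient forms (hypothetical): other ways a model may state its choice.
LENIENT = [
    re.compile(r"(?i:answer)\s*:\s*\(?" + _LETTER),
    re.compile(r"(?i:the answer is)\s*\(?" + _LETTER),
    re.compile(r"คำตอบคือ\s*\(?" + _LETTER),
]


def strict_parse(text: str) -> Optional[str]:
    """Return the letter only if the output ends with 'ANSWER: X'."""
    m = STRICT.search(text.strip())
    return m.group(1) if m else None


def lenient_parse(text: str) -> Optional[str]:
    """Fall back to non-standard forms when the strict parse fails."""
    strict = strict_parse(text)
    if strict is not None:
        return strict
    for pattern in LENIENT:
        m = pattern.search(text)
        if m:
            letter = m.group(1)
            return THAI_TO_LATIN.get(letter, letter)
    return None


def score(outputs: List[str], gold: List[str]) -> Tuple[float, float]:
    """Return (harness accuracy, corrected accuracy) as fractions."""
    n = len(gold)
    harness = sum(strict_parse(o) == g for o, g in zip(outputs, gold))
    corrected = sum(lenient_parse(o) == g for o, g in zip(outputs, gold))
    return harness / n, corrected / n
```

When every output already ends in the strict form, both parsers return the same letter and the two scores coincide.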

Thai Travel QA v2 (135 hand-curated MCQ — broad tourism knowledge)

Model                             Accuracy
qwen3.6-35b (reference, 35B)      83.70%
thaitravel-v0.0.1                 82.22%
thaitravel-v0.0.5 (this model)    86.67%
thaitravel-v0.0.2                 80.74%
thaitravel-v0.0.4                 78.52%
thaitravel-v0.0.3                 72.59%

Thai Travel QA v3 (483 Wikipedia-synthetic balanced MCQ)

Model                             Accuracy
thaitravel-v0.0.5 (this model)    57.35%
thaitravel-v0.0.3                 54.24%
thaitravel-v0.0.4                 50.10%

Detailed breakdown (v0.0.5)

  • v2 — harness 86.67%, corrected 86.67% · format compliance 135/135 · by gold letter A 85% / B 93% / C 84% / D 80%
  • v3 — harness 57.35%, corrected 57.35% · format compliance 483/483 · by gold letter A 55% / B 57% / C 54% / D 63%
  • v2 by category: tourist attractions (แหล่งท่องเที่ยว) 90.6% (n=53) · culture and traditions (วัฒนธรรมและประเพณี) 93.5% (n=46) · food and drink (อาหารและเครื่องดื่ม) 72.2% (n=36)
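As a quick consistency check, the per-category figures can be combined back into the overall v2 score. The sketch below assumes each reported percentage is correct/n rounded to one decimal, so the per-category correct counts are inferred rather than taken from the evaluation logs; the dictionary keys and function names are illustrative.

```python
# (accuracy %, n) per v2 category, copied from the breakdown above.
V2_CATEGORIES = {
    "tourist attractions": (90.6, 53),
    "culture and traditions": (93.5, 46),
    "food and drink": (72.2, 36),
}


def implied_correct(acc_pct: float, n: int) -> int:
    # Assumption: the reported percentage is correct/n rounded to one decimal,
    # so rounding acc * n recovers the integer count of correct answers.
    return round(acc_pct / 100 * n)


def overall_accuracy(categories):
    """Return (total correct, total questions, overall accuracy in %)."""
    total_n = sum(n for _, n in categories.values())
    total_correct = sum(implied_correct(a, n) for a, n in categories.values())
    return total_correct, total_n, 100 * total_correct / total_n


correct, total, pct = overall_accuracy(V2_CATEGORIES)
print(f"{correct}/{total} = {pct:.2f}%")  # prints "117/135 = 86.67%"
```

The inferred counts (48, 43, and 26) sum to 117 of 135, which matches the 86.67% reported for v2.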

Honest note on the ceiling. Reaching 90% on both sets is not realistic for an 8B model here: even the 35B reference scores only 83.70% on v2, and v3 is a held-out generalization test whose questions come from Wikipedia articles deliberately excluded from training. v0.0.5 instead aims to raise both scores legitimately, by fixing the parse and answer-position losses and broadening knowledge, without any training on the test data.

Training

  • Base model: OpenThaiGPT-ThaiLLM-8B-ThaiKnowledge-v7.2
  • Method: LoRA — rank 64, alpha 128, dropout 0.05, target_modules=all-linear
  • Optimizer: AdamW (fused), lr 1e-4, cosine, warmup 5%, weight decay 0.1
  • Schedule: 3 epochs, max_length 4096, effective batch size 8
  • Hardware: 4× H100 80 GB (DDP)
  • Framework: ms-swift
  • Training data: 43,817 instruction pairs — the v0.0.4 corpus plus new landmark/food/geography Q/A (TAT, ChillPaiNai, PaiDuayKan) and a 3,000-example format/position-debias MCQ slice (balanced A/B/C/D, always ending ANSWER: X). Deduplicated and leak-checked against both evaluation sets (0 leaks).
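One way to build a format/position-debias slice like the one described in the training data is to assign the gold answer to each position equally often and to end every target with the same answer line. The sketch below is an assumption about how such a slice could be constructed, not the pipeline used for this model: the record layout, the prompt format, and the function names are hypothetical.

```python
import random
from collections import Counter

LETTERS = "ABCD"


def build_mcq(question, correct, distractors, gold_index):
    """Place the correct option at gold_index; the target ends with 'ANSWER: X'."""
    options = list(distractors)
    options.insert(gold_index, correct)
    lines = [question] + [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    gold = LETTERS[gold_index]
    return {"prompt": "\n".join(lines), "response": f"ANSWER: {gold}", "gold": gold}


def build_balanced_slice(items, seed=0):
    """items: iterable of (question, correct_option, three_distractors)."""
    items = list(items)
    rng = random.Random(seed)
    # Cycle through positions 0..3 so each letter is gold equally often,
    # then shuffle so the gold position is not predictable from item order.
    positions = [i % 4 for i in range(len(items))]
    rng.shuffle(positions)
    examples = []
    for (question, correct, distractors), pos in zip(items, positions):
        shuffled = list(distractors)
        rng.shuffle(shuffled)
        examples.append(build_mcq(question, correct, shuffled, pos))
    return examples


# Placeholder items only; real questions would come from the training corpus.
demo_items = [(f"Q{i}", "right", ["w1", "w2", "w3"]) for i in range(3000)]
demo_slice = build_balanced_slice(demo_items, seed=1)
print(Counter(ex["gold"] for ex in demo_slice))
```

With 3,000 items, each of A, B, C, and D is the gold letter 750 times, so the model cannot lower its loss by favoring one position.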

Usage

python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM-Dev/openthaigpt-thaillm-8b-instruct-thaitravel-v0.0.5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
# Prompt (Thai): "Recommend tourist attractions in Chiang Mai province."
messages = [{"role": "user", "content": "แนะนำสถานที่ท่องเที่ยวในจังหวัดเชียงใหม่"}]
# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# Greedy decoding, matching the evaluation setup.
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
