aitf-kpm-ugm

Qwen3-4B-CPT-Base

Model Details

Model Description

Qwen3-4B-CPT-Base extends Qwen/Qwen3-4B-Base with continued pre-training on a ~200M-token Indonesian corpus (news, Wikipedia, social media). The goal is Indonesian-domain adaptation as the foundation for downstream SFT. It is a base model: it performs raw text completion and is not tuned for instruction-following or chat. Part of the Model Narasi Isu pipeline (CPT -> SFT -> Deployment) for Indonesian public-issue monitoring and narrative analysis.

Developed by: AITF UGM 2026
Model type: Causal decoder-only LLM (continued pre-training)
Language(s) (NLP): Indonesian (Bahasa Indonesia); English technical terms preserved
License: Qwen License
Finetuned from model: Qwen/Qwen3-4B-Base

Model Sources

Repository: https://huggingface.co/aitf-ugm-2026

Uses

Direct Use

Indonesian-domain text completion. Perplexity benchmarking against vanilla Qwen3 baselines.

Downstream Use

Foundation for supervised fine-tuning (SFT) on Indonesian tasks: summarization, issue narrative analysis (ABSA), dashboard previews, chatbot Q&A.

Out-of-Scope Use

Not for chat or instruction-following before SFT. Not for high-stakes decisions without human review. Not a safety-aligned assistant.

Bias, Risks, and Limitations

Not instruction-tuned: no reliable JSON, chat, or task behavior. Corpus is news-heavy (70%), so outputs may reflect media and social-media biases. Coverage skews to topics present in the corpus window.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Validate outputs; apply SFT before task deployment.

How to Get Started with the Model

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aitf-ugm-2026/Qwen3-4B-CPT-Base"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Ibu kota Indonesia adalah"
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))

vLLM (use the completions endpoint, not the chat endpoint, because this is a base model):

bash
vllm serve aitf-ugm-2026/Qwen3-4B-CPT-Base \
  --gpu-memory-utilization 0.90 --max-model-len 8192
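
Once the server is up, it can be queried through the OpenAI-compatible completions route. The following is a minimal sketch using only the standard library. It assumes the server listens on vLLM's default address (http://localhost:8000); the helper names `build_completion_request` and `complete` and the sampling defaults are illustrative, not part of vLLM or this model card.

```python
import json
import urllib.request

MODEL_ID = "aitf-ugm-2026/Qwen3-4B-CPT-Base"
# Assumption: vLLM's default host and port; change if the server was started differently.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build the JSON payload for the raw-text /v1/completions route."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,  # plain text prompt, no chat messages or template
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, **kwargs):
    """POST the prompt to the completions endpoint and return the generated text."""
    payload = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

# Usage (requires a running server):
#   print(complete("Ibu kota Indonesia adalah"))
```

The payload carries a `prompt` field rather than a `messages` list, which matches the raw text-completion behavior of a base model.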

Training Details

Training Data

~200M tokens, group-aware split (train/val/test = 0.99 / 0.005 / 0.005).

Source	Share	Tokens
Berita (news)	70%	~140M
Wikipedia (id)	20%	~40M
Sosial media	10%	~20M
Total	100%	~200M

Train split: 325,860 records / ~198M tokens. Test: 1,655 records (news 1,098 / socmed 191 / wiki 366).
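
The card does not say which key defines a group or how groups were assigned. As a rough sketch of what a group-aware split can look like, the code below hashes a hypothetical `group_id` field (for example a source URL or thread identifier) so that every record from the same group lands in the same split. The field name, the hashing scheme, and the reuse of the training seed are all assumptions made for illustration.

```python
import hashlib

# Split ratios from the card: train / val / test = 0.99 / 0.005 / 0.005.
VAL_FRACTION = 0.005
TEST_FRACTION = 0.005


def assign_split(group_id, seed=3407):
    """Map a group to a split deterministically, so all its records stay together."""
    digest = hashlib.sha256(f"{seed}:{group_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    u = int(digest[:8], 16) / 16**8
    if u < TEST_FRACTION:
        return "test"
    if u < TEST_FRACTION + VAL_FRACTION:
        return "val"
    return "train"


def split_records(records):
    """Bucket records by split; `group_id` is a hypothetical field name."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        splits[assign_split(rec["group_id"])].append(rec)
    return splits
```

Because the assignment depends only on the group key, near-duplicate records from one source cannot end up on both sides of the train/test boundary.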

Training Procedure

Preprocessing

The train/val/test split is group-aware to avoid leakage between splits. Sequence packing is enabled. Data is processed on local Colab storage (/content/) before being copied to Google Drive.
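
Sequence packing can be illustrated with the simplest concatenate-and-chunk variant: tokenized documents are joined with an end-of-sequence token and cut into fixed-length blocks. This is only a sketch; the function name `pack_sequences` is hypothetical, and the packing actually performed by the training libraries may differ (for instance in how document boundaries are handled).

```python
def pack_sequences(tokenized_docs, max_seq_len, eos_id):
    """Concatenate documents (EOS-separated) and cut them into fixed-length blocks."""
    buffer = []
    packed = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= max_seq_len:
            packed.append(buffer[:max_seq_len])
            buffer = buffer[max_seq_len:]
    if buffer:
        packed.append(buffer)  # final partial block; a trainer may pad or drop it
    return packed
```

With a maximum sequence length of 8192, packing keeps almost every position in a batch occupied by real tokens instead of padding.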

Training Hyperparameters

Training regime: bf16 mixed precision
Method: LoRA, RSLoRA enabled
LoRA rank / alpha: 128 / 256
Extra modules: embed_tokens, lm_head included
LoRA dropout: 0.0
Max seq length: 8192
Packing: True; 4-bit load: False
Epochs: 1
Per-device batch: 12; grad accumulation: 16; effective batch: 192
Learning rate: 1e-5; embedding LR: 5e-6
Scheduler: cosine; warmup ratio: 0.03
Optimizer: adamw_8bit; weight decay: 0.01
Seed: 3407; early stopping enabled
Save format: merged_16bit
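
The listed values imply a few derived quantities that are useful when reproducing the run. The sketch below works them out under two assumptions that the card does not state explicitly: training used a single GPU, and packed sequences are completely full. The step counts are therefore rough estimates, and early stopping may have ended the run sooner.

```python
import math

# Values from the hyperparameter list above.
per_device_batch = 12
grad_accum = 16
num_gpus = 1          # assumption: a single A100, as listed under Hardware
max_seq_len = 8192
train_tokens = 198e6  # ~198M train tokens (approximate)
warmup_ratio = 0.03
lora_rank, lora_alpha = 128, 256

effective_batch = per_device_batch * grad_accum * num_gpus
tokens_per_step = effective_batch * max_seq_len  # upper bound with full packing

# Rough optimizer-step count for one epoch, assuming packed sequences are full.
est_steps = math.ceil(train_tokens / tokens_per_step)
est_warmup_steps = math.ceil(est_steps * warmup_ratio)

# LoRA scaling factors: standard LoRA uses alpha / r, rsLoRA uses alpha / sqrt(r).
standard_scale = lora_alpha / lora_rank
rslora_scale = lora_alpha / math.sqrt(lora_rank)

print(effective_batch, tokens_per_step, est_steps, est_warmup_steps)
print(standard_scale, round(rslora_scale, 2))
```

Under these assumptions one epoch is on the order of a hundred-odd optimizer steps, so the 0.03 warmup ratio covers only a handful of them. With rank 128, the rsLoRA scaling factor is also much larger than the standard alpha / r factor would be.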

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out test set: 1,655 documents (news 1,098 / socmed 191 / wiki 366).

Factors

Disaggregated by source domain: news, social media, Wikipedia.

Metrics

Perplexity (lower is better). Eval: ~1M tokens, max_length=4096, stride=1024, bf16 / 4-bit.
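
The evaluation script is not shown, but `max_length` and `stride` suggest the common strided (sliding-window) perplexity recipe. The sketch below shows only the windowing and the final formula, without a model; the function names are illustrative, and the one-token label shift at the start of the text is ignored for simplicity.

```python
import math


def sliding_windows(n_tokens, max_length=4096, stride=1024):
    """Yield (begin, end, n_scored) for each evaluation window.

    Each window sees up to `max_length` tokens of context, but only the
    tokens not scored by an earlier window contribute to the loss.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_length, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break


def perplexity(total_nll, n_scored_tokens):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(total_nll / n_scored_tokens)
```

In a full evaluation, the model's summed negative log-likelihood over the scored tokens of every window would be accumulated and passed to `perplexity` together with the total scored-token count.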

Results

Model	Full	News	Socmed	Wiki
Qwen3-4B-CPT-Base (this)	4.561	4.108	4.418	6.492
Qwen3-4B-Base (vanilla)	5.930	5.389	6.438	7.757
Improvement	~23%	~24%	~31%	~16%
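
The improvement row can be rechecked directly from the two perplexity rows. The short sketch below computes the relative reduction per subset; the helper name `relative_reduction` is illustrative.

```python
# Perplexities copied from the results table above.
cpt = {"Full": 4.561, "News": 4.108, "Socmed": 4.418, "Wiki": 6.492}
base = {"Full": 5.930, "News": 5.389, "Socmed": 6.438, "Wiki": 7.757}


def relative_reduction(new, old):
    """Fractional perplexity reduction of `new` relative to `old`."""
    return (old - new) / old


improvement = {k: relative_reduction(cpt[k], base[k]) for k in cpt}
for subset, frac in improvement.items():
    print(f"{subset}: {frac:.1%}")
```

Rounded to whole percentages, the computed values agree with the improvement row of the table.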

Summary

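CPT cuts perplexity by ~23% overall relative to vanilla Qwen3-4B-Base, with the largest gain on social media (~31%) and the smallest on Wikipedia (~16%). The authors also report that it beats vanilla Qwen3-8B-Base on all four subsets, although the 8B figures are not included in the table above. For this Indonesian domain, the results suggest that domain adaptation matters more than raw parameter count.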

Technical Specifications

Model Architecture and Objective

Qwen3 causal decoder-only transformer. Objective: continued causal language-model pre-training (next-token prediction).

Compute Infrastructure

Hardware

NVIDIA A100 80GB (Google Colab Pro+).

Software

Unsloth, TRL, HuggingFace Transformers, PEFT, bitsandbytes. Monitoring: WandB.

Citation

BibTeX:

bibtex
@misc{qwen3_4b_cpt_base,
  title  = {Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training},
  author = {AITF UGM 2026},
  year   = {2026},
  note   = {Model Narasi Isu pipeline}
}

APA:

AITF UGM 2026. (2026). Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training. Model Narasi Isu pipeline.

More Information

Model Narasi Isu: Indonesian public-issue monitoring and narrative analysis pipeline.

Model Card Authors

AITF UGM 2026

Model Card Contact

https://huggingface.co/aitf-ugm-2026
