
License: apache-2.0

Model Details

Model Description

Qwen3-4B-CPT-Base extends Qwen/Qwen3-4B-Base with continued pre-training (CPT) on a ~200M-token Indonesian corpus (news, Wikipedia, social media). The goal is Indonesian-domain adaptation as the foundation for downstream SFT. It is a base model: it performs raw text completion and is not tuned for instruction-following or chat. It is part of the Model Narasi Isu pipeline (CPT -> SFT -> Deployment) for Indonesian public-issue monitoring and narrative analysis.

  • Developed by: AITF UGM 2026
  • Model type: Causal decoder-only LLM (continued pre-training)
  • Language(s) (NLP): Indonesian (Bahasa Indonesia); English technical terms preserved
  • License: Apache-2.0
  • Finetuned from model: Qwen/Qwen3-4B-Base

Uses

Direct Use

Indonesian-domain text completion. Perplexity benchmarking against vanilla Qwen3 baselines.

Downstream Use

Foundation for supervised fine-tuning (SFT) on Indonesian tasks: summarization, issue narrative analysis (ABSA), dashboard previews, chatbot Q&A.

Out-of-Scope Use

Not for chat or instruction-following before SFT. Not for high-stakes decisions without human review. Not a safety-aligned assistant.

Bias, Risks, and Limitations

Not instruction-tuned: no reliable JSON, chat, or task behavior. The corpus is 70% news and 10% social media, so outputs may reflect the biases of those sources. Topical coverage is limited to what appeared during the corpus collection window.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Validate outputs; apply SFT before task deployment.

How to Get Started with the Model

python

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aitf-ugm-2026/Qwen3-4B-CPT-Base"
tok = AutoTokenizer.from_pretrained(model_id)
# Load in bf16 and let device_map="auto" place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Base model: pass plain text to be continued, not a chat template.
prompt = "Ibu kota Indonesia adalah"
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))

vLLM (use completions endpoint, not chat):

bash

vllm serve aitf-ugm-2026/Qwen3-4B-CPT-Base \
    --gpu-memory-utilization 0.90 --max-model-len 8192
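
Once the server is running, a client can query the OpenAI-compatible completions route. The following is a minimal sketch using only the standard library. It assumes vLLM's default address (`http://localhost:8000`); the helper names and the sampling values are illustrative and not part of the original card.

```python
import json
import urllib.request

# Assumption: vLLM is serving on its default host and port.
BASE_URL = "http://localhost:8000"
MODEL_ID = "aitf-ugm-2026/Qwen3-4B-CPT-Base"


def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build a raw text-completion payload (a prompt string, no chat roles)."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url=BASE_URL, **kwargs):
    """POST to /v1/completions and return the generated continuation."""
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]


# Usage (requires the server to be running):
#   print(complete("Ibu kota Indonesia adalah"))
```

The payload carries a `prompt` string rather than a `messages` list, which matches the base model's raw text-completion behavior.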

Training Details

Training Data

~200M tokens, group-aware split (train/val/test = 0.99 / 0.005 / 0.005).

| Source | Share | Tokens |
| --- | --- | --- |
| Berita (news) | 70% | ~140M |
| Wikipedia (id) | 20% | ~40M |
| Social media | 10% | ~20M |
| Total | 100% | ~200M |

Train split: 325,860 records / ~198M tokens. Test: 1,655 records (news 1,098 / socmed 191 / wiki 366).
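
The card does not say how groups were defined or assigned. As a rough sketch of one way a group-aware split can work, the code below hashes a hypothetical `group_id` field (for example, a source article or thread) so that every record in a group lands in the same split. The field name, the seed, and the hashing scheme are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict

# Split fractions from the card (test takes the remaining 0.005).
TRAIN_FRAC, VAL_FRAC = 0.99, 0.005


def assign_split(group_id, seed=0):
    """Map a group key to a split deterministically via a salted hash."""
    digest = hashlib.sha256(f"{seed}:{group_id}".encode("utf-8")).hexdigest()
    u = int(digest[:8], 16) / 2**32  # uniform value in [0, 1)
    if u < TRAIN_FRAC:
        return "train"
    if u < TRAIN_FRAC + VAL_FRAC:
        return "val"
    return "test"


def group_aware_split(records, group_key="group_id"):
    """Send all records of a group to one split so no group straddles splits."""
    splits = defaultdict(list)
    for rec in records:
        splits[assign_split(rec[group_key])].append(rec)
    return splits
```

Because assignment happens per group rather than per record, the realized fractions only approximate 0.99 / 0.005 / 0.005.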

Training Procedure

Preprocessing

Group-aware train/val/test split to avoid leakage between splits. Sequence packing enabled. Data was processed locally under /content/ (Colab) before being copied to Google Drive.
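
Sequence packing can be pictured as concatenating tokenized documents and cutting the stream into fixed-length blocks. The sketch below is a generic illustration in plain Python, not the packing routine actually used by the training stack, which may handle document boundaries and attention masks differently. The function name and the `eos_id` separator are assumptions.

```python
def pack_sequences(tokenized_docs, max_len, eos_id):
    """Concatenate documents (EOS-separated) and cut into blocks of max_len tokens."""
    buffer, blocks = [], []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= max_len:
            blocks.append(buffer[:max_len])
            buffer = buffer[max_len:]
    return blocks, buffer  # buffer holds leftover tokens that did not fill a block


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks, leftover = pack_sequences(docs, max_len=4, eos_id=0)
print(blocks)    # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
print(leftover)  # []
```

Packing avoids spending compute on padding, which matters when most documents are much shorter than the 8192-token maximum sequence length.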

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Method: LoRA, RSLoRA enabled
  • LoRA rank / alpha: 128 / 256
  • Extra modules: embed_tokens, lm_head included
  • LoRA dropout: 0.0
  • Max seq length: 8192
  • Packing: True; 4-bit load: False
  • Epochs: 1
  • Per-device batch: 12; grad accumulation: 16; effective batch: 192
  • Learning rate: 1e-5; embedding LR: 5e-6
  • Scheduler: cosine; warmup ratio: 0.03
  • Optimizer: adamw_8bit; weight decay: 0.01
  • Seed: 3407; early stopping enabled
  • Save format: merged_16bit
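
A few derived quantities follow from the values listed above. The sketch below recomputes the effective batch size, estimates the number of optimizer steps in one epoch, and compares the standard LoRA scaling factor (alpha / r) with the rank-stabilized one (alpha / sqrt(r)). The single-GPU setting and the perfect-packing step estimate are assumptions, and the step count is only approximate.

```python
import math

# Values reported in the hyperparameter list.
per_device_batch = 12
grad_accum = 16
max_seq_len = 8192
lora_rank, lora_alpha = 128, 256
train_tokens = 198_000_000  # "~198M tokens" in the train split (approximate)
num_gpus = 1                # assumption: a single GPU

effective_batch = per_device_batch * grad_accum * num_gpus
tokens_per_step = effective_batch * max_seq_len
# Assumes every packed sequence is completely full, so this is a rough estimate.
approx_steps = math.ceil(train_tokens / tokens_per_step)

# LoRA scaling factor: alpha / r (standard) vs alpha / sqrt(r) (rank-stabilized).
standard_scale = lora_alpha / lora_rank
rslora_scale = lora_alpha / math.sqrt(lora_rank)

print(effective_batch, tokens_per_step, approx_steps)  # 192 1572864 126
print(standard_scale, round(rslora_scale, 2))          # 2.0 22.63
```

With rank 128, the rank-stabilized factor is roughly 11 times larger than the standard one, which is worth keeping in mind when reading the learning rate of 1e-5.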

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out test set: 1,655 documents (news 1,098 / socmed 191 / wiki 366).

Factors

Disaggregated by source domain: news, social media, Wikipedia.

Metrics

Perplexity (lower is better). Evaluated on ~1M tokens with a sliding window (max_length=4096, stride=1024), bf16 / 4-bit.
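
The `max_length` and `stride` settings suggest a strided sliding-window evaluation, in which each window re-reads earlier tokens as context but scores only the tokens not yet counted. The sketch below shows that bookkeeping in plain Python. The evaluation script itself is not part of the card, so the function names and the pluggable `window_nll` scorer are hypothetical.

```python
import math


def sliding_windows(n_tokens, max_length=4096, stride=1024):
    """Yield (begin, end, n_scored); only the last n_scored tokens of a window count."""
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_length, n_tokens)
        yield begin, end, end - prev_end  # tokens before prev_end are context only
        prev_end = end
        if end == n_tokens:
            break


def perplexity(token_ids, window_nll, max_length=4096, stride=1024):
    """window_nll(tokens, n_scored) -> summed NLL in nats over the last n_scored tokens."""
    total_nll, total_scored = 0.0, 0
    for begin, end, n_scored in sliding_windows(len(token_ids), max_length, stride):
        total_nll += window_nll(token_ids[begin:end], n_scored)
        total_scored += n_scored
    return math.exp(total_nll / total_scored)
```

Every token is scored exactly once, and all windows after the first give their scored tokens at least `max_length - stride` tokens of left context. A real scorer would also skip the very first token of the text, which has nothing to condition on.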

Results

| Model | Full | News | Socmed | Wiki |
| --- | --- | --- | --- | --- |
| Qwen3-4B-CPT-Base (this) | 4.561 | 4.108 | 4.418 | 6.492 |
| Qwen3-4B-Base (vanilla) | 5.930 | 5.389 | 6.438 | 7.757 |
| Improvement | ~23% | ~24% | ~31% | ~16% |

Summary

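CPT cuts perplexity by ~23% overall relative to vanilla Qwen3-4B-Base, with the largest gain on social media (~31%) and the smallest on Wikipedia (~16%). It is also reported to beat vanilla Qwen3-8B-Base on all four subsets, although the 8B scores are not listed in the table; this suggests that domain adaptation matters more than raw parameter count for this Indonesian domain.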
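
The improvement figures can be reproduced from the reported perplexities as the relative reduction (baseline - CPT) / baseline. A short check, with the numbers copied from the results table:

```python
# Perplexities copied from the results table.
cpt = {"Full": 4.561, "News": 4.108, "Socmed": 4.418, "Wiki": 6.492}
base = {"Full": 5.930, "News": 5.389, "Socmed": 6.438, "Wiki": 7.757}


def relative_reduction(before, after):
    """Fractional drop in perplexity relative to the baseline."""
    return (before - after) / before


improvement = {k: round(100 * relative_reduction(base[k], cpt[k])) for k in cpt}
print(improvement)  # {'Full': 23, 'News': 24, 'Socmed': 31, 'Wiki': 16}
```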

Technical Specifications

Model Architecture and Objective

Qwen3 causal decoder-only transformer. Objective: continued causal language-model pre-training (next-token prediction).

Compute Infrastructure

Hardware

NVIDIA A100 80GB (Google Colab Pro+).

Software

Unsloth, TRL, HuggingFace Transformers, PEFT, bitsandbytes. Monitoring: WandB.

Citation

BibTeX:

bibtex

@misc{qwen3_4b_cpt_base,
title = {Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training},
author = {AITF UGM 2026},
year = {2026},
note = {Model Narasi Isu pipeline}
}

APA:

AITF UGM 2026. (2026). Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training. Model Narasi Isu pipeline.

More Information

Model Narasi Isu: Indonesian public-issue monitoring and narrative analysis pipeline.

Model Card Authors

AITF UGM 2026

Model Card Contact

https://huggingface.co/aitf-ugm-2026

Model provider

aitf-kpm-ugm

