License: apache-2.0
Model Details
Model Description
Qwen3-4B-CPT-Base extends Qwen/Qwen3-4B-Base with continued pre-training on a ~200M-token Indonesian corpus (news, Wikipedia, social media). The goal is Indonesian-domain adaptation as the foundation for downstream SFT. It is a base model: it performs raw text completion and is not tuned for instruction-following or chat. Part of the Model Narasi Isu pipeline (CPT -> SFT -> Deployment) for Indonesian public-issue monitoring and narrative analysis.
- Developed by: AITF UGM 2026
- Model type: Causal decoder-only LLM (continued pre-training)
- Language(s) (NLP): Indonesian (Bahasa Indonesia); English technical terms preserved
- License: Qwen License
- Finetuned from model: Qwen/Qwen3-4B-Base
Model Sources
- Repository: https://huggingface.co/aitf-ugm-2026
Uses
Direct Use
Indonesian-domain text completion. Perplexity benchmarking against vanilla Qwen3 baselines.
Downstream Use
Foundation for supervised fine-tuning (SFT) on Indonesian tasks: summarization, issue narrative analysis (ABSA), dashboard previews, chatbot Q&A.
Out-of-Scope Use
Not for chat or instruction-following before SFT. Not for high-stakes decisions without human review. Not a safety-aligned assistant.
Bias, Risks, and Limitations
Not instruction-tuned: no reliable JSON, chat, or task behavior. Corpus is news-heavy (70%), so outputs may reflect media and social-media biases. Coverage skews to topics present in the corpus window.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Validate outputs; apply SFT before task deployment.
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aitf-ugm-2026/Qwen3-4B-CPT-Base"

# Load the tokenizer and the bf16 weights; device_map="auto" places them on the available device(s).
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Base model: pass a raw text prefix to complete, not a chat-formatted prompt.
prompt = "Ibu kota Indonesia adalah"
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
vLLM (use completions endpoint, not chat):
```bash
vllm serve aitf-ugm-2026/Qwen3-4B-CPT-Base \
  --gpu-memory-utilization 0.90 --max-model-len 8192
```
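Once the server is up, a client can call the completions route with a plain text prompt. The following is a minimal sketch using only the Python standard library. It assumes the server runs locally on vLLM's default port (8000) and exposes the OpenAI-compatible `/v1/completions` route; the helper names `build_completion_request` and `complete` and the `temperature` value are illustrative, not part of this model card.

```python
import json
import urllib.request

# Assumptions: vLLM's OpenAI-compatible server is running locally on its
# default port (8000), started with the `vllm serve` command above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "aitf-ugm-2026/Qwen3-4B-CPT-Base"


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the raw-text completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,        # plain text prefix, no chat template or messages list
        "max_tokens": max_tokens,
        "temperature": 0.7,      # illustrative value, not taken from the model card
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the generated continuation."""
    req = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `complete("Ibu kota Indonesia adalah")` returns the model's continuation of the prefix. The request carries a `prompt` string rather than a `messages` list because the model has no chat tuning.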
Training Details
Training Data
~200M tokens, group-aware split (train/val/test = 0.99 / 0.005 / 0.005).
| Source | Share | Tokens |
|---|---|---|
| Berita (news) | 70% | ~140M |
| Wikipedia (id) | 20% | ~40M |
| Sosial media | 10% | ~20M |
| Total | 100% | ~200M |
Train split: 325,860 records / ~198M tokens. Test: 1,655 records (news 1,098 / socmed 191 / wiki 366).
Training Procedure
Preprocessing
Group-aware train/val/test split to avoid leakage between splits. Sequence packing enabled. Data is processed locally under /content/ (Colab) before being copied to Google Drive.
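One way to realize a group-aware split is to assign whole groups, rather than individual records, to a split. The sketch below does this by hashing a group key. It is an illustration under assumptions, not the authors' preprocessing code: the `group_id` field is hypothetical (the card does not say what defines a group), and hash-based assignment is just one possible technique.

```python
import hashlib
from collections import defaultdict


def assign_split(group_key: str, val_frac: float = 0.005, test_frac: float = 0.005) -> str:
    """Map a group key deterministically to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(group_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def group_aware_split(records, group_field="group_id"):
    """Split records so that all records sharing a group land in the same split.

    `group_field` is a hypothetical field name used for illustration.
    """
    splits = defaultdict(list)
    for rec in records:
        splits[assign_split(str(rec[group_field]))].append(rec)
    return splits
```

Because the assignment depends only on the group key, related records (for example, several chunks of one article) cannot end up on both sides of the train/test boundary. The realized split ratios only approximate 0.99 / 0.005 / 0.005 when groups differ in size.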
Training Hyperparameters
- Training regime: bf16 mixed precision
- Method: LoRA, RSLoRA enabled
- LoRA rank / alpha: 128 / 256
- Extra modules: embed_tokens, lm_head included
- LoRA dropout: 0.0
- Max seq length: 8192
- Packing: True; 4-bit load: False
- Epochs: 1
- Per-device batch: 12; grad accumulation: 16; effective batch: 192
- Learning rate: 1e-5; embedding LR: 5e-6
- Scheduler: cosine; warmup ratio: 0.03
- Optimizer: adamw_8bit; weight decay: 0.01
- Seed: 3407; early stopping enabled
- Save format: merged_16bit
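The batch and LoRA settings above imply a few derived quantities that are easy to check by hand. The sketch below works them out. It assumes a single GPU (consistent with 12 × 16 = 192) and that every packed sequence is completely full, so the step count is a rough lower bound rather than a figure reported by the authors.

```python
import math

# Values copied from the hyperparameter list above.
per_device_batch = 12
grad_accum = 16
num_devices = 1              # assumption: one GPU, which matches 12 * 16 = 192
max_seq_len = 8192
lora_rank, lora_alpha = 128, 256
train_tokens = 198_000_000   # "~198M tokens" in the train split

effective_batch = per_device_batch * grad_accum * num_devices
tokens_per_step = effective_batch * max_seq_len           # upper bound with full packing
approx_steps = math.ceil(train_tokens / tokens_per_step)  # optimizer steps for one epoch

# Scaling factor applied to the low-rank update.
standard_scale = lora_alpha / lora_rank            # classic LoRA: alpha / r
rslora_scale = lora_alpha / math.sqrt(lora_rank)   # rsLoRA: alpha / sqrt(r)

print(effective_batch, tokens_per_step, approx_steps)  # 192 1572864 126
print(standard_scale, round(rslora_scale, 2))          # 2.0 22.63
```

With only on the order of a hundred optimizer steps in the single epoch, the 0.03 warmup ratio covers just a handful of steps. The rsLoRA scale is also roughly 11 times larger than the classic LoRA scale at rank 128, which is worth keeping in mind when comparing the 1e-5 learning rate with non-rsLoRA setups.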
Evaluation
Testing Data, Factors & Metrics
Testing Data
Held-out test set: 1,655 documents (news 1,098 / socmed 191 / wiki 366).
Factors
Disaggregated by source domain: news, social media, Wikipedia.
Metrics
Perplexity (lower is better). Eval: ~1M tokens, max_length=4096, stride=1024, bf16 / 4-bit.
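The `max_length` and `stride` settings suggest a strided sliding-window evaluation. The sketch below shows how such windows are commonly laid out and how perplexity follows from the summed negative log-likelihood. It is an assumption that the authors' evaluation script follows this pattern exactly, and the function names are illustrative.

```python
import math


def sliding_windows(seq_len: int, max_length: int = 4096, stride: int = 1024):
    """Yield (begin, end, n_scored) spans for strided perplexity evaluation.

    Each window covers tokens [begin, end). Only the last n_scored tokens
    contribute to the loss; the earlier tokens serve as context only.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        n_scored = end - prev_end  # tokens not yet scored by an earlier window
        yield begin, end, n_scored
        prev_end = end
        if end == seq_len:
            break


def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(total_nll / n_tokens)
```

For a 6,000-token document, `list(sliding_windows(6000))` gives `[(0, 4096, 4096), (1024, 5120, 1024), (2048, 6000, 880)]`: every token is scored exactly once, and all tokens after the first window are scored with at least 3,072 tokens of preceding context.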
Results
| Model | Full | News | Socmed | Wiki |
|---|---|---|---|---|
| Qwen3-4B-CPT-Base (this) | 4.561 | 4.108 | 4.418 | 6.492 |
| Qwen3-4B-Base (vanilla) | 5.930 | 5.389 | 6.438 | 7.757 |
| Improvement | ~23% | ~24% | ~31% | ~16% |
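The Improvement row is the relative perplexity reduction, (vanilla − CPT) / vanilla. A short check that recomputes it from the two rows above (the variable and function names are illustrative):

```python
# Perplexities copied from the results table.
cpt = {"Full": 4.561, "News": 4.108, "Socmed": 4.418, "Wiki": 6.492}
base = {"Full": 5.930, "News": 5.389, "Socmed": 6.438, "Wiki": 7.757}


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


improvement = {k: relative_reduction(base[k], cpt[k]) for k in cpt}
for subset, pct in improvement.items():
    print(f"{subset}: {pct:.1f}%")
```

Rounded to whole percentages, the results match the table: 23, 24, 31, and 16.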
Summary
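CPT cuts perplexity by ~23% overall relative to vanilla Qwen3-4B-Base, with the largest gain on social media (~31%) and the smallest on Wikipedia (~16%). It is also reported to beat vanilla Qwen3-8B-Base on all four subsets (the 8B figures are not listed in the table above), which suggests that domain adaptation outweighs raw parameter count for this Indonesian domain.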
Technical Specifications
Model Architecture and Objective
Qwen3 causal decoder-only transformer. Objective: continued causal language-model pre-training (next-token prediction).
Compute Infrastructure
Hardware
NVIDIA A100 80GB (Google Colab Pro+).
Software
Unsloth, TRL, HuggingFace Transformers, PEFT, bitsandbytes. Monitoring: WandB.
Citation
BibTeX:
```bibtex
@misc{qwen3_4b_cpt_base,
  title  = {Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training},
  author = {AITF UGM 2026},
  year   = {2026},
  note   = {Model Narasi Isu pipeline}
}
```
APA:
AITF UGM 2026. (2026). Qwen3-4B-CPT-Base: Indonesian Continued Pre-Training. Model Narasi Isu pipeline.
More Information
Model Narasi Isu: Indonesian public-issue monitoring and narrative analysis pipeline.
Model Card Authors
AITF UGM 2026
Model Card Contact
Model provider
aitf-kpm-ugm