Thorstin/gpt2-dutch-instruct API & Inference Endpoint

Model details

Property	Value
Architecture	GPT-2 small
Parameters	123.8M
Layers	12
Attention heads	12
Hidden dimension	768
Context length	512 tokens
Vocabulary size	50,000 (Dutch BPE)
Weights	fp16 / safetensors (473 MB)
Inference speed (CPU)	0.9 tok/s
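
The parameter count in the table can be reproduced from the other rows. The sketch below assumes the standard GPT-2 layout (learned position embeddings, biases on all linear layers, a 4x MLP expansion, and an output head tied to the token embeddings). None of these details are stated on this card, so treat the function as an illustration rather than the author's own accounting.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, d_model: int, n_layers: int) -> int:
    """Count parameters of a GPT-2 style decoder with a tied output head (assumed)."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Each LayerNorm has a scale and a bias vector.
    layer_norm = 2 * d_model
    # Fused QKV projection plus output projection, both with biases.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer MLP with a 4x hidden expansion, both layers with biases.
    d_ff = 4 * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = 2 * layer_norm + attention + mlp
    # The trailing layer_norm term is the final LayerNorm after the last block.
    return embeddings + n_layers * per_layer + layer_norm


total = gpt2_param_count(vocab_size=50_000, n_positions=512, d_model=768, n_layers=12)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

With the values from the table this gives 123,849,216 parameters, which rounds to the 123.8M listed above.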

Files

File	Format	Size
`model.safetensors`	fp16	473 MB
`dutch-gpt2-f16.gguf`	GGUF F16	249 MB
`dutch-gpt2-q8_0.gguf`	GGUF Q8_0	132 MB

Use with llama.cpp

bash
# Download
wget https://huggingface.co/Thorstin/gpt2-dutch-instruct/resolve/main/dutch-gpt2-q8_0.gguf

# Run (-e makes llama-cli interpret the \n escapes in the prompt)
llama-cli -m dutch-gpt2-q8_0.gguf \
  -e -p "### Instructie:\nWat is de hoofdstad van Nederland?\n### Antwoord:\n" \
  -n 200

Use with Ollama

bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./dutch-gpt2-q8_0.gguf
TEMPLATE """### Instructie:
{{ .Prompt }}
### Antwoord:
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.3
PARAMETER num_ctx 512
EOF

ollama create dutch-gpt2 -f Modelfile
ollama run dutch-gpt2
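
Once the model has been created, it can also be queried from code. The following is a minimal sketch using only the Python standard library. It assumes an Ollama server running on its default local address (`http://localhost:11434`) and the model name `dutch-gpt2` from the command above. The helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address


def build_request(instruction: str, model: str = "dutch-gpt2") -> urllib.request.Request:
    """Build a non-streaming generate request.

    Only the raw instruction is sent; the TEMPLATE in the Modelfile adds
    the "### Instructie:" / "### Antwoord:" wrapper on the server side.
    """
    payload = {"model": model, "prompt": instruction, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str) -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(instruction)) as reply:
        return json.loads(reply.read())["response"].strip()


# Example (requires a running Ollama server with the model created):
# print(generate("Wat is de hoofdstad van Nederland?"))
```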

Training

Phase 1 — Pretraining from scratch

Dataset: CC-100 Dutch (~37 GB raw, ~6.6B tokens), streamed
Tokenizer: ByteLevel BPE trained on first 500K CC-100 Dutch documents
Hardware: NVIDIA Tesla T4 (16 GB VRAM)
Tokens trained: ~5B
Steps: 154,000
Final loss: 3.54
Duration: ~70 GPU hours
Key settings: fp16=True, gradient_checkpointing=True, batch_size=32, lr=5e-4, cosine scheduler
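
To give a sense of what the cosine scheduler does over the run, here is a small sketch of the learning rate as a function of the training step. It assumes no warmup and a decay to zero at step 154,000. The card states neither, so the exact values are illustrative only.

```python
import math

PEAK_LR = 5e-4         # lr from the key settings above
TOTAL_STEPS = 154_000  # step count from the list above


def cosine_lr(step: int, peak_lr: float = PEAK_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Cosine decay from peak_lr to 0 (assumes no warmup; the card does not specify one)."""
    # Clamp progress to [0, 1] so out-of-range steps do not wrap around.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, 38_500, 77_000, 115_500, 154_000):
    print(f"step {step:>7,}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the rate starts at 5e-4, passes through half that value at the midpoint, and reaches zero at the final step.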

Phase 2 — Instruction fine-tuning (SFT)

Dataset: BramVanroy/alpaca-cleaned-dutch — 46,163 Dutch instruction/response pairs
Framework: TRL 1.6.0 SFTTrainer
Epochs: 3
Steps: 4,329
Loss: 3.31 → 1.14
Duration: ~1.25 hours
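
The step count is consistent with an effective batch size of 32, which the card does not state explicitly. The check below is inferred arithmetic, assuming one example per dataset row and that the last partial batch of each epoch still counts as a step.

```python
import math

N_EXAMPLES = 46_163   # instruction/response pairs in the dataset
EPOCHS = 3
EFFECTIVE_BATCH = 32  # assumption: inferred from the step count, not stated on the card

# The final, smaller batch of each epoch still produces an optimizer step.
steps_per_epoch = math.ceil(N_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)
```

This yields 1,443 steps per epoch and 4,329 steps in total, matching the figure above.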

Instruction format

markdown
### Instructie:
<vraag of instructie>
### Antwoord:
<antwoord>

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Thorstin/gpt2-dutch-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

def chat(instruction: str, max_new_tokens: int = 200) -> str:
    # Wrap the instruction in the template the model was fine-tuned on.
    prompt = f"### Instructie:\n{instruction}\n### Antwoord:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(chat("Wat is de hoofdstad van Nederland?"))

Benchmark results (lm-evaluation-harness, limit=200)

Task	Accuracy	Accuracy (norm)
hellaswag_nl	24.50%	28.50%
arc_nl	19.00%	29.00%
blimp_nl	80.67%	79.51%

Random baseline: 50% for BLiMP-NL (binary), 25% for HellaSwag/ARC (4-way).
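
Because each task was evaluated on a limited sample, the gap to the random baseline should be read against sampling noise. The sketch below assumes that `limit=200` means 200 scored examples per task and treats each example as an independent trial. It computes how many binomial standard errors each multiple-choice score lies from the 25% chance level.

```python
import math

N = 200  # assumption: limit=200 gives 200 scored examples per task


def z_from_chance(accuracy: float, chance: float, n: int = N) -> float:
    """Distance from the chance level in units of the binomial standard error."""
    std_err = math.sqrt(chance * (1.0 - chance) / n)
    return (accuracy - chance) / std_err


scores = {
    "hellaswag_nl acc": 0.2450,
    "hellaswag_nl acc_norm": 0.2850,
    "arc_nl acc": 0.1900,
    "arc_nl acc_norm": 0.2900,
}
for name, acc in scores.items():
    print(f"{name}: z = {z_from_chance(acc, chance=0.25):+.2f}")
```

Under these assumptions the standard error at 25% is about 3.1 percentage points, so the HellaSwag and ARC scores all lie within roughly two standard errors of chance, while the BLiMP-NL score is far above its 50% baseline.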

Sample outputs

Prompt	Response
Wat is de hoofdstad van Nederland?	De hoofdstad van Nederland is Amsterdam....
Leg uit wat fotosynthese is.	Fotosynthese is het proces waarbij planten lichtenergie van de zon omzetten in chemische energie die ze gebruiken om koo...
Schrijf een kort gedicht over de zee.	De golven slaan tegen het raam, Een kalmerende bries draagt de geur van zout en vers gezette koffie. Het geluid van gebr...

Limitations

With only 124M parameters, model capacity is a hard ceiling: expect occasional repetition, factual errors, and shorter coherent responses than larger models produce
Context window is limited to 512 tokens
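
Because the prompt and the generated tokens share the 512-token window, it can help to cap `max_new_tokens` before calling `generate`. The helper below is a hypothetical sketch (the name `clamp_new_tokens` is not part of any library). It takes the prompt length in tokens, for example `inputs["input_ids"].shape[1]` in the Usage code.

```python
CONTEXT_LENGTH = 512  # from the model details above


def clamp_new_tokens(prompt_tokens: int, requested: int, context: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can be generated without exceeding the context window."""
    if prompt_tokens >= context:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens; the window is {context}. Shorten the prompt."
        )
    # Whatever the prompt leaves free is the upper bound for generation.
    return min(requested, context - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=40, requested=200))   # room to spare
print(clamp_new_tokens(prompt_tokens=400, requested=200))  # capped by the window
```

The first call returns the requested 200 tokens; the second is capped at 112 because only that much of the window remains.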

Framework versions

Package	Version
TRL	1.6.0
Transformers	4.48
PyTorch	2.9.1+cu128
Datasets	2.16
Tokenizers	0.21
