License: apache-2.0
Model details
| Property | Value |
|---|---|
| Architecture | GPT-2 small |
| Parameters | 123.8M |
| Layers | 12 |
| Attention heads | 12 |
| Hidden dimension | 768 |
| Context length | 512 tokens |
| Vocabulary size | 50,000 (Dutch BPE) |
| Weights | fp16 / safetensors (473 MB) |
| Inference speed (CPU) | 0.9 tok/s |
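
The parameter count in the table can be reproduced from the other rows. The sketch below assumes the standard GPT-2 layout (learned position embeddings, biases on every linear layer, a 4x MLP expansion, and an output head tied to the token embedding); the card does not confirm these details, and the function name is illustrative.

```python
def gpt2_param_count(vocab: int, ctx: int, d: int, layers: int) -> int:
    """Count parameters of a GPT-2 style decoder with tied embeddings (assumed layout)."""
    embeddings = vocab * d + ctx * d                 # token + learned position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # 4x expansion, then projection back
    layer_norms = 2 * (2 * d)                        # two LayerNorms per block (scale + bias)
    block = attention + mlp + layer_norms
    final_norm = 2 * d                               # LayerNorm after the last block
    return embeddings + layers * block + final_norm


total = gpt2_param_count(vocab=50_000, ctx=512, d=768, layers=12)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 123,849,216, which matches the 123.8M listed above.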
Files
| File | Format | Size |
|---|---|---|
| `model.safetensors` | fp16 | 473 MB |
| `dutch-gpt2-f16.gguf` | GGUF F16 | 249 MB |
| `dutch-gpt2-q8_0.gguf` | GGUF Q8_0 | 132 MB |
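
The GGUF sizes follow from the number of bytes stored per weight. This is a rough estimate, assuming 123.8M weights, 2 bytes per weight for F16, and the Q8_0 block layout of 34 bytes per 32 weights (32 int8 values plus one fp16 scale). Real files also contain metadata and tokenizer data and may keep some tensors at higher precision, so the figures are approximate.

```python
PARAMS = 123.8e6  # from the model details table

BYTES_PER_WEIGHT = {
    "F32": 4.0,
    "F16": 2.0,
    "Q8_0": 34 / 32,  # 32 int8 weights + one fp16 scale per block
}

for fmt, bytes_per_weight in BYTES_PER_WEIGHT.items():
    size_mb = PARAMS * bytes_per_weight / 1e6
    print(f"{fmt:>5}: ~{size_mb:.0f} MB")
```

The estimate gives roughly 248 MB for F16 and 132 MB for Q8_0, in line with the two GGUF rows. The 473 MB listed for `model.safetensors` is closer to the F32 estimate (about 495 MB, or 472 MiB) than to the F16 one, so the stored precision of that file may be worth checking.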
Use with llama.cpp
```bash
# Download
wget https://huggingface.co/Thorstin/gpt2-dutch-instruct/resolve/main/dutch-gpt2-q8_0.gguf

# Run
llama-cli -m dutch-gpt2-q8_0.gguf \
  -p "### Instructie:\nWat is de hoofdstad van Nederland?\n### Antwoord:\n" \
  -n 200
```
Use with Ollama
```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./dutch-gpt2-q8_0.gguf
TEMPLATE """### Instructie:
{{ .Prompt }}
### Antwoord:
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.3
PARAMETER num_ctx 512
EOF

ollama create dutch-gpt2 -f Modelfile
ollama run dutch-gpt2
```
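
Once the model has been created in Ollama, it can also be queried from Python through Ollama's local REST API. This is a sketch, assuming a default Ollama server at `http://localhost:11434` and its `/api/generate` endpoint; the helper names are illustrative and only the standard library is used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address


def build_payload(prompt: str, model: str = "dutch-gpt2") -> dict:
    """Build a non-streaming generate request; the Modelfile template wraps the prompt."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Inspect the request body without contacting the server.
print(json.dumps(build_payload("Wat is de hoofdstad van Nederland?"), ensure_ascii=False))
```

With the server running, `generate("Wat is de hoofdstad van Nederland?")` returns the model's answer as a string. Only the bare instruction is sent, because the `TEMPLATE` in the Modelfile adds the `### Instructie:` and `### Antwoord:` markers.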
Training
Phase 1 — Pretraining from scratch
- Dataset: CC-100 Dutch (~37 GB raw, ~6.6B tokens), streamed
- Tokenizer: ByteLevel BPE trained on first 500K CC-100 Dutch documents
- Hardware: NVIDIA Tesla T4 (16 GB VRAM)
- Tokens trained: ~5B
- Steps: 154,000
- Final loss: 3.54
- Duration: ~70 GPU hours
- Key settings: `fp16=True`, `gradient_checkpointing=True`, `batch_size=32`, `lr=5e-4`, cosine scheduler
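
The token count can be related to the step count with a back-of-the-envelope check. The sketch assumes every step processes `batch_size` full 512-token sequences; gradient accumulation is an assumption here, since the card does not mention it.

```python
steps = 154_000
batch_size = 32
seq_len = 512            # context length from the model details table
reported_tokens = 5e9    # "~5B" tokens trained, as reported above

tokens_per_step = batch_size * seq_len
tokens_seen = steps * tokens_per_step
implied_sequences = reported_tokens / steps / seq_len  # sequences per optimizer step

print(f"tokens per step: {tokens_per_step:,}")
print(f"tokens over {steps:,} steps: {tokens_seen / 1e9:.2f}B")
print(f"sequences per step implied by ~5B tokens: {implied_sequences:.0f}")
```

With the listed settings this comes to about 2.5B tokens, so the ~5B figure would imply an effective batch of roughly 64 sequences, for example `batch_size=32` with two gradient-accumulation steps. That reading is a guess, not something the card states.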
Phase 2 — Instruction fine-tuning (SFT)
- Dataset: `BramVanroy/alpaca-cleaned-dutch` (46,163 Dutch instruction/response pairs)
- Framework: TRL 1.6.0 SFTTrainer
- Epochs: 3
- Steps: 4,329
- Loss: 3.31 → 1.14
- Duration: ~1.25 hours
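
The step count is consistent with an effective batch size of 32, which the card does not state for this phase. The sketch below treats that value as an assumption and checks the arithmetic.

```python
import math

num_examples = 46_163
epochs = 3
effective_batch = 32  # assumption: not stated for the SFT phase

steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

This yields 1,443 steps per epoch and 4,329 steps over three epochs, matching the reported total.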
Instruction format
```markdown
### Instructie:
<vraag of instructie>
### Antwoord:
<antwoord>
```
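
To build prompts or training text in this format programmatically, a small helper is enough. This is a sketch: the function names are hypothetical, and the newline placement follows the prompt string used in the Usage section.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction as an inference prompt, ending where the answer begins."""
    return f"### Instructie:\n{instruction.strip()}\n### Antwoord:\n"


def build_training_text(instruction: str, answer: str) -> str:
    """Format a full instruction/answer pair, e.g. for fine-tuning data."""
    return build_prompt(instruction) + answer.strip()


print(build_training_text("Wat is de hoofdstad van Nederland?", "Amsterdam."))
```

Keeping one function for both inference and training text reduces the risk that the two drift apart, which matters because a model this small is sensitive to the exact markers it was tuned on.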
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Thorstin/gpt2-dutch-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()


def chat(instruction: str, max_new_tokens: int = 200) -> str:
    # Wrap the instruction in the prompt format used during fine-tuning.
    prompt = f"### Instructie:\n{instruction}\n### Antwoord:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text generated after the answer marker.
    return response.split("### Antwoord:")[-1].strip()


print(chat("Wat is de hoofdstad van Nederland?"))
```
Benchmark results (lm-evaluation-harness, limit=200)
| Task | Accuracy | Accuracy (norm) |
|---|---|---|
| hellaswag_nl | 24.50% | 28.50% |
| arc_nl | 19.00% | 29.00% |
| blimp_nl | 80.67% | 79.51% |
Random baseline: 50% for BLiMP-NL (binary), 25% for HellaSwag/ARC (4-way).
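
With only 200 examples per task, the scores carry noticeable sampling noise. The sketch below uses the binomial standard error, assuming 200 independent examples for the two 4-way tasks. BLiMP-NL is left out because its score does not correspond to a whole number of examples out of 200 and so appears to be averaged over subtasks.

```python
import math

N = 200        # limit=200 examples per task
CHANCE = 0.25  # 4-way multiple choice

scores = {
    "hellaswag_nl acc": 0.2450,
    "hellaswag_nl acc_norm": 0.2850,
    "arc_nl acc": 0.1900,
    "arc_nl acc_norm": 0.2900,
}


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy p measured on n examples."""
    return math.sqrt(p * (1 - p) / n)


for name, p in scores.items():
    se = standard_error(p, N)
    z = (p - CHANCE) / se  # distance from chance, in standard errors
    print(f"{name:<22} {p:.1%} +/- {se:.1%}  ({z:+.1f} SE from chance)")
```

Each standard error is about 3 percentage points, so none of these four scores is far from the 25% chance level. A larger `limit` would be needed to separate them from chance with confidence.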
Sample outputs
| Prompt | Response |
|---|---|
| Wat is de hoofdstad van Nederland? | De hoofdstad van Nederland is Amsterdam.... |
| Leg uit wat fotosynthese is. | Fotosynthese is het proces waarbij planten lichtenergie van de zon omzetten in chemische energie die ze gebruiken om koo... |
| Schrijf een kort gedicht over de zee. | De golven slaan tegen het raam, Een kalmerende bries draagt de geur van zout en vers gezette koffie. Het geluid van gebr... |
Limitations
- At 124M parameters the model is small: expect occasional repetition, factual errors, and shorter coherent responses than larger models produce
- Context window is limited to 512 tokens
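
Because prompt and generated tokens share the 512-token window, it helps to cap `max_new_tokens` by what is left after the prompt. A minimal sketch: the helper name is hypothetical, and `prompt_tokens` would come from the length of the tokenized prompt (for example `inputs["input_ids"].shape[1]` in the Usage code).

```python
CONTEXT_LENGTH = 512  # from the model details table


def clamp_new_tokens(prompt_tokens: int, requested: int = 200) -> int:
    """Return how many tokens can be generated without exceeding the context window."""
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt fills the context window; shorten or truncate it")
    return min(requested, CONTEXT_LENGTH - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=40))   # short prompt: the full request fits
print(clamp_new_tokens(prompt_tokens=400))  # long prompt: only the remaining window
```

Raising an error for an over-long prompt is one design choice; truncating the instruction from the left or right is another, depending on which part of the input matters more.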
Framework versions
| Package | Version |
|---|---|
| TRL | 1.6.0 |
| Transformers | 4.48 |
| PyTorch | 2.9.1+cu128 |
| Datasets | 2.16 |
| Tokenizers | 0.21 |
Model provider
Thorstin
Modalities
- Input: Text
- Output: Text