Model details
- Base model: Qwen/Qwen3-8B-Base
- Construction: training-free Layer Swap, in which layers L13–L22 of Qwen3-8B-EN are transplanted into Qwen3-8B-DE
- Language: German (CoT and answer)
- Context length: 32,768 tokens
- Dataset (underlying specialists): lightonai/Dolci-Think-SFT-32B-Multilingual
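The layer swap can be pictured as a merge of two state dicts that share one architecture. The sketch below uses a hypothetical helper, `swap_layers`, which is not the authors' code. It assumes that "L13–L22" means 0-indexed decoder layers 13 through 22 inclusive, and that parameter names follow the Hugging Face `model.layers.{i}.` convention. Plain strings stand in for tensors so that the example runs without any model weights.

```python
import re
from typing import Any, Dict

# Decoder-layer parameter names, e.g. "model.layers.13.self_attn.q_proj.weight"
# (Hugging Face Qwen3 naming; assumed here).
LAYER_KEY = re.compile(r"model\.layers\.(\d+)\.")


def swap_layers(target: Dict[str, Any], donor: Dict[str, Any],
                start: int = 13, end: int = 22) -> Dict[str, Any]:
    """Copy of `target` with decoder layers start..end (inclusive, 0-indexed)
    taken from `donor`; every other parameter is kept from `target`."""
    merged = dict(target)  # shallow copy: the target dict is not mutated
    for name, tensor in donor.items():
        match = LAYER_KEY.match(name)
        if match and start <= int(match.group(1)) <= end:
            if name not in target:
                raise KeyError(f"donor parameter {name!r} has no counterpart in target")
            merged[name] = tensor
    return merged


# Toy "state dicts" with string placeholders instead of tensors.
de = {f"model.layers.{i}.mlp.up_proj.weight": f"DE{i}" for i in range(36)}
en = {f"model.layers.{i}.mlp.up_proj.weight": f"EN{i}" for i in range(36)}
de["model.embed_tokens.weight"], en["model.embed_tokens.weight"] = "DE-emb", "EN-emb"

swapped = swap_layers(de, en)
print(swapped["model.layers.13.mlp.up_proj.weight"])  # EN13
print(swapped["model.layers.23.mlp.up_proj.weight"])  # DE23
print(swapped["model.embed_tokens.weight"])           # DE-emb
```

With real checkpoints, the same helper could be applied to `de_model.state_dict()` and `en_model.state_dict()`, and the result could then be loaded back with `de_model.load_state_dict(...)`. This is only possible because both specialists are fine-tuned from the same base model and therefore have identical parameter names and shapes. The embeddings, the final norm, and the LM head all stay German, which matches the description above.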
> [!NOTE]
> The underlying specialist models were trained on data derived from allenai/Dolci-Think-SFT-32B, released under the ODC-BY-1.0 license.
This model is part of a German specialist trio designed to study the native reasoning gap:
Evaluation
All scores are mean accuracy (%) on the German version of each benchmark, with the sample standard deviation across runs. AIME 24/25 scores are averaged over 30 runs and all other benchmarks over 10 runs, in every case using the recommended generation parameters.
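The per-benchmark aggregation described above reduces the accuracies from repeated sampling runs to a mean and a sample standard deviation. The sample standard deviation uses Bessel's correction, with an n − 1 denominator. The minimal sketch below uses Python's `statistics` module. The helper name `aggregate_runs` and the per-run numbers are hypothetical and are not taken from the evaluation.

```python
from statistics import mean, stdev
from typing import List, Tuple


def aggregate_runs(run_accuracies: List[float]) -> Tuple[float, float]:
    """Mean accuracy and sample standard deviation (n - 1 denominator)
    over independent sampling runs of one benchmark."""
    if len(run_accuracies) < 2:
        raise ValueError("sample standard deviation needs at least two runs")
    return mean(run_accuracies), stdev(run_accuracies)


# Hypothetical per-run accuracies (%), not results reported in this card.
runs = [56.0, 60.0, 58.0, 62.0]
m, s = aggregate_runs(runs)
print(f"{m:.2f} ± {s:.2f}")  # 59.00 ± 2.58
```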
| Model | MGSM-Rev2 | Global-MMLU-Lite | GPQA-Diamond | AIME 24/25 | HumanEvalPlus | Average |
|---|---|---|---|---|---|---|
| Qwen3-8B-DE | 93.12 | 75.15 | 55.20 | 54.56 | 84.94 | 72.59 |
| Qwen3-8B-DE-Swap | 96.96 | 77.35 | 56.16 | 58.28 | 87.00 | 75.15 |
| Qwen3-8B-DE-Pivot-EN | 93.76 | 78.05 | 57.68 | 62.06 | 86.81 | 75.67 |
| Qwen3-8B-EN | 95.88 | 75.80 | 55.45 | 57.94 | 82.56 | 73.53 |
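Recomputing the Average column from the five benchmark scores shows that it matches the unweighted arithmetic mean of the per-benchmark accuracies. The card does not state this formula explicitly, so treat it as an inference. The sketch below reproduces the column from the table values and computes the gain of the swapped model over the German specialist.

```python
# Scores (%) copied from the table, in column order:
# MGSM-Rev2, Global-MMLU-Lite, GPQA-Diamond, AIME 24/25, HumanEvalPlus.
scores = {
    "Qwen3-8B-DE": [93.12, 75.15, 55.20, 54.56, 84.94],
    "Qwen3-8B-DE-Swap": [96.96, 77.35, 56.16, 58.28, 87.00],
    "Qwen3-8B-DE-Pivot-EN": [93.76, 78.05, 57.68, 62.06, 86.81],
    "Qwen3-8B-EN": [95.88, 75.80, 55.45, 57.94, 82.56],
}

# Unweighted mean over the five benchmarks, rounded like the table.
averages = {name: round(sum(v) / len(v), 2) for name, v in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")  # 72.59, 75.15, 75.67, 73.53

# Gain of the swapped model over the German specialist, in points.
gain = averages["Qwen3-8B-DE-Swap"] - averages["Qwen3-8B-DE"]
print(f"Swap vs DE: {gain:+.2f}")  # +2.56
```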
Benchmarks used:
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lightonai/Qwen3-8B-DE-Swap"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# German prompt ("Solve: 24 × 17 = ?"); the model reasons and answers in German.
messages = [{"role": "user", "content": "Löse: 24 × 17 = ?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

# do_sample=True ensures the recommended sampling parameters are actually applied.
outputs = model.generate(inputs, max_new_tokens=32768, do_sample=True,
                         temperature=1.0, top_p=0.95, top_k=20, min_p=0.0)
# Decode only the newly generated tokens (skip the prompt).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
Recommended sampling: temperature=1.0, top_p=0.95, top_k=20, min_p=0.
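To make the recommended values concrete, the sketch below applies temperature, top-k, top-p, and min-p filtering to a toy next-token distribution in plain Python. It is an illustration under stated assumptions, not the `transformers` implementation, whose edge-case handling may differ. The assumptions are that the filters run in the order temperature → top-k → top-p → min-p, and that the distribution is renormalised after each step. The function name `filter_distribution` and the logits are hypothetical. With temperature=1.0 and min_p=0, only top_k and top_p actually prune the distribution.

```python
import math
from typing import Dict, List, Tuple


def _normalise(items: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    mass = sum(p for _, p in items)
    return [(tok, p / mass) for tok, p in items]


def filter_distribution(logits: Dict[str, float], temperature: float = 1.0,
                        top_k: int = 20, top_p: float = 0.95,
                        min_p: float = 0.0) -> Dict[str, float]:
    """Toy temperature -> top-k -> top-p -> min-p filtering; returns the
    renormalised distribution that a sampler would draw the next token from."""
    # Temperature-scaled softmax (max subtracted for numerical stability).
    peak = max(logits.values())
    weights = [(tok, math.exp((l - peak) / temperature)) for tok, l in logits.items()]
    probs = sorted(_normalise(weights), key=lambda kv: kv[1], reverse=True)

    # Top-k: keep only the k most probable tokens.
    probs = _normalise(probs[:top_k])

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    probs = _normalise(kept)

    # Min-p: drop tokens below min_p times the top token's probability (0 disables it).
    threshold = min_p * probs[0][1]
    return dict(_normalise([(tok, p) for tok, p in probs if p >= threshold]))


# Toy logits for candidate answers to "24 × 17 = ?" (made-up numbers).
toy_logits = {"408": 5.0, "418": 2.0, "398": 1.0, "4": -1.0}
print(filter_distribution(toy_logits))  # only "408" and "418" survive top_p=0.95
```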
Citation
If you find our work helpful, please consider citing it:
```bibtex
@misc{lasbordes2026rethinking,
  title         = {Rethinking the Multilingual Reasoning Gap with Layer Swap},
  author        = {Lasbordes, Maxence and Chatelain, Amélie and Seddah, Djamé},
  year          = {2026},
  eprint        = {2605.26735},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```