kaetram-qwen3.5-2b-opd-r3 API & Inference Endpoint

Method

On-policy distillation with a reverse-KL advantage against a scaffolded 4B teacher, trained with PPO-clipped importance sampling (LoRA r=64, α=64, no rsLoRA, bf16, 1 epoch, advantage clamp ±3, early-turn step-weight 1.5). Round 3 fits a fresh LoRA on the merged r2 checkpoint and adds counterfactual-canonicalized grading of malformed emissions. Full construction: patnir41/kaetram-opd-2b.

Chain: base Qwen3.5-2B → r1 → (merge) → r2 → (merge) → r3.

Files

root: merged bf16 weights (Qwen3_5ForConditionalGeneration) — load directly.
adapter/: the LoRA adapter alone (applies on top of the merged r2 checkpoint).

Text-only fine-tune of a multimodal-capable base; chat_template.jinja preserves <think> on every assistant turn.

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer
m = AutoModelForCausalLM.from_pretrained("patnir41/kaetram-qwen3.5-2b-opd-r3", torch_dtype="bfloat16", device_map="auto")
t = AutoTokenizer.from_pretrained("patnir41/kaetram-qwen3.5-2b-opd-r3")

Limitations

Single-task agent for the Kaetram Core-3 benchmark, not a general assistant. Residual malformed tool-call syntax (recovered at the harness level); "Rick's Roll" unsolved. Tokenizer-locked to the Qwen3.5-2B vocabulary.

License & credits

Apache-2.0, inheriting Qwen3.5-2B (© 2026 Alibaba Cloud). Game environment/data from Kaetram-Open (MPL-2.0). See NOTICE. All training data was generated by Qwen self-play — no third-party proprietary model outputs were used.

Citation

bibtex
@misc{kaetram_opd_2b_r3_2026,
  title        = {Kaetram Qwen3.5-2B OPD (Round 3)},
  author       = {patnir41},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/patnir41/kaetram-qwen3.5-2b-opd-r3}}
}

Method

Chain: base Qwen3.5-2B → r1 → (merge) → r2 → (merge) → r3.

Files

root: merged bf16 weights (Qwen3_5ForConditionalGeneration) — load directly.
adapter/: the LoRA adapter alone (applies on top of the merged r2 checkpoint).

Text-only fine-tune of a multimodal-capable base; chat_template.jinja preserves <think> on every assistant turn.

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer
m = AutoModelForCausalLM.from_pretrained("patnir41/kaetram-qwen3.5-2b-opd-r3", torch_dtype="bfloat16", device_map="auto")
t = AutoTokenizer.from_pretrained("patnir41/kaetram-qwen3.5-2b-opd-r3")

Limitations

License & credits

Citation

bibtex
@misc{kaetram_opd_2b_r3_2026,
  title        = {Kaetram Qwen3.5-2B OPD (Round 3)},
  author       = {patnir41},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/patnir41/kaetram-qwen3.5-2b-opd-r3}}
}

kaetram-qwen3.5-2b-opd-r3

README

Method

Files

Usage

Limitations

License & credits

Citation

Explore FriendliAI today

README

Method

Files

Usage

Limitations

License & credits

Citation