patnir41
kaetram-qwen3.5-2b-opd-r3
License: apache-2.0
Method
On-policy distillation with a reverse-KL advantage against a scaffolded 4B
teacher, trained with PPO-clipped importance sampling (LoRA r=64, α=64, no rsLoRA,
bf16, 1 epoch, advantage clamp ±3, early-turn step-weight 1.5). Round 3 fits a
fresh LoRA on the merged r2 checkpoint and adds counterfactual-canonicalized grading
of malformed emissions. Full construction:
patnir41/kaetram-opd-2b.
Chain: base Qwen3.5-2B → r1 → (merge) → r2 → (merge) → r3.
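As a rough illustration of the objective described above, the per-token loss can be sketched as follows. This is a minimal sketch under stated assumptions, not the training code: the function name `opd_token_loss`, the clip range `clip_eps=0.2`, the single-sample form of the reverse-KL advantage, and the choice to apply the step weight as a plain multiplier on the loss are all assumptions. Only the ±3 advantage clamp and the 1.5 early-turn weight come from the description.

```python
import math


def opd_token_loss(logp_student_new, logp_student_old, logp_teacher,
                   clip_eps=0.2, adv_clamp=3.0, step_weight=1.0):
    """Illustrative per-token PPO-clipped loss with a reverse-KL advantage.

    All arguments are log-probabilities of the token the student sampled.
    clip_eps is an assumed value; adv_clamp=3.0 matches the stated clamp.
    """
    # Single-sample reverse-KL signal (assumed form): positive when the
    # teacher gives the sampled token more probability than the student did.
    advantage = logp_teacher - logp_student_old
    # Clamp the advantage to [-adv_clamp, +adv_clamp].
    advantage = max(-adv_clamp, min(adv_clamp, advantage))
    # Importance ratio between the current policy and the sampling policy.
    ratio = math.exp(logp_student_new - logp_student_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Standard PPO pessimistic surrogate; negated because we minimize.
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    # Assumed: early-turn tokens are passed step_weight=1.5, others 1.0.
    return -step_weight * surrogate
```

For example, a token from an early turn would be scored with `opd_token_loss(lp_new, lp_old, lp_teacher, step_weight=1.5)`, and the results averaged over the batch.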
Files
- root: merged bf16 weights (Qwen3_5ForConditionalGeneration); load directly.
- adapter/: the LoRA adapter alone (applies on top of the merged r2 checkpoint).
Text-only fine-tune of a multimodal-capable base; chat_template.jinja preserves
<think> on every assistant turn.
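Because every assistant turn carries a `<think>` block, a harness will usually want to separate the reasoning from the rest of the turn. The helper below is a hypothetical sketch (`split_think` is not part of this repository) and assumes the decoded text contains a paired `<think>...</think>` span; if the opening tag is supplied by the generation prompt and is missing from the decoded continuation, the pattern would need adjusting.

```python
import re

# Matches the first paired <think>...</think> span, across newlines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text):
    """Return (reasoning, answer) for one decoded assistant turn."""
    match = _THINK_RE.search(text)
    if match is None:
        # No paired tags found: treat the whole turn as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think span is kept as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```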
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

m = AutoModelForCausalLM.from_pretrained(
    "patnir41/kaetram-qwen3.5-2b-opd-r3",
    torch_dtype="bfloat16",
    device_map="auto",
)
t = AutoTokenizer.from_pretrained("patnir41/kaetram-qwen3.5-2b-opd-r3")
```
Limitations
Single-task agent for the Kaetram Core-3 benchmark, not a general assistant. Residual malformed tool-call syntax (recovered at the harness level); "Rick's Roll" unsolved. Tokenizer-locked to the Qwen3.5-2B vocabulary.
License & credits
Apache-2.0, inheriting Qwen3.5-2B
(© 2026 Alibaba Cloud). Game environment/data from
Kaetram-Open (MPL-2.0). See NOTICE.
All training data was generated by Qwen self-play — no third-party proprietary
model outputs were used.
Citation
```bibtex
@misc{kaetram_opd_2b_r3_2026,
  title        = {Kaetram Qwen3.5-2B OPD (Round 3)},
  author       = {patnir41},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/patnir41/kaetram-qwen3.5-2b-opd-r3}}
}
```