lsteno

Qwen3-4B-Instruct-2507-RLM-RLVR-depth2-recursive-r64-a128-lr1e-5-adapter

README

License: apache-2.0

LoRA adapter from the 150-step depth-2 recursive RLM RLVR run.

Base model: Qwen/Qwen3-4B-Instruct-2507
Run: rlm-rlvr-qwen3-4b-depth2-recursive-r64-a128-lr1e-5-s150-bal35f40v1
Adapter checkpoint: step_150
LoRA rank / alpha: 64 / 128
Learning rate: 1e-5
Training shape: root RLM can call child RLMs; child RLMs use the LLM-only cap prompt.
Dataset: lsteno/BEEG-agents, balanced train/eval split used by the run.
Uploaded: 2026-06-13T17:49:07.103876+00:00

This repository contains the PEFT adapter files only. Use it with the base model above.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Container

Run this model inference with full control and performance in your environment.

Model Details

Model Provider

lsteno

Model Tree