README

License: apache-2.0

What this adapter is

Type: PEFT LoRA adapter (not a merged model)
Base model: poolside/Laguna-XS.2
Rank (r): 16
lora_alpha: 32
lora_dropout: 0.0
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, experts
Dtype: F32

LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
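The size can be sanity-checked with simple arithmetic. The sketch below converts the ~4.6 GB F32 figure into an approximate parameter count and shows how per-expert LoRA pairs add up. The layer dimensions (`hidden`, `expert_ffn`, `n_experts`) are hypothetical placeholders, not Laguna-XS.2's actual configuration, and it assumes each expert projection gets its own LoRA pair.

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Parameters added by one LoRA pair: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


def params_from_size(size_gb: float, bytes_per_param: int = 4) -> float:
    """Approximate parameter count for a checkpoint size (1 GB taken as 1e9 bytes)."""
    return size_gb * 1e9 / bytes_per_param


# From this card: ~4.6 GB stored as F32 (4 bytes per parameter),
# i.e. on the order of 1.15 billion adapter parameters.
approx_adapter_params = params_from_size(4.6)

# HYPOTHETICAL dimensions, for illustration only (not the real model config).
hidden, expert_ffn, n_experts = 2048, 1024, 64

per_expert = (
    lora_params(hidden, expert_ffn)    # gate_proj
    + lora_params(hidden, expert_ffn)  # up_proj
    + lora_params(expert_ffn, hidden)  # down_proj
)
per_layer_experts = n_experts * per_expert  # grows linearly with expert count

# The LoRA update is scaled by lora_alpha / r; with the values above that is 2.0.
scaling = 32 / 16
```

The point of the sketch is that the expert term is multiplied by the number of experts in every MoE layer, so even a small rank adds up quickly.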

Training

Trained on Prime Intellect Hosted Training (shared run dashboard) with the following settings:

  • Algorithm: GRPO
  • Reward: binary final-answer correctness (1.0 correct / 0.0 wrong) — using tools is the means, never the reward
  • Learning rate: 1e-5
  • Rollouts per example: 16
  • Batch size: 128
  • Max tokens: 4096, thinking enabled
  • Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
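To make the reward setup concrete, here is a minimal sketch of how group-relative advantages are commonly computed in GRPO for one prompt with 16 binary-reward rollouts. The card does not say which GRPO variant the hosted trainer uses (for example, whether it normalizes by the group standard deviation), so treat `group_advantages` and its `eps` parameter as illustrative assumptions.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Center each reward on its group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, 16 rollouts, binary final-answer reward: 14 correct, 2 wrong.
rewards = [1.0] * 14 + [0.0] * 2
adv = group_advantages(rewards)

# Correct rollouts get a small positive advantage, wrong ones a large negative one.
correct_adv, wrong_adv = adv[0], adv[-1]

# A group where every rollout is correct carries no learning signal:
# every advantage is exactly zero.
flat = group_advantages([1.0] * 16)
```

Because the reward is binary, a prompt only contributes gradient when its rollouts disagree, which is one reason the number of rollouts per example matters.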

What training changed

On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base model's 91.7% to 96.7%, a gain of 5 percentage points, with the final checkpoint tied for best. Training was stable throughout: reward held around 0.9 (with normal GRPO variance, dipping to ~0.65 and recovering), completion length stayed around 900 tokens (no ballooning), and there were no truncated or failed rollouts.

[Figure: held-out eval curve]

A concrete example. One question the base model got wrong before training:

For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (Brc1ccc2[nH]ccc2n1) a candidate — it must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and LogP between 1 and 3? (correct answer: yes)

The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed to the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
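The final step in a question like this is a plain conjunction over the tool outputs. The sketch below shows that step in isolation. `is_candidate` is a hypothetical helper, not part of the gym, and it assumes the LogP bounds are inclusive, which the question does not specify.

```python
def is_candidate(has_heavy_halogen: bool, has_aromatic_ring: bool, logp: float,
                 logp_range=(1.0, 3.0)) -> bool:
    """Combine tool outputs into a yes/no verdict.

    All three criteria from the question must hold:
    a heavy halogen (Cl/Br/I), an aromatic ring, and LogP within the range.
    Bounds are treated as inclusive (an assumption).
    """
    low, high = logp_range
    return has_heavy_halogen and has_aromatic_ring and low <= logp <= high


# Illustrative inputs only: the booleans stand in for substructure-match results
# and the LogP value is a made-up placeholder, not a computed descriptor.
verdict = is_candidate(has_heavy_halogen=True, has_aromatic_ring=True, logp=2.0)
```

Getting the tool calls right is not enough if this last step is answered incorrectly, which is exactly what the binary final-answer reward penalizes.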

Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest — the adapter captured most of it.
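The headroom claim can be checked with a few lines of arithmetic. The counts below (55 and 58 correct out of 60) are inferred from the reported percentages, assuming one scored attempt per question as avg@1 suggests. They are not taken from the run logs.

```python
n_questions = 60

# Inferred from 91.7% and 96.7% of 60 questions (an assumption, see above).
base_correct, tuned_correct = 55, 58

base_acc = base_correct / n_questions
tuned_acc = tuned_correct / n_questions

gain_points = 100 * (tuned_acc - base_acc)    # percentage-point gain
headroom_points = 100 * (1 - base_acc)        # points the base model left on the table
captured_fraction = gain_points / headroom_points
```

Under these assumptions the adapter fixes 3 of the 5 questions the base model missed, so each question is worth about 1.7 points and small changes in the count move the headline number noticeably.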

Choosing the training recipe

The stable hyperparameters above didn't come for free — they're the output of a sweep on a precursor environment, allan/science-gym-bio. The lesson: learning rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger rollout groups (8 → 16) cut GRPO advantage variance. That recipe — LR 1e-5, 16 rollouts/example, thinking on — is what we carried into the protein-ligand run.
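One way to see why a larger rollout group helps is to model each rollout's binary reward as an independent Bernoulli draw with success probability `p`. This is a simplifying assumption, not something measured in the sweep, and the function names are illustrative.

```python
def baseline_variance(p: float, group_size: int) -> float:
    """Variance of the group-mean reward, which GRPO uses as its baseline."""
    return p * (1 - p) / group_size


def prob_no_signal(p: float, group_size: int) -> float:
    """Probability that all rollouts agree (all right or all wrong),
    in which case every group-relative advantage is zero."""
    return p ** group_size + (1 - p) ** group_size


# p = 0.9 mirrors the average reward reported during training; real per-question
# success rates vary, so this is only indicative.
p = 0.9
for g in (8, 16):
    print(f"group={g:2d}  baseline_var={baseline_variance(p, g):.5f}  "
          f"no_signal={prob_no_signal(p, g):.3f}")
```

Under this model, doubling the group from 8 to 16 halves the variance of the baseline and makes it much less likely that a prompt yields no gradient at all.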

[Figure: science-gym-bio hyperparameter sweep]

Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "poolside/Laguna-XS.2"

# Load the base model and tokenizer first; the adapter is not a merged model.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")
```

For the full tool-use evaluation loop, install and run the gym:

```bash
prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3
```

Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.

Model provider: poolside-laguna-hackathon

Model tree
Base: poolside/Laguna-XS.2
Adapter: this model

Modalities
Input: Text
Output: Text
