poolside-laguna-hackathon

protein-ligand-design

What this adapter is

| Field | Value |
| --- | --- |
| Type | PEFT LoRA adapter (not a merged model) |
| Base model | `poolside/Laguna-XS.2` |
| Rank `r` | 16 |
| `lora_alpha` | 32 |
| `lora_dropout` | 0.0 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `experts` |
| Dtype | F32 |

LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
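
To make the size claim concrete, here is a rough back-of-the-envelope sketch. It uses the rank, `lora_alpha`, dtype, and ~4.6 GB figure stated in this card and assumes standard LoRA scaling (`lora_alpha / r`). The layer widths passed to `lora_param_count` are hypothetical placeholders, not Laguna-XS.2's real dimensions, and GB is assumed to mean 10^9 bytes.

```python
# Back-of-the-envelope LoRA arithmetic. Layer widths below are hypothetical.

R = 16               # LoRA rank, from the adapter card
LORA_ALPHA = 32      # from the adapter card
BYTES_PER_PARAM = 4  # F32 storage

def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

def params_from_size(size_bytes: float, bytes_per_param: int) -> float:
    """Approximate parameter count implied by a file size and a dtype width."""
    return size_bytes / bytes_per_param

scaling = lora_scaling(LORA_ALPHA, R)
per_matrix = lora_param_count(d_in=2048, d_out=2048, r=R)  # hypothetical widths
approx_total = params_from_size(4.6e9, BYTES_PER_PARAM)    # assumes 1 GB = 1e9 bytes

print(f"scaling alpha/r = {scaling}")
print(f"params per hypothetical 2048x2048 matrix = {per_matrix:,}")
print(f"approx. adapter params at F32 = {approx_total / 1e9:.2f}B")
```

Under these assumptions, a rank-16 adapter only reaches roughly a billion parameters when it covers a very large number of weight matrices, which fits the explanation that the per-expert MLPs are targeted.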

Training

Trained on Prime Intellect Hosted Training — shared run dashboard:

Algorithm: GRPO
Reward: binary final-answer correctness (1.0 correct / 0.0 wrong) — using tools is the means, never the reward
Learning rate: 1e-5
Rollouts per example: 16
Batch size: 128
Max tokens: 4096, thinking enabled
Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
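
The group-relative advantage that GRPO builds from these rewards can be sketched as follows. This is a minimal illustration of the commonly described formulation (reward minus group mean, divided by group standard deviation), not the hosted trainer's actual code. Whether that trainer normalizes by the standard deviation is not stated here, and `group_advantages` and `eps` are names chosen for the example.

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each rollout relative to the other rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, 16 rollouts, binary reward: 14 correct and 2 wrong (made-up group).
rewards = [1.0] * 14 + [0.0] * 2
advantages = group_advantages(rewards)

print(f"correct rollout advantage: {advantages[0]:+.3f}")
print(f"wrong rollout advantage:   {advantages[-1]:+.3f}")
```

With a binary reward, a group in which all 16 rollouts agree gets zero advantage for every rollout, so it contributes no learning signal under this formulation.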

What training changed

On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base model's 91.7% to 96.7%, a gain of 5 percentage points (55 to 58 of 60 questions answered correctly), with the final checkpoint tied for best. Training was stable throughout: reward held around 0.9 (normal GRPO variance, dipping to ~0.65 and recovering), completions stayed at ~900 tokens (no ballooning), and there were no truncated or failed rollouts.

Figure: held-out eval curve

A concrete example. One question the base model got wrong before training:

For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (Brc1ccc2[nH]ccc2n1) a candidate — it must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and LogP between 1 and 3? (correct answer: yes)

The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed to the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
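
The final step the base model got wrong, turning tool outputs into a yes/no verdict, amounts to a conjunction of the three stated criteria. The sketch below is only an illustration of that logic: `ToolEvidence` and `verdict` are hypothetical names, the LogP bounds are assumed to be inclusive, and the LogP value in the example is a placeholder rather than a real descriptor for the molecule.

```python
from dataclasses import dataclass

@dataclass
class ToolEvidence:
    """Facts returned by the chemistry tools for one molecule."""
    has_heavy_halogen: bool  # substructure match for Cl, Br, or I
    has_aromatic_ring: bool  # substructure match for an aromatic ring
    logp: float              # computed LogP descriptor

def verdict(evidence: ToolEvidence, logp_min: float = 1.0, logp_max: float = 3.0) -> str:
    """Answer 'yes' only if every stated criterion holds (bounds assumed inclusive)."""
    meets_all = (
        evidence.has_heavy_halogen
        and evidence.has_aromatic_ring
        and logp_min <= evidence.logp <= logp_max
    )
    return "yes" if meets_all else "no"

# Placeholder evidence: the LogP value is illustrative, not an actual tool output.
example = ToolEvidence(has_heavy_halogen=True, has_aromatic_ring=True, logp=2.0)
answer = verdict(example)
print(answer)
```

The binary reward only checks this final answer, so a rollout that gathers the right evidence but states the wrong verdict still scores 0.0.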

Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest — the adapter captured most of it.

Choosing the training recipe

The stable hyperparameters above didn't come for free — they're the output of a sweep on a precursor environment, allan/science-gym-bio. The lesson: learning rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger rollout groups (8 → 16) cut GRPO advantage variance. That recipe — LR 1e-5, 16 rollouts/example, thinking on — is what we carried into the protein-ligand run.
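
One way to see why group size matters is a simple model in which each rollout is an independent success/failure draw. The sketch below is an illustration under that assumption, not the sweep's own analysis. The success rate `P_CORRECT` is an illustrative value and the function names are made up for the example. It compares groups of 8 and 16 on two quantities: how noisy the group-mean baseline is, and how often a group is uniform (all right or all wrong) and therefore carries no advantage signal.

```python
import math

def prob_uniform_group(p: float, group_size: int) -> float:
    """Chance that every rollout in a group receives the same binary reward."""
    return p ** group_size + (1.0 - p) ** group_size

def baseline_std_error(p: float, group_size: int) -> float:
    """Standard error of the group-mean reward used as the baseline."""
    return math.sqrt(p * (1.0 - p) / group_size)

P_CORRECT = 0.9  # illustrative per-rollout success rate (an assumption)

for group_size in (8, 16):
    uniform = prob_uniform_group(P_CORRECT, group_size)
    std_err = baseline_std_error(P_CORRECT, group_size)
    print(f"G={group_size:2d}  P(uniform group)={uniform:.3f}  baseline std. error={std_err:.3f}")
```

In this toy model, doubling the group size shrinks the baseline's standard error by a factor of sqrt(2) and makes signal-free groups noticeably rarer, at the cost of twice as many rollouts per example.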

Figure: science-gym-bio hyperparameter sweep

Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "poolside/Laguna-XS.2"

# Load the base model and tokenizer first; the adapter is not a merged model.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")
```

For the full tool-use evaluation loop, install and run the gym:

```bash
prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3
```

Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.
