License: apache-2.0

What this adapter is

| | |
|---|---|
| Type | PEFT LoRA adapter (not a merged model) |
| Base model | poolside/Laguna-XS.2 |
| Rank r | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, experts |
| Dtype | F32 |
LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
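A rough back-of-the-envelope sketch of the size claim. It assumes the standard LoRA layout (two low-rank factors of shapes d_in x r and r x d_out per adapted weight), 4 bytes per F32 parameter, and 1 GB = 10^9 bytes. The layer dimensions in the last lines are hypothetical and are not Laguna-XS.2's actual sizes.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (d_in x r), B is (r x d_out)."""
    return r * (d_in + d_out)


def implied_param_count(size_bytes: float, bytes_per_param: int = 4) -> float:
    """Parameter count implied by a checkpoint size (F32 = 4 bytes each)."""
    return size_bytes / bytes_per_param


# Implied by the ~4.6 GB F32 adapter (assuming 1 GB = 1e9 bytes).
approx_params = implied_param_count(4.6e9)

# Hypothetical dimensions, for illustration only (not the real model's).
one_matrix = lora_params(d_in=2048, d_out=2048, r=16)
```

Under these assumptions the adapter holds on the order of 1.15 billion parameters. The per-matrix cost is small, but it is paid once for every adapted projection in every layer and, for the MoE blocks, across the expert MLPs, so the total grows quickly even at r = 16.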
Training
Trained on Prime Intellect Hosted Training (shared run dashboard), with the following settings:
- Algorithm: GRPO
- Reward: binary final-answer correctness (1.0 correct / 0.0 wrong) — using tools is the means, never the reward
- Learning rate: 1e-5
- Rollouts per example: 16
- Batch size: 128
- Max tokens: 4096, thinking enabled
- Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
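The reward and algorithm settings above can be sketched as follows. This is a minimal illustration, not the training code: it assumes final answers are compared as normalized strings and uses the commonly described GRPO group normalization (reward minus group mean, divided by group standard deviation). The exact variant used by the hosted trainer is not stated here, and the function names are made up for this example.

```python
from statistics import mean, pstdev


def binary_reward(predicted: str, expected: str) -> float:
    """1.0 if the final answer matches, else 0.0 (tool use itself earns nothing)."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# 16 rollouts for one question: 14 correct, 2 wrong (illustrative numbers).
rewards = [1.0] * 14 + [0.0] * 2
advantages = group_advantages(rewards)
```

Correct rollouts get a small positive advantage and the two wrong ones a large negative one. If every rollout in a group earns the same reward, all advantages are zero and that question contributes no gradient signal.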
What training changed
On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base
model's 91.7% to 96.7%, a gain of 5 percentage points, with the final
checkpoint tied for best. Training was stable throughout: reward held around 0.9
(normal GRPO variance, dipping to ~0.65 and recovering), completions stayed at about 900
tokens (no ballooning), and there were no truncated or failed rollouts.

A concrete example: one question the base model got wrong.
For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (`Brc1ccc2[nH]ccc2n1`) a candidate? It must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and a LogP between 1 and 3. (correct answer: yes)
The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
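The final step the model has to get right is combining the tool outputs into a verdict. A minimal sketch of that decision logic follows. `ToolResults` and `is_candidate` are hypothetical names, the LogP bounds are assumed to be inclusive, and the example values are placeholders rather than the real tool outputs for this molecule.

```python
from dataclasses import dataclass


@dataclass
class ToolResults:
    """Hypothetical container for what the tool calls returned."""
    has_heavy_halogen: bool  # substructure match for Cl, Br, or I
    has_aromatic_ring: bool  # substructure match for an aromatic ring
    logp: float              # LogP descriptor value


def is_candidate(t: ToolResults, logp_min: float = 1.0, logp_max: float = 3.0) -> bool:
    """All three criteria must hold; the LogP bounds are treated as inclusive."""
    return (
        t.has_heavy_halogen
        and t.has_aromatic_ring
        and logp_min <= t.logp <= logp_max
    )


# Placeholder tool outputs, not measured values for the molecule above.
example = ToolResults(has_heavy_halogen=True, has_aromatic_ring=True, logp=2.0)
verdict = "yes" if is_candidate(example) else "no"
```

The verdict is a plain conjunction, so a wrong answer after correct tool calls means the error lies in reading or combining the results, not in gathering them.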
Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest — the adapter captured most of it.
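The percentages can be turned back into question counts. The sketch below assumes each of the 60 questions was scored once (avg@1), which makes 55 and 58 correct answers the only counts that round to 91.7% and 96.7%. These counts are inferred from the percentages, not reported directly.

```python
N_QUESTIONS = 60


def accuracy_pct(correct: int, total: int = N_QUESTIONS) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Inferred counts: 55/60 rounds to 91.7%, 58/60 rounds to 96.7%.
base_correct, tuned_correct = 55, 58

base_acc = accuracy_pct(base_correct)
tuned_acc = accuracy_pct(tuned_correct)
gain_points = round(tuned_acc - base_acc, 1)

# Headroom is the number of questions the base model missed.
headroom = N_QUESTIONS - base_correct
captured = (tuned_correct - base_correct) / headroom
```

On this reading the adapter fixes 3 of the 5 questions the base model missed, which is 60% of the available headroom. With only 60 questions, each one is worth about 1.7 points, so the gain should be read with that granularity in mind.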
Choosing the training recipe
The stable hyperparameters above didn't come for free — they're the output of a
sweep on a precursor environment, allan/science-gym-bio. The lesson: learning
rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger
rollout groups (8 → 16) cut GRPO advantage variance. That recipe — LR 1e-5,
16 rollouts/example, thinking on — is what we carried into the protein-ligand run.
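One simplified way to see why larger rollout groups help. It assumes the rollouts for a question are independent Bernoulli trials with success rate p, which is an idealization and not the authors' analysis. It computes the standard error of the group-mean baseline and the chance that a whole group gets identical rewards, in which case every advantage is zero.

```python
from math import sqrt


def baseline_std_error(p: float, group_size: int) -> float:
    """Std. error of the group-mean reward for Bernoulli(p) rewards."""
    return sqrt(p * (1 - p) / group_size)


def prob_no_signal(p: float, group_size: int) -> float:
    """Chance every rollout in a group gets the same reward (all advantages zero)."""
    return p ** group_size + (1 - p) ** group_size


# Illustrative success rate, in line with the ~0.9 reward seen in the main run.
p = 0.9
for g in (8, 16):
    print(g, round(baseline_std_error(p, g), 3), round(prob_no_signal(p, g), 3))
```

Under these assumptions, doubling the group from 8 to 16 shrinks the baseline's standard error by a factor of sqrt(2) (about 0.106 to 0.075) and cuts the share of no-signal groups from roughly 43% to roughly 19%.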

Usage
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "poolside/Laguna-XS.2"

# Load the base model and its tokenizer.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Attach the LoRA adapter on top of the base weights (it is not a merged model).
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")
```
For the full tool-use evaluation loop, install and run the gym:
```bash
prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3
```
Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.