onda
ligature-seam-gemma4
README
Canonical result
The canonical paper result is exp5 checkpoint-500, not this exp10 run.
Use exp5 checkpoint-500 for the paper's direct seam-classifier result:
outputs/hillclimb_exp5/lora/checkpoint-500
That checkpoint has the best direct human-gold heldout result found so far:
- exp5 checkpoint-500 macro F1 on selnoligGT2.txt: 0.945921653410345
- exp5 checkpoint-500 accuracy on selnoligGT2.txt: 0.9527439024390244
This exp10 repository is kept as an experiment artifact and negative result: lower LR plus longer training did not improve the direct classifier. The exp5 checkpoint is not uploaded under this exp10 repository prefix.
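The macro F1 and accuracy figures above can be reproduced from per-item predictions. The following is a minimal sketch, assuming the standard unweighted macro F1 (the mean of per-class F1 scores); the README does not say which implementation produced the reported numbers, and the 1 = seam / 0 = no seam label encoding in the example is hypothetical.

```python
def macro_f1_and_accuracy(y_true, y_pred):
    """Return (macro F1, accuracy) for two equal-length label sequences.

    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))

    per_class_f1 = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class_f1.append(2 * tp / denom if denom else 0.0)

    macro_f1 = sum(per_class_f1) / len(per_class_f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return macro_f1, accuracy


# Hypothetical toy labels (1 = seam, 0 = no seam), not taken from selnoligGT2.txt.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_macro_f1, toy_accuracy = macro_f1_and_accuracy(toy_true, toy_pred)
```

Because macro F1 weights both classes equally, it can differ noticeably from accuracy when one class is much rarer than the other, which is why both numbers are reported.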
Which adapter to load
Use the explicit checkpoint subdirectories:
- hillclimb_exp10/adapters/checkpoint-500
- hillclimb_exp10/adapters/checkpoint-1000
- hillclimb_exp10/adapters/checkpoint-1500
- hillclimb_exp10/adapters/checkpoint-2000
- hillclimb_exp10/adapters/checkpoint-2500
- hillclimb_exp10/adapters/checkpoint-3000
The root-level adapter files, if present in repository history/snapshots, are
legacy suppress-only rank16 artifacts from an earlier upload and should not be
treated as the exp10 result. The exp10 run is represented by the
hillclimb_exp10/adapters/... paths above.
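One way to load a specific checkpoint subdirectory, rather than any root-level adapter files, is sketched below. Several things here are assumptions and not confirmed by this README: the Hub repository id `onda/ligature-seam-gemma4`, the choice of `AutoModelForCausalLM` as the right class for the base model, and the helper names `adapter_subfolder` and `load_exp10_adapter`.

```python
BASE_MODEL = "google/gemma-4-12B-it"       # base model named in this README
REPO_ID = "onda/ligature-seam-gemma4"      # ASSUMPTION: Hub id of this repository
EXP10_STEPS = (500, 1000, 1500, 2000, 2500, 3000)


def adapter_subfolder(step: int) -> str:
    """Return the explicit exp10 adapter path for a checkpoint step."""
    if step not in EXP10_STEPS:
        raise ValueError(f"unknown exp10 checkpoint step: {step}")
    return f"hillclimb_exp10/adapters/checkpoint-{step}"


def load_exp10_adapter(step: int, repo_id: str = REPO_ID):
    """Load the base model and attach the exp10 adapter for `step`.

    ASSUMPTION: the base model loads via AutoModelForCausalLM; a different
    Auto class may be needed depending on the installed transformers version.
    """
    # Imported lazily so the path helper works without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = adapter_subfolder(step)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    # `subfolder` points PEFT at the checkpoint directory, not the repo root.
    model = PeftModel.from_pretrained(base, repo_id, subfolder=subfolder)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    return model, tokenizer
```

Passing the subfolder explicitly avoids accidentally picking up the legacy rank16 files at the repository root.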
Training run
- Config: hillclimb_exp10.yaml in this repository snapshot / upload.
- Base model: google/gemma-4-12B-it.
- Training hardware: 2x A100 80GB with torch DDP.
- Train output: outputs/hillclimb_exp10/lora/checkpoint-*.
- Checkpoints uploaded here: checkpoint-500, checkpoint-1000, checkpoint-1500, checkpoint-2000, checkpoint-2500, checkpoint-3000.
Heldout classifier result for this exp10 run
Direct evaluation on selnoligGT2.txt showed that exp10 did not improve over the previous exp5 checkpoint-500 classifier.
- Previous exp5 checkpoint-500 gold macro F1: 0.945921653.
- exp10 checkpoint-500 gold macro F1: 0.925104828.
- exp10 checkpoint-1000 gold macro F1: 0.906452763.
- exp10 checkpoint-1500 gold macro F1: 0.892553049.
Interpretation: lower learning rate plus longer training degraded direct human-gold seam-classifier performance. This suggests the useful signal in this dataset is reached early, and additional optimization starts fitting idiosyncrasies of the train distribution rather than improving the heldout psycholinguistic boundary decision.
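To make the size of the degradation concrete, the short sketch below computes each exp10 checkpoint's gap to the exp5 baseline and checks that the scores fall monotonically with training steps. The numbers are copied from the list above; the function and variable names are illustrative only.

```python
EXP5_CKPT500_MACRO_F1 = 0.945921653

# Gold macro F1 on selnoligGT2.txt for exp10, keyed by checkpoint step.
EXP10_GOLD_MACRO_F1 = {
    500: 0.925104828,
    1000: 0.906452763,
    1500: 0.892553049,
}


def deltas_vs_baseline(scores, baseline):
    """Return {step: score - baseline}, ordered by step."""
    return {step: scores[step] - baseline for step in sorted(scores)}


def is_strictly_decreasing(scores):
    """True if the score drops at every successive checkpoint."""
    ordered = [scores[step] for step in sorted(scores)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


deltas = deltas_vs_baseline(EXP10_GOLD_MACRO_F1, EXP5_CKPT500_MACRO_F1)
for step, delta in deltas.items():
    print(f"checkpoint-{step}: {delta:+.4f} macro F1 vs exp5 checkpoint-500")
```

Every evaluated exp10 checkpoint sits below the exp5 baseline, by about 0.021, 0.039, and 0.053 macro F1 respectively, and the gap widens with each additional 500 steps.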
Downstream patgen note
For en-wiki patgen experiments generated from exp10 checkpoints, checkpoint-2500 produced the best pattern-level heldout macro F1 among exp10 checkpoints:
{
  "500":  { "macro_f1": 0.8458406600483104, "accuracy": 0.8649468892261002 },
  "1000": { "macro_f1": 0.8381156147232458, "accuracy": 0.858877086494689 },
  "1500": { "macro_f1": 0.8457685826624228, "accuracy": 0.8634294385432474 },
  "2000": { "macro_f1": 0.834056023974697, "accuracy": 0.8619119878603946 },
  "2500": { "macro_f1": 0.860016339869281, "accuracy": 0.881638846737481 }
}
This downstream result is not the same as direct classifier quality. As of this upload, exp5 checkpoint-500 has not yet been run through the same en-wiki/3M patgen pipeline, so paper checkpoint selection should keep classifier and pattern-generation evidence separate.
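The pattern-level results above can be ranked programmatically. This is a small sketch that parses the same JSON and picks the checkpoint with the highest value of a chosen metric; the helper name `best_checkpoint` is illustrative and not part of the repository.

```python
import json

# Pattern-level heldout results for exp10 checkpoints, as reported above.
PATGEN_RESULTS_JSON = """
{
  "500":  {"macro_f1": 0.8458406600483104, "accuracy": 0.8649468892261002},
  "1000": {"macro_f1": 0.8381156147232458, "accuracy": 0.858877086494689},
  "1500": {"macro_f1": 0.8457685826624228, "accuracy": 0.8634294385432474},
  "2000": {"macro_f1": 0.834056023974697,  "accuracy": 0.8619119878603946},
  "2500": {"macro_f1": 0.860016339869281,  "accuracy": 0.881638846737481}
}
"""


def best_checkpoint(results, metric="macro_f1"):
    """Return the checkpoint key with the highest value of `metric`."""
    return max(results, key=lambda step: results[step][metric])


patgen_results = json.loads(PATGEN_RESULTS_JSON)
best_by_f1 = best_checkpoint(patgen_results, "macro_f1")
best_by_accuracy = best_checkpoint(patgen_results, "accuracy")
```

Note that this ranking covers exp10 checkpoints only, so it says nothing about how exp5 checkpoint-500 would score in the same pipeline.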
Uploaded contents
Each checkpoint directory is adapter-only: PEFT adapter weights/config, tokenizer/chat template files, task config, and README. Trainer-only files such as optimizer.pt, scheduler.pt, RNG state, and trainer state are intentionally excluded.
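The adapter-only filtering described above could be expressed as a simple filename filter. This is a sketch, not the upload script used for this repository: `optimizer.pt` and `scheduler.pt` are named in this README, while the `rng_state*.pth` and `trainer_state.json` patterns are assumptions based on typical Hugging Face Trainer checkpoint layouts.

```python
from fnmatch import fnmatch

# Trainer-only files to exclude from an adapter-only upload.
# ASSUMPTION: the RNG and trainer-state names follow common Trainer
# conventions; adjust if the checkpoint directory uses other names.
TRAINER_ONLY_PATTERNS = (
    "optimizer.pt",
    "scheduler.pt",
    "rng_state*.pth",
    "trainer_state.json",
)


def adapter_only_files(filenames):
    """Return the filenames that are not trainer-only state, in input order."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name, pattern) for pattern in TRAINER_ONLY_PATTERNS)
    ]


# Hypothetical checkpoint listing, for illustration only.
example_listing = [
    "adapter_model.safetensors",
    "adapter_config.json",
    "optimizer.pt",
    "scheduler.pt",
    "rng_state_0.pth",
    "trainer_state.json",
    "README.md",
]
kept = adapter_only_files(example_listing)
```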
Model provider: onda
Model tree: base google/gemma-4-12B-it; adapter: this model
Modalities: input Video, Audio, Text, Image; output Text