onda/ligature-seam-gemma4

Canonical result

The canonical paper result is exp5 checkpoint-500, not this exp10 run.

Use exp5 checkpoint-500 for the paper's direct seam-classifier result:

outputs/hillclimb_exp5/lora/checkpoint-500

That checkpoint has the best direct human-gold heldout result found so far:

exp5 checkpoint-500 macro F1 on selnoligGT2.txt: 0.945921653410345
exp5 checkpoint-500 accuracy on selnoligGT2.txt: 0.9527439024390244
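For readers comparing the two numbers above, here is a minimal sketch of how macro F1 and accuracy are conventionally computed from gold and predicted labels. It is not the evaluation script used for this repository; the function name and the toy labels are hypothetical, and the format of selnoligGT2.txt is not assumed.

```python
from collections import Counter


def macro_f1_and_accuracy(y_true, y_pred):
    """Return (macro F1, accuracy) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this class, but it was wrong
            fn[gold] += 1  # missed the gold class

    # Macro F1 is the unweighted mean of per-class F1 scores.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)

    macro_f1 = sum(f1_scores) / len(f1_scores)
    accuracy = sum(tp.values()) / len(y_true)
    return macro_f1, accuracy


# Toy labels, purely illustrative (not taken from the heldout file).
toy_gold = ["seam", "seam", "none", "none"]
toy_pred = ["seam", "none", "none", "none"]
macro_f1, accuracy = macro_f1_and_accuracy(toy_gold, toy_pred)
print(f"macro F1 = {macro_f1:.4f}, accuracy = {accuracy:.4f}")
```

Because macro F1 weights every class equally, it can sit below accuracy when the rarer class is classified less reliably.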

This exp10 repository is kept as an experiment artifact and negative result: lower LR plus longer training did not improve the direct classifier. The exp5 checkpoint is not uploaded under this exp10 repository prefix.

Which adapter to load

Use the explicit checkpoint subdirectories:

hillclimb_exp10/adapters/checkpoint-500
hillclimb_exp10/adapters/checkpoint-1000
hillclimb_exp10/adapters/checkpoint-1500
hillclimb_exp10/adapters/checkpoint-2000
hillclimb_exp10/adapters/checkpoint-2500
hillclimb_exp10/adapters/checkpoint-3000

The root-level adapter files, if present in repository history/snapshots, are legacy suppress-only rank16 artifacts from an earlier upload and should not be treated as the exp10 result. The exp10 run is represented by the hillclimb_exp10/adapters/... paths above.
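A minimal sketch of loading one of these checkpoint subdirectories with PEFT is shown below. Several things here are assumptions rather than facts from this repository: the repository id is taken from the page title, AutoModelForCausalLM is assumed to be a suitable loader class for the base model, and the helper names are hypothetical. The heavy imports are deferred so the path helper can be used without transformers or peft installed.

```python
REPO_ID = "onda/ligature-seam-gemma4"   # assumed from the page title
BASE_MODEL = "google/gemma-4-12B-it"    # base model named in this README
EXP10_STEPS = (500, 1000, 1500, 2000, 2500, 3000)


def adapter_subfolder(step):
    """Return the explicit exp10 adapter path for a training step."""
    if step not in EXP10_STEPS:
        raise ValueError(f"no exp10 checkpoint uploaded for step {step}")
    return f"hillclimb_exp10/adapters/checkpoint-{step}"


def load_exp10_adapter(step):
    """Load the base model and attach one exp10 adapter (downloads weights)."""
    # Imported lazily; assumes transformers and peft are installed.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Assumption: AutoModelForCausalLM can load this base model.
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    # Passing subfolder avoids picking up the legacy root-level adapter files.
    return PeftModel.from_pretrained(base, REPO_ID, subfolder=adapter_subfolder(step))


print(adapter_subfolder(500))
```

Passing the subfolder explicitly, rather than loading from the repository root, is what keeps the legacy root-level files out of the picture.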

Training run

Config: hillclimb_exp10.yaml in this repository snapshot / upload.
Base model: google/gemma-4-12B-it.
Training hardware: 2x A100 80GB with torch DDP.
Train output: outputs/hillclimb_exp10/lora/checkpoint-*.
Checkpoints uploaded here: checkpoint-500, checkpoint-1000, checkpoint-1500, checkpoint-2000, checkpoint-2500, checkpoint-3000.

Heldout classifier result for this exp10 run

Direct evaluation on selnoligGT2.txt showed that exp10 did not improve over the previous exp5 checkpoint-500 classifier.

Previous exp5 checkpoint-500 gold macro F1: 0.945921653.
exp10 checkpoint-500 gold macro F1: 0.925104828.
exp10 checkpoint-1000 gold macro F1: 0.906452763.
exp10 checkpoint-1500 gold macro F1: 0.892553049.

Interpretation: lower learning rate plus longer training degraded direct human-gold seam-classifier performance. This suggests the useful signal in this dataset is reached early, and additional optimization starts fitting idiosyncrasies of the train distribution rather than improving the heldout psycholinguistic boundary decision.
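The size of the degradation can be read off directly from the figures above. The following sketch only restates the reported macro F1 values and subtracts the exp5 baseline; the variable and function names are illustrative.

```python
# Gold macro F1 values as reported in this README.
EXP5_CHECKPOINT_500 = 0.945921653
EXP10_MACRO_F1 = {500: 0.925104828, 1000: 0.906452763, 1500: 0.892553049}


def deltas_vs_baseline(scores, baseline):
    """Map each training step to its macro F1 difference from the baseline."""
    return {step: score - baseline for step, score in sorted(scores.items())}


deltas = deltas_vs_baseline(EXP10_MACRO_F1, EXP5_CHECKPOINT_500)
for step, delta in deltas.items():
    print(f"exp10 checkpoint-{step}: {delta:+.4f} macro F1 vs exp5 checkpoint-500")
```

Every reported exp10 checkpoint is below the exp5 baseline, and the gap widens with each additional 500 steps (roughly 0.021, 0.039, and 0.053).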

Downstream patgen note

For en-wiki patgen experiments generated from exp10 checkpoints, checkpoint-2500 produced the best pattern-level heldout macro F1 among the exp10 checkpoints evaluated (checkpoint-500 through checkpoint-2500; no checkpoint-3000 result is listed):

{
  "500":  { "macro_f1": 0.8458406600483104, "accuracy": 0.8649468892261002 },
  "1000": { "macro_f1": 0.8381156147232458, "accuracy": 0.858877086494689 },
  "1500": { "macro_f1": 0.8457685826624228, "accuracy": 0.8634294385432474 },
  "2000": { "macro_f1": 0.834056023974697, "accuracy": 0.8619119878603946 },
  "2500": { "macro_f1": 0.860016339869281, "accuracy": 0.881638846737481 }
}

This downstream result is not the same as direct classifier quality. As of this upload, exp5 checkpoint-500 has not yet been run through the same en-wiki/3M patgen pipeline, so paper checkpoint selection should keep classifier and pattern-generation evidence separate.
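As a small illustration, the pattern-level results above can be ranked programmatically. This sketch embeds the reported JSON verbatim and picks the checkpoint with the highest macro F1; the variable names are illustrative, and it says nothing about direct classifier quality.

```python
import json

# Pattern-level heldout results for exp10 checkpoints, as reported above.
PATGEN_RESULTS_JSON = """
{
  "500":  {"macro_f1": 0.8458406600483104, "accuracy": 0.8649468892261002},
  "1000": {"macro_f1": 0.8381156147232458, "accuracy": 0.858877086494689},
  "1500": {"macro_f1": 0.8457685826624228, "accuracy": 0.8634294385432474},
  "2000": {"macro_f1": 0.834056023974697, "accuracy": 0.8619119878603946},
  "2500": {"macro_f1": 0.860016339869281, "accuracy": 0.881638846737481}
}
"""

results = json.loads(PATGEN_RESULTS_JSON)

# Rank checkpoints by downstream macro F1, best first.
ranked = sorted(results, key=lambda step: results[step]["macro_f1"], reverse=True)
best = ranked[0]
print(f"best patgen checkpoint: checkpoint-{best} "
      f"(macro F1 {results[best]['macro_f1']:.4f})")
```

Note that the ranking is not monotonic in training steps, which is one more reason to keep this evidence separate from the direct classifier comparison.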

Uploaded contents

Each checkpoint directory is adapter-only: PEFT adapter weights/config, tokenizer/chat template files, task config, and README. Trainer-only files such as optimizer.pt, scheduler.pt, RNG state, and trainer state are intentionally excluded.
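One way to produce such an adapter-only file set is to filter a checkpoint's file list against a short exclusion list before uploading. The sketch below is an assumption about how this could be done, not the upload script used here: the glob patterns follow common Hugging Face Trainer file names, and the example file names are hypothetical.

```python
from fnmatch import fnmatch

# Assumed Trainer-only file patterns; adjust to the actual checkpoint layout.
TRAINER_ONLY_PATTERNS = (
    "optimizer.pt",
    "scheduler.pt",
    "rng_state*.pth",
    "trainer_state.json",
)


def adapter_only(filenames):
    """Keep only files that do not match a trainer-only pattern."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name, pattern) for pattern in TRAINER_ONLY_PATTERNS)
    ]


# Hypothetical checkpoint listing for illustration.
checkpoint_files = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "tokenizer.json",
    "README.md",
    "optimizer.pt",
    "scheduler.pt",
    "rng_state_0.pth",
    "trainer_state.json",
]
print(adapter_only(checkpoint_files))
```

Dropping optimizer and scheduler state keeps the upload small, at the cost that training cannot be resumed from the uploaded files alone.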
