Recommended Checkpoint
Use this first for paper-facing reproduction:
paper_checkpoint/gpu06_large_ta_s070_p050_reg000_run006/lora_final/
Files:
pytorch_lora_weights.safetensors: FLUX LoRA weights
text_adapter.pt: residual text-condition adapter
Held-out 600-prompt evaluation summary:
| Method | CLIP rarity percentile | CLIP minority score | DINOv2 rarity percentile | DINOv2 minority score | CLIP prompt alignment |
|---|---|---|---|---|---|
| Vanilla | 0.500 | 0.500 | 0.500 | 0.500 | 0.329 |
| MinorityPrompt | 0.594 | 0.572 | 0.588 | 0.578 | 0.311 |
| HIPSTER text-adapter run006 | 0.635 | 0.615 | 0.569 | 0.565 | 0.317 |
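The checkpoint is useful because, in the held-out FLUX evaluation, it improves all four rarity/minority metrics over vanilla and both CLIP-based metrics over MinorityPrompt. MinorityPrompt remains higher on the two DINOv2-based metrics (0.588 vs. 0.569 and 0.578 vs. 0.565). Prompt alignment (0.317) sits between MinorityPrompt (0.311) and vanilla (0.329). It should still be visually audited for each paper figure.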
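To make the comparison above easy to re-check, the following minimal sketch computes per-metric differences between two rows of the table. The values are copied from the table; the names `METRICS`, `RESULTS`, and `delta` are illustrative and are not part of the HIPSTER code.

```python
# Values copied from the held-out 600-prompt table above.
METRICS = [
    "CLIP rarity percentile",
    "CLIP minority score",
    "DINOv2 rarity percentile",
    "DINOv2 minority score",
    "CLIP prompt alignment",
]

RESULTS = {
    "Vanilla": [0.500, 0.500, 0.500, 0.500, 0.329],
    "MinorityPrompt": [0.594, 0.572, 0.588, 0.578, 0.311],
    "HIPSTER text-adapter run006": [0.635, 0.615, 0.569, 0.565, 0.317],
}


def delta(method, baseline):
    """Return method minus baseline for every metric, rounded to 3 decimals."""
    return {
        name: round(a - b, 3)
        for name, a, b in zip(METRICS, RESULTS[method], RESULTS[baseline])
    }


if __name__ == "__main__":
    for baseline in ("Vanilla", "MinorityPrompt"):
        print(f"HIPSTER run006 minus {baseline}:")
        for name, value in delta("HIPSTER text-adapter run006", baseline).items():
            print(f"  {name}: {value:+.3f}")
```

A positive difference means the first method scores higher on that metric. Whether higher is better for each column follows the metric definitions in the paper, not this snippet.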
Other Included Artifacts
visual_demo/
gpu05_auto_flux_universal_creativity_rank24_run000
gpu00_auto_flux_universal_creativity_rank24_run005
These are useful for qualitative figures because they move farther away from vanilla generations. They were trained on a smaller early prompt setup, so they are not the primary checkpoint for paper claims.
ablations/novelty_sde/
gpu02_k16_stoch_mid_token4_run002
gpu07_k16_refaway_strong_token4_run007
These support ablations for same-prompt stochastic candidate generation, mid-time noise injection, reference-away reward, and preference tokens.
ablations/multivalue/
gpu07_mv_softmin_token4_run007
This is the best preserved prototype for the multi-value HIPSTER/GDPO code path. It is useful as method evidence, but it is not the headline checkpoint.
Evaluation Files
eval_results/heldout_eval_run006_20260516_111727/: held-out 600-prompt comparison against vanilla, C3, CreativePrompt, and MinorityPrompt.
eval_results/visual_demo_flux_short_general_eval_run001/: vanilla-distance summaries for short general prompts.
eval_results/novelty_sde_50percat/: ablation evaluation summary.
eval_results/multivalue_12percat/: multi-value prototype evaluation summary.
Reproducibility Notes
Important training scripts are mirrored in reproducibility/, and the full code is on GitHub. The final LoRA folders are enough for inference, while summary.json, config.json, and evaluation CSVs are included to recover the training/evaluation setup.
The primary checkpoint uses a residual text-condition adapter. Loading only pytorch_lora_weights.safetensors is not equivalent to the reported checkpoint; text_adapter.pt must also be loaded by the HIPSTER code path.
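Because a LoRA-only load silently produces a different model, a pre-flight check that both files are present can help. The sketch below only verifies that the files exist; it does not load anything, and the helper names (`missing_checkpoint_files`, `assert_complete_checkpoint`) are hypothetical. Loading `text_adapter.pt` itself is left to the HIPSTER code path.

```python
from pathlib import Path

# File names taken from the "Files" list of the recommended checkpoint.
REQUIRED_FILES = ("pytorch_lora_weights.safetensors", "text_adapter.pt")


def missing_checkpoint_files(lora_dir):
    """Return the required file names that are absent from lora_dir."""
    root = Path(lora_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def assert_complete_checkpoint(lora_dir):
    """Raise FileNotFoundError if the LoRA weights or the text adapter is missing."""
    missing = missing_checkpoint_files(lora_dir)
    if missing:
        raise FileNotFoundError(f"{lora_dir} is missing: {', '.join(missing)}")


if __name__ == "__main__":
    # Path as given in this README for the primary checkpoint.
    assert_complete_checkpoint(
        "paper_checkpoint/gpu06_large_ta_s070_p050_reg000_run006/lora_final"
    )
```

Running the check before inference makes a missing adapter fail loudly instead of yielding results that do not match the reported numbers.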
Caveats
These artifacts are research checkpoints, not polished release models. Some variants optimize novelty strongly and can produce artifacts or prompt drift. For paper claims, prefer the held-out 600-prompt quantitative results plus curated qualitative grids from the primary checkpoint.