Model Description

| Property | Value |
|---|---|
| Base Model | microsoft/Phi-3-mini-4k-instruct |
| Fine-tuning | QLoRA + Optuna HPO |
| LoRA Rank | r=64, alpha=128 |
| Training Examples | 250 |
| Optuna Trials | 20 (TPE sampler, A100) |
| Optimized For | Unseen word generalisation |

| Metric | Score |
|---|---|
| Seen words | 76.0% |
| Unseen words | 82.0% |
| Overall | 79.0% |
| Generalisation gap (seen - unseen) | -6.0 pts (unseen > seen) |

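
The reported overall score matches an unweighted mean of the seen and unseen scores, and the gap matches seen minus unseen. Below is a minimal sketch of that arithmetic. It assumes equal-sized seen and unseen evaluation sets, because the card does not state the split sizes. The `summarize` helper is hypothetical.

```python
def summarize(seen: float, unseen: float) -> tuple[float, float]:
    """Return (overall, gap) from per-split scores, in percentage points."""
    # Overall as the unweighted mean of the two splits (assumes equal-sized splits).
    overall = (seen + unseen) / 2
    # Gap = seen - unseen; a negative value means the model does better on unseen words.
    gap = seen - unseen
    return overall, gap

overall, gap = summarize(seen=76.0, unseen=82.0)
print(f"overall={overall:.1f}%  gap={gap:+.1f} pts")
```
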
Key Finding
Optuna was configured to maximize the unseen-word score.
This produced a negative generalisation gap (-6 points): the model
scores higher on words it never saw during training than on words it did.
However, the overall score (79.0%) is lower than v2's (89.4%).
This illustrates metric-objective misalignment: optimizing for a single
metric (the unseen score) came at the cost of overall performance.
Lesson: the HPO objective should be (seen + unseen) / 2,
not the unseen score alone.

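
The sketch below shows why the choice of objective matters. It ranks two trials under an unseen-only objective and under the balanced (seen + unseen) / 2 objective. `trial_a` reuses v3's reported scores. `trial_b` is a hypothetical trial invented for illustration and is not a real Optuna result. The helper names are also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrialResult:
    name: str
    seen: float    # % score on words present in the training data
    unseen: float  # % score on held-out words

def unseen_only(r: TrialResult) -> float:
    # The objective v3 was tuned on.
    return r.unseen

def balanced(r: TrialResult) -> float:
    # Recommended objective: mean of the seen and unseen scores.
    return (r.seen + r.unseen) / 2

# trial_a mirrors v3's reported scores; trial_b is purely hypothetical.
trials = [
    TrialResult("trial_a", seen=76.0, unseen=82.0),
    TrialResult("trial_b", seen=94.0, unseen=80.0),
]

best_unseen = max(trials, key=unseen_only)
best_balanced = max(trials, key=balanced)
print(best_unseen.name, best_balanced.name)  # trial_a trial_b
```

In an actual Optuna study, the objective function passed to `study.optimize` would return `balanced(...)` rather than the unseen score alone, with the study created via `optuna.create_study(direction="maximize")`.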
Best Config Found by Optuna

| Parameter | Value |
|---|---|
| Learning rate | 2.36e-4 |
| Epochs | 32 |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Quantization | 4-bit |
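
One way to capture the best configuration is a plain config object. The `QLoRAConfig` name and its fields are hypothetical and do not come from the training code. This form also makes one derived quantity explicit. Assuming standard LoRA scaling (not rsLoRA), the low-rank update is multiplied by alpha / r, so r=64 and alpha=128 give a scaling factor of 2.0.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QLoRAConfig:
    # Values taken from the "Best Config Found by Optuna" table.
    learning_rate: float = 2.36e-4
    epochs: int = 32
    lora_rank: int = 64
    lora_alpha: int = 128
    quant_bits: int = 4

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update (B @ A) by alpha / r.
        return self.lora_alpha / self.lora_rank

cfg = QLoRAConfig()
print(cfg.lora_scaling)  # 2.0
```

On the Hugging Face stack, these values would presumably map onto `peft.LoraConfig(r=64, lora_alpha=128, ...)` and a 4-bit `transformers.BitsAndBytesConfig(load_in_4bit=True)`. The card does not show the exact training script.
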
Recommended Version
For production use, v2 is recommended: it
achieves a higher overall score (89.4% vs 79.0%).
v3 is useful as a research artifact that demonstrates
the trade-off between generalisation and overall accuracy in HPO.
Links