License: mit
Model Description
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-3-mini-4k-instruct |
| Fine-tuning | QLoRA + Optuna HPO |
| LoRA Rank | r=64, alpha=128 |
| Training Examples | 250 |
| Optuna Trials | 20 (TPE sampler, A100) |
| Optimized For | Unseen word generalisation |
Performance
| Words | Score |
|---|---|
| Seen words | 76.0% |
| Unseen words | 82.0% |
| Overall | 79.0% |
| Generalisation gap | -6.0% (unseen > seen) |
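The overall score and the generalisation gap in the table follow from the seen and unseen scores by simple arithmetic. A minimal sketch, assuming the overall score is the unweighted mean of the two (which matches the reported 79.0%) and that the gap is defined as seen minus unseen; the function names are illustrative and not taken from the model's code:

```python
def overall_score(seen: float, unseen: float) -> float:
    """Unweighted mean of the seen and unseen scores (assumed definition)."""
    return (seen + unseen) / 2


def generalisation_gap(seen: float, unseen: float) -> float:
    """Seen minus unseen; a negative value means unseen words score higher."""
    return seen - unseen


seen, unseen = 76.0, 82.0  # percentages reported in the table above
print(overall_score(seen, unseen))       # 79.0
print(generalisation_gap(seen, unseen))  # -6.0
```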
Key Finding
Optuna was configured to maximize the unseen-word score. This produced a negative generalisation gap (-6.0%): the model performs better on words it never saw during training than on words it did see.
However, the overall score (79.0%) is lower than that of v2 (89.4%), which demonstrates metric-objective misalignment: optimizing for a single metric (unseen) hurt overall performance.
Lesson: the HPO objective should be (seen + unseen) / 2, not the unseen score alone.
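To see why the choice of objective matters, the sketch below ranks two trial results under both objectives. Trial A uses the scores reported for this model; trial B is a hypothetical result invented purely for illustration, and the function names are assumptions rather than the author's code. In an Optuna study, the chosen function's value would be what the objective returns to a study that maximizes it.

```python
def unseen_only_objective(seen: float, unseen: float) -> float:
    """Objective used for this model: rewards the unseen-word score only."""
    return unseen


def balanced_objective(seen: float, unseen: float) -> float:
    """Objective suggested by the lesson: mean of seen and unseen scores."""
    return (seen + unseen) / 2


# (seen %, unseen %) per trial
trials = {
    "A": (76.0, 82.0),  # scores reported for this model
    "B": (90.0, 80.0),  # HYPOTHETICAL trial, not from the source
}


def best_trial(objective) -> str:
    """Return the name of the trial that maximizes the given objective."""
    return max(trials, key=lambda name: objective(*trials[name]))


print(best_trial(unseen_only_objective))  # A
print(best_trial(balanced_objective))     # B
```

Under the unseen-only objective, trial A wins despite its lower seen score; the balanced objective prefers the trial with the better average.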
Best Config Found by Optuna
| Parameter | Value |
|---|---|
| Learning rate | 2.36e-4 |
| Epochs | 32 |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Quantization | 4-bit |
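Two quantities implied by this configuration can be derived directly. In the standard LoRA formulation the low-rank update is scaled by alpha / r, and the number of example presentations during training is epochs times training set size. A small sketch, assuming the standard (not rank-stabilised) scaling and the 250 training examples listed in the model description; the dictionary keys and variable names are illustrative:

```python
# Best configuration as reported in the table above.
best_config = {
    "learning_rate": 2.36e-4,
    "epochs": 32,
    "lora_rank": 64,
    "lora_alpha": 128,
    "quantization_bits": 4,
}
train_examples = 250  # from the model description table

# Standard LoRA scales the low-rank update by alpha / r (assumed here).
lora_scaling = best_config["lora_alpha"] / best_config["lora_rank"]

# Each epoch presents every training example once.
example_presentations = best_config["epochs"] * train_examples

print(lora_scaling)           # 2.0
print(example_presentations)  # 8000
```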
Recommended Version
For production use, v2 is recommended because it achieves a higher overall score (89.4% vs 79.0%). v3 is useful as a research artifact demonstrating the trade-off between generalisation and overall accuracy in HPO.
Model provider: ninadp
Model tree: base microsoft/Phi-3-mini-4k-instruct; adapter: this model
Modalities: text input, text output