License: apache-2.0
Why v1.2
| Issue (v1.1) | v1.2 change |
|---|---|
| Paraphrase-supported → weak (1/10 holdout) | Holdout-shaped Sanad rows, supported rationale, 45% supported mix |
| Train/eval shape mismatch | Multi-scale excerpts (full / chunked / sentence) |
| Eval user-only vs train system+user | Eval with --chat-template |
| Weak over-call on hedged paraphrase | Tighter make_weak; weak capped at 12% |
Training
| Field | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Method | QLoRA (Unsloth), Vast RTX A6000 |
| Train rows | 850 (seed 44) |
| Seq length | 1536 |
| LoRA r / α | 16 / 32 |
| Epochs / LR | 3 / 1.5e-4 |
| Code / data | NassilaT commit 5244403+ |
Verdict mix (train): supported 382 · weak 102 · not_in_source 171 · contradicted 127 · insufficient_evidence 68
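The counts above can be checked against the 850 train rows and the mix targets named under "Why v1.2" (45% supported, weak capped at 12%). A quick sanity check that uses only numbers from this card:

```python
# Verdict counts copied from the train mix line above.
verdict_counts = {
    "supported": 382,
    "weak": 102,
    "not_in_source": 171,
    "contradicted": 127,
    "insufficient_evidence": 68,
}

total = sum(verdict_counts.values())
shares = {label: count / total for label, count in verdict_counts.items()}

# Print each label with its row count and share of the training set.
for label, share in shares.items():
    print(f"{label:22s} {verdict_counts[label]:4d}  {share:6.1%}")
print(f"total rows: {total}")
```

The counts sum to 850, supported comes to about 44.9% of rows, and weak sits at 12%.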
Evaluation (Vast, llama-server + Q6_K, 50 rows, --chat-template)
| Metric | Stock E4B baseline | v1.2 | Target |
|---|---|---|---|
| JSON parse (strict) | 100% | 100% | — |
| Expect pass (combined) | 86% | 86% | ≥90% |
| Expect pass (holdout) | 84.4% | 91.1% | — |
| Quote validity (holdout) | 100% | 90.9% | ≥98% |
| False supported (holdout) | 11.8% | 0% | ≤5% |
Holdout by category (v1.2)
| Category | Pass rate | Notes |
|---|---|---|
| supported (h-001–h-010) | 90% (9/10) | h-010 miss; ≥8/10 stretch goal met |
| contradicted | 88.9% (8/9) | h-013 miss; h-012/h-014 fixed vs baseline |
| not_in_source | 100% | |
| weak | 100% | h-032/h-034 fixed vs baseline |
| insufficient_evidence | 100% | |
| multi_claim | 66.7% (4/6) | h-043, h-045 miss |
Holdout failures
- h-010: expected `supported`; verdict missing
- h-013: expected `contradicted`; verdict missing
- h-043: partial claim (costs) not flagged `not_in_source`/`insufficient_evidence`
- h-045: pediatric claim absent from excerpt not flagged
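The "verdict missing" failures, together with the strict JSON and quote-validity metrics, suggest a per-row check along the following lines. This is only a sketch: the field names `verdict` and `quotes`, the function name `check_output`, and the rule that a quote is valid when it appears verbatim in the excerpt are all assumptions made for illustration, not the NassilaT eval code.

```python
import json

# The five verdict labels used in this card.
VERDICTS = {
    "supported", "weak", "not_in_source",
    "contradicted", "insufficient_evidence",
}

def check_output(raw: str, excerpt: str) -> dict:
    """Check one model response. Field names are hypothetical."""
    result = {"json_ok": False, "verdict_ok": False, "quotes_ok": False}
    try:
        data = json.loads(raw)  # strict: the whole response must be JSON
    except json.JSONDecodeError:
        return result
    if not isinstance(data, dict):  # require a JSON object at top level
        return result
    result["json_ok"] = True
    # A missing or unknown verdict counts as a failure (cf. h-010, h-013).
    verdict = data.get("verdict")
    result["verdict_ok"] = isinstance(verdict, str) and verdict in VERDICTS
    # Assumed rule: every quote must occur verbatim in the excerpt.
    quotes = data.get("quotes", [])
    result["quotes_ok"] = isinstance(quotes, list) and all(
        isinstance(q, str) and q in excerpt for q in quotes
    )
    return result

excerpt = "The trial enrolled 120 adults."
print(check_output('{"verdict": "supported", "quotes": ["120 adults"]}', excerpt))
print(check_output('{"quotes": []}', excerpt))  # verdict missing
```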
Core eval regression (5 rows)
Core expect pass fell to 40% (2/5) against a stock baseline of 100%. That regression held the combined score at 86%, level with the stock baseline, despite the holdout gains.
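The arithmetic behind the combined score can be reconstructed from the tables. The sketch below assumes the 50 eval rows split into 5 core and 45 holdout rows (the 45 is inferred as 50 minus 5, not stated in this card) and recovers holdout pass counts by rounding the reported percentages. The stock core count of 5/5 follows from the stated 100%.

```python
CORE_ROWS = 5
HOLDOUT_ROWS = 50 - CORE_ROWS  # inferred: 45 holdout rows

def combined_pass_rate(core_passed: int, holdout_rate_pct: float) -> float:
    """Combine core and holdout results into one pass rate (percent)."""
    # Recover the holdout pass count from the reported percentage.
    holdout_passed = round(holdout_rate_pct / 100 * HOLDOUT_ROWS)
    total_passed = core_passed + holdout_passed
    return 100 * total_passed / (CORE_ROWS + HOLDOUT_ROWS)

stock = combined_pass_rate(core_passed=5, holdout_rate_pct=84.4)
v12 = combined_pass_rate(core_passed=2, holdout_rate_pct=91.1)
print(f"stock combined: {stock:.0f}%")
print(f"v1.2 combined:  {v12:.0f}%")
```

Under these assumptions both models land on 43/50 = 86%: v1.2 gains three holdout rows (38 → 41) and loses three core rows (5 → 2).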
GGUF was not published (combined expect <90%, quote validity <98%).
vs prior adapters
| Version | Supported holdout | Combined expect | Quote validity (holdout) |
|---|---|---|---|
| v1 | ~0% | ~62% | ~0% |
| v1.1 | 10% (1/10) | 66% | 9.1% |
| v1.2 | 90% (9/10) | 86% | 90.9% |
v1.2 fixes the v1.1 paraphrase-weak failure mode but does not clear shipping gates.
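The gates referred to here are the Target column of the evaluation table. A minimal sketch of that check, with the v1.2 numbers copied from the tables above; the dictionary layout and the function name `failed_gates` are illustrative choices, not part of the NassilaT code:

```python
import operator

# (metric, comparison, threshold in percent) from the Target column.
GATES = [
    ("expect_pass_combined", operator.ge, 90.0),
    ("quote_validity_holdout", operator.ge, 98.0),
    ("false_supported_holdout", operator.le, 5.0),
]

# v1.2 results from the evaluation table.
v12_metrics = {
    "expect_pass_combined": 86.0,
    "quote_validity_holdout": 90.9,
    "false_supported_holdout": 0.0,
}

def failed_gates(metrics: dict) -> list:
    """Return the names of gates the metrics do not satisfy."""
    return [name for name, cmp, threshold in GATES
            if not cmp(metrics[name], threshold)]

print(failed_gates(v12_metrics))
```

The two gates this reports as failed, combined expect pass and holdout quote validity, are the same two reasons given for not publishing the GGUF.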
How to use (merge → GGUF)
This repo is a LoRA adapter only. Merge with base, then convert for LM Studio:
```bash
# After merge (see NassilaT training/scripts/merge_adapter_gemma4.py)
python merge_adapter_gemma4.py \
  --adapter-dir ./lora_adapter \
  --out-dir ./hf-merged-v1.2-bf16
# llama.cpp → Q6_K GGUF, then llama-server / LM Studio
```
Model provider: QinEmPeRoR93
Base model: google/gemma-4-E4B-it (this repo is the adapter)
Modalities: text and image input, text output