License: apache-2.0

Why v1.3
v1.2 fixed the supported holdout (9/10) but failed the combined go/no-go (86%) because core eval collapsed (2/5) and quote validity fell short (90.9%). v1.3 added ~207 multi-claim rows, polarity (-pol-), semantic Sanad (-sanadsem-), and overclaim (-over-) labels, and trained for 2 epochs at a learning rate of 1e-4.
Training
| Field | Value |
|---|---|
| Task | l3_grounding (abstract-only excerpts) |
| Worker | Sanad (l3_grounding) |
| Base model | google/gemma-4-E4B-it |
| Method | QLoRA (Unsloth), Vast RTX A6000 48 GB |
| Train rows | 850 (seed 45) |
| Seq length | 1536 |
| LoRA r / α | 16 / 32 |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Eval | --chat-template (matches train) |
| Export | Merge via merge_adapter_gemma4.py → llama.cpp b9608 → Q6_K |
| Code | NassilaT — training/PHASE2_5_V1_3_PLAN.md |
Evaluation (Vast, llama-server + Q6_K, 50 rows)
| Metric | Stock baseline | v1.2 | v1.3 | Target |
|---|---|---|---|---|
| Combined expect pass | 86% | 86% | 80% | ≥90% |
| Core eval (5 rows) | 100% | 40% | 100% | — |
| Holdout expect pass | 84.4% | 91.1% | 77.8% | — |
| JSON parse (combined, repair) | 100% | 100% | 86% | ≥95% |
| Quote validity (holdout) | 100% | 90.9% | 36.4% | ≥98% |
| False supported (holdout) | 11.8% | 0% | 2.9% | ≤5% |
| Supported h-001–h-010 | 10/10 | 9/10 | 3/10 | ≥8/10 |
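The combined figures in the table can be recomputed from the per-split rates. The sketch below is a small consistency check, not part of the evaluation harness: the 50 total rows and 5 core rows come from the tables, while the 45 holdout rows are inferred as 50 minus 5, and the function name `combined_pass_rate` is made up for illustration.

```python
# Row counts: 50 total and 5 core are stated in the card; 45 holdout rows is
# an inference (50 - 5), not a number the card gives directly.
CORE_ROWS = 5
HOLDOUT_ROWS = 45


def combined_pass_rate(core_pct: float, holdout_pct: float) -> float:
    """Recombine per-split pass percentages into one combined percentage."""
    core_passed = round(core_pct / 100 * CORE_ROWS)
    holdout_passed = round(holdout_pct / 100 * HOLDOUT_ROWS)
    return 100 * (core_passed + holdout_passed) / (CORE_ROWS + HOLDOUT_ROWS)


# (core eval %, holdout expect pass %) as reported in the evaluation table.
reported = {
    "stock": (100.0, 84.4),
    "v1.2": (40.0, 91.1),
    "v1.3": (100.0, 77.8),
}

for name, (core, holdout) in reported.items():
    print(f"{name}: {combined_pass_rate(core, holdout):.0f}%")
```

Under these assumptions the recomputed values match the "Combined expect pass" row (86%, 86%, 80%), which suggests that row is simply the core and holdout passes pooled over all 50 rows.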
Holdout by category (v1.3)
| Category | Pass rate |
|---|---|
| supported (h-001–h-010) | 30% (3/10) |
| contradicted | 100% (9/9) |
| weak | 100% |
| insufficient_evidence | 100% |
| not_in_source | 89% |
| multi_claim | 67% (4/6) |
What improved vs v1.2
- Core eval 5/5 — eval-001 (supported), eval-003 (contradicted/overclaim), eval-005 (multi-claim) all pass.
- Contradicted holdout 100% — including h-013 (polarity).
What regressed vs v1.2
- Supported holdout 3/10 — seven rows (h-002, h-004–h-008, h-010) fail with `must_parse_json` (`Expecting ',' delimiter` after repair), not verdict errors.
- Combined expect 80% (down from 86%).
- Quote validity 36.4% (down from 90.9%) — largely driven by parse failures on supported rows.
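The `Expecting ',' delimiter` message matches the wording of Python's `json.JSONDecodeError`. The card does not say what the failing outputs looked like, so the sketch below only illustrates one common way that error arises: a quoted excerpt containing unescaped double quotes. The payload and its field names (`verdict`, `quote`) are hypothetical, and `try_parse` is an illustrative helper, not the harness's `must_parse_json` check.

```python
import json


def try_parse(raw: str):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Hypothetical model output: the quote contains unescaped double quotes,
# so the parser sees the string end early and expects a comma next.
bad = '{"verdict": "supported", "quote": "the "primary" endpoint was met"}'

# Same payload with the inner quotes escaped, which parses cleanly.
good = '{"verdict": "supported", "quote": "the \\"primary\\" endpoint was met"}'

print(try_parse(bad))
print(try_parse(good)[0]["quote"])
```

If the real failures do follow this pattern, it would be consistent with the regression concentrating on supported rows, since those are the rows that must carry a verbatim quote.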
Other holdout failures
| Row | Failure |
|---|---|
| h-028 | not_in_source verdict missing |
| h-043 | forbidden supported verdict |
| h-045 | missing not_in_source / insufficient_evidence |
Usage (research / re-export only)
LoRA weights only. Merge with base:
```bash
python scripts/merge_adapter_gemma4.py \
  --adapter-dir ./lora_adapter \
  --out-dir ./hf-merged-v1.3-bf16
```
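For readers unfamiliar with what merging an adapter does, the sketch below shows the usual LoRA merge arithmetic on a toy matrix. It assumes the standard scaling of α / r (so 32 / 16 = 2.0 with the values in the training table) and is not a description of what `merge_adapter_gemma4.py` does internally, which the card does not show. The toy matrices use rank 1 for brevity, and `matmul` and `merge_lora` are illustrative names.

```python
LORA_R = 16       # from the training table
LORA_ALPHA = 32   # from the training table
SCALE = LORA_ALPHA / LORA_R  # assumes standard LoRA scaling (alpha / r)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, scale):
    """Return the merged weight W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 update (real adapters use r=16).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]      # shape (out_features, rank)
lora_a = [[0.5, -1.0]]       # shape (rank, in_features)

merged = merge_lora(w, lora_a, lora_b, SCALE)
print(merged)
```

After a merge of this kind the adapter matrices are no longer needed at inference time, which is what allows the merged checkpoint to be converted and quantized as a single model.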
Model provider: QinEmPeRoR93
Model tree: base google/gemma-4-E4B-it; adapter: this model
Modalities: input Text, Image; output Text