QinEmPeRoR93/nassila-grounding-e4b-v1.2-adapter API & Inference Endpoint

Why v1.2

Issue (v1.1)	v1.2 change
Paraphrase-supported claims labeled `weak` (supported holdout 1/10)	Holdout-shaped Sanad rows, supported rationale, 45% supported mix
Train/eval shape mismatch	Multi-scale excerpts (`full` / `chunked` / `sentence`)
Eval used user-only prompts; training used system+user	Eval with `--chat-template`
`weak` over-called on hedged paraphrases	Tighter `make_weak`; weak capped at 12%

Training

Field	Value
Base model	`google/gemma-4-E4B-it`
Method	QLoRA (Unsloth), Vast RTX A6000
Train rows	850 (seed 44)
Seq length	1536
LoRA r / α	16 / 32
Epochs / LR	3 / 1.5e-4
Code / data	NassilaT commit `5244403` or later

Verdict mix (train): supported 382 · weak 102 · not_in_source 171 · contradicted 127 · insufficient_evidence 68
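
The mix can be checked against the 850 train rows and the shares quoted under Why v1.2 (45% supported, weak capped at 12%). This is a minimal sketch of that arithmetic; the variable names are illustrative and are not taken from the NassilaT code.

```python
# Verdict counts copied from the train mix above.
verdict_counts = {
    "supported": 382,
    "weak": 102,
    "not_in_source": 171,
    "contradicted": 127,
    "insufficient_evidence": 68,
}

total = sum(verdict_counts.values())
shares = {name: count / total for name, count in verdict_counts.items()}

# Print one line per verdict with its count and share of the train set.
for name, share in shares.items():
    print(f"{name:<22} {verdict_counts[name]:>4}  {share:.1%}")
print(f"{'total':<22} {total:>4}")
```

The counts sum to 850, supported comes to about 44.9%, and weak sits at the 12% cap.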

Evaluation (Vast, llama-server + Q6_K, 50 rows, `--chat-template`)

Metric	Stock E4B baseline	v1.2	Target
JSON parse (strict)	100%	100%	—
Expect pass (combined)	86%	86%	≥90%
Expect pass (holdout)	84.4%	91.1%	—
Quote validity (holdout)	100%	90.9%	≥98%
False supported (holdout)	11.8%	0%	≤5%
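
The strict JSON parse and quote validity metrics can be pictured as per-row checks. The sketch below is an assumption about how such checks might look, not the NassilaT evaluator: the `verdict` and `quote` field names, the `check_row` helper, and the rule that a valid quote must appear verbatim in the excerpt are all hypothetical. Only the five verdict labels come from this card.

```python
import json

# The five verdict labels listed in the train mix.
VERDICTS = {
    "supported", "weak", "not_in_source",
    "contradicted", "insufficient_evidence",
}

def check_row(raw_output: str, excerpt: str) -> dict:
    """Return per-row flags for parse success, verdict validity, and quote validity."""
    result = {"parsed": False, "verdict_ok": False, "quote_ok": None}
    try:
        # Strict: the whole output must be JSON, with no surrounding text.
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return result
    if not isinstance(data, dict):
        return result
    result["parsed"] = True
    verdict = data.get("verdict")  # hypothetical field name
    result["verdict_ok"] = isinstance(verdict, str) and verdict in VERDICTS
    quote = data.get("quote")  # hypothetical field name
    if isinstance(quote, str) and quote:
        # Assumed rule: a valid quote is a verbatim substring of the excerpt.
        result["quote_ok"] = quote in excerpt
    return result

excerpt = "The trial enrolled 120 adults over 12 weeks."
print(check_row('{"verdict": "supported", "quote": "120 adults"}', excerpt))
print(check_row('Sure! {"verdict": "supported"}', excerpt))
```

Under these assumptions, output wrapped in extra prose fails the strict parse, and a quote that does not occur in the excerpt counts against quote validity.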

Holdout by category (v1.2)

Category	Pass rate	Notes
supported (h-001–h-010)	90% (9/10)	h-010 miss; ≥8/10 stretch goal met
contradicted	88.9% (8/9)	h-013 miss; h-012/h-014 fixed vs baseline
not_in_source	100%
weak	100%	h-032/h-034 fixed vs baseline
insufficient_evidence	100%
multi_claim	66.7% (4/6)	h-043, h-045 miss

Holdout failures

h-010 — expected supported; verdict missing
h-013 — expected contradicted; verdict missing
h-043 — partial claim (costs) not flagged not_in_source / insufficient_evidence
h-045 — pediatric claim absent from excerpt not flagged

Core eval regression (5 rows)

Core expect pass fell to 40% (2/5) against the stock baseline's 100%, which dragged the combined score to 86% despite the holdout gains.
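
The combined figure follows from the row counts: 50 eval rows, of which 5 are core, leaving 45 holdout. This is a minimal sketch of the arithmetic. The pass counts are inferred from the reported percentages (91.1% of 45 is 41, 84.4% of 45 is 38) rather than read from raw eval logs, and it assumes the combined set is exactly holdout plus core.

```python
HOLDOUT_ROWS, CORE_ROWS = 45, 5  # 50 rows total; 5 core rows per the heading above

def combined_rate(holdout_pass: int, core_pass: int) -> float:
    """Expect-pass rate over holdout and core rows together."""
    return (holdout_pass + core_pass) / (HOLDOUT_ROWS + CORE_ROWS)

baseline = combined_rate(holdout_pass=38, core_pass=5)  # 84.4% holdout, 100% core
v1_2 = combined_rate(holdout_pass=41, core_pass=2)      # 91.1% holdout, 40% core

print(f"baseline combined: {baseline:.0%}")
print(f"v1.2 combined:     {v1_2:.0%}")
```

Three additional holdout passes are cancelled by three lost core passes, so both runs land on 43/50.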

No GGUF was published because two shipping gates failed: combined expect pass is 86% (target ≥90%) and holdout quote validity is 90.9% (target ≥98%).
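
The publish decision can be expressed as a threshold check over the Target column of the evaluation table. This is a minimal sketch; the `GATES` structure and the `failed_gates` helper are illustrative, and treating the false supported target as a gate is an assumption, since the card only names the two gates that failed.

```python
import operator

# (metric, comparison, threshold in percent) from the Target column.
GATES = [
    ("expect_pass_combined", operator.ge, 90.0),
    ("quote_validity_holdout", operator.ge, 98.0),
    ("false_supported_holdout", operator.le, 5.0),
]

# v1.2 results from the evaluation table, in percent.
v1_2_metrics = {
    "expect_pass_combined": 86.0,
    "quote_validity_holdout": 90.9,
    "false_supported_holdout": 0.0,
}

def failed_gates(metrics: dict) -> list:
    """Names of gates whose threshold the metrics do not meet."""
    return [name for name, compare, limit in GATES
            if not compare(metrics[name], limit)]

failures = failed_gates(v1_2_metrics)
print("publish GGUF" if not failures else f"hold back: {failures}")
```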

vs prior adapters

Version	Supported holdout	Combined expect	Quote validity (holdout)
v1	~0%	~62%	~0%
v1.1	10% (1/10)	66%	9.1%
v1.2	90% (9/10)	86%	90.9%

v1.2 fixes the v1.1 paraphrase-weak failure mode but does not clear shipping gates.

How to use (merge → GGUF)

This repo contains the LoRA adapter only. Merge it with the base model, then convert the result to GGUF for llama-server or LM Studio:

```bash
# Merge the adapter into the base model
# (script: NassilaT training/scripts/merge_adapter_gemma4.py)
python merge_adapter_gemma4.py \
  --adapter-dir ./lora_adapter \
  --out-dir ./hf-merged-v1.2-bf16

# Next: llama.cpp → Q6_K GGUF, then llama-server / LM Studio
```
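
The llama.cpp step is only named in the comment above, so the following sketches one common way to do it rather than the author's exact commands. It assumes a local llama.cpp checkout containing `convert_hf_to_gguf.py` and CMake-built `llama-quantize` and `llama-server` binaries under `build/bin`. The output file names, the port, and the `gguf_commands` helper are hypothetical. The commands are only printed, not run.

```python
import shlex
from pathlib import Path

def gguf_commands(merged_dir: str, llama_cpp: str, stem: str) -> list:
    """Build the convert, quantize, and serve commands as argument lists."""
    root = Path(llama_cpp)
    bf16 = f"{stem}-bf16.gguf"  # hypothetical intermediate file name
    q6k = f"{stem}-Q6_K.gguf"   # hypothetical quantized file name
    return [
        # 1. Convert the merged Hugging Face checkpoint to a bf16 GGUF.
        ["python", str(root / "convert_hf_to_gguf.py"), merged_dir,
         "--outfile", bf16, "--outtype", "bf16"],
        # 2. Quantize to Q6_K, the format used in the evaluation above.
        [str(root / "build" / "bin" / "llama-quantize"), bf16, q6k, "Q6_K"],
        # 3. Serve the quantized model locally.
        [str(root / "build" / "bin" / "llama-server"), "-m", q6k, "--port", "8080"],
    ]

commands = gguf_commands(
    "./hf-merged-v1.2-bf16", "./llama.cpp", "nassila-grounding-e4b-v1.2"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as lists keeps paths with spaces safe if they are later passed to `subprocess.run`. Given the gate results above, a GGUF produced this way would be for local experimentation rather than a published artifact.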
