README

License: apache-2.0

Why v1.2

| Issue (v1.1) | v1.2 change |
| --- | --- |
| Paraphrase-supported → weak (1/10 holdout) | Holdout-shaped Sanad rows, supported rationale, 45% supported mix |
| Train/eval shape mismatch | Multi-scale excerpts (full / chunked / sentence) |
| Eval user-only vs train system+user | Eval with --chat-template |
| Weak over-call on hedged paraphrase | Tighter make_weak; weak capped at 12% |

Training

| Field | Value |
| --- | --- |
| Base model | google/gemma-4-E4B-it |
| Method | QLoRA (Unsloth), Vast RTX A6000 |
| Train rows | 850 (seed 44) |
| Seq length | 1536 |
| LoRA r / α | 16 / 32 |
| Epochs / LR | 3 / 1.5e-4 |
| Code / data | NassilaT commit 5244403+ |

Verdict mix (train): supported 382 · weak 102 · not_in_source 171 · contradicted 127 · insufficient_evidence 68
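As a quick consistency check, the verdict counts above can be summed and converted to shares. This is only an arithmetic sketch over the numbers quoted in this README (the variable and function names are illustrative); it shows that the counts add up to the 850 training rows and that the supported and weak shares line up with the "45% supported mix" and "weak capped at 12%" entries in the Why v1.2 table.

```python
# Verdict counts copied from the "Verdict mix (train)" line of this README.
verdict_counts = {
    "supported": 382,
    "weak": 102,
    "not_in_source": 171,
    "contradicted": 127,
    "insufficient_evidence": 68,
}

total = sum(verdict_counts.values())


def share(label: str) -> float:
    """Return the percentage of training rows that carry the given verdict."""
    return 100.0 * verdict_counts[label] / total


for label, count in verdict_counts.items():
    print(f"{label:<22} {count:>4}  {share(label):5.1f}%")
print(f"{'total':<22} {total:>4}")
```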

Evaluation (Vast, llama-server + Q6_K, 50 rows, --chat-template)

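| Metric | Stock E4B baseline | v1.2 | Target |
| --- | --- | --- | --- |
| JSON parse (strict) | 100% | 100% | |
| Expect pass (combined) | 86% | 86% | ≥90% |
| Expect pass (holdout) | 84.4% | 91.1% | |
| Quote validity (holdout) | 100% | 90.9% | ≥98% |
| False supported (holdout) | 11.8% | 0% | ≤5% |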
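The README does not show how "JSON parse (strict)" or "Quote validity" are scored, so the following is only a guess at what such checks might look like, not the NassilaT eval code. It assumes, hypothetically, that each reply is a single JSON object with a `verdict` field and a `quote` field, and that a quote is valid when it appears verbatim in the excerpt after whitespace normalisation. The function names and the sample strings are invented for illustration.

```python
import json

# Verdict labels as listed in the "Verdict mix (train)" line.
VERDICTS = {
    "supported", "weak", "not_in_source",
    "contradicted", "insufficient_evidence",
}


def parse_strict(raw: str):
    """Return the parsed dict, or None if the whole reply is not one JSON object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def quote_is_valid(quote: str, excerpt: str) -> bool:
    """True if the non-empty quote occurs verbatim in the excerpt (whitespace-normalised)."""
    def norm(s: str) -> str:
        return " ".join(s.split())
    return bool(quote.strip()) and norm(quote) in norm(excerpt)


# Hypothetical excerpt and replies, for illustration only.
excerpt = "The trial enrolled 120 adults.  Costs were not reported."
good = parse_strict('{"verdict": "supported", "quote": "The trial enrolled 120 adults."}')
bad = parse_strict('Sure! {"verdict": "supported"}')  # leading chatter breaks strict parsing

print(good is not None and good["verdict"] in VERDICTS)  # parsed, known verdict
print(quote_is_valid(good["quote"], excerpt))            # quote found in excerpt
print(bad)                                               # None
```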

Holdout by category (v1.2)

| Category | Pass rate | Notes |
| --- | --- | --- |
| supported (h-001–h-010) | 90% (9/10) | h-010 miss; ≥8/10 stretch goal met |
| contradicted | 88.9% (8/9) | h-013 miss; h-012/h-014 fixed vs baseline |
| not_in_source | 100% | |
| weak | 100% | h-032/h-034 fixed vs baseline |
| insufficient_evidence | 100% | |
| multi_claim | 66.7% (4/6) | h-043, h-045 miss |

Holdout failures

  • h-010 — expected supported; verdict missing
  • h-013 — expected contradicted; verdict missing
  • h-043 — partial claim (costs) not flagged not_in_source / insufficient_evidence
  • h-045 — pediatric claim absent from excerpt not flagged

Core eval regression (5 rows)

Core expect pass fell to 40% (2/5) against a stock baseline of 100%, which held the combined score at 86% despite the holdout gains.
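The way the core regression cancels the holdout gain can be reproduced with a little arithmetic. The sketch below assumes the 50 evaluation rows split into 5 core rows and 45 holdout rows (50 minus 5; the README does not state the holdout size directly), and it back-computes the holdout pass counts 41 and 38 from the reported 91.1% and 84.4%. The v1.2 figure also matches the category table: 9 + 8 + 4 passes in the three categories with misses, plus 20 rows in the categories at 100%.

```python
TOTAL_ROWS = 50                        # from the evaluation heading
CORE_ROWS = 5                          # from "Core eval regression (5 rows)"
HOLDOUT_ROWS = TOTAL_ROWS - CORE_ROWS  # 45, inferred rather than stated


def pct(passed: int, rows: int) -> float:
    """Pass rate in percent."""
    return 100.0 * passed / rows


def combined(holdout_passed: int, core_passed: int) -> float:
    """Combined expect-pass rate over all 50 rows."""
    return pct(holdout_passed + core_passed, TOTAL_ROWS)


# Pass counts back-computed from the reported percentages (assumptions, see lead-in).
runs = {
    "stock baseline": {"holdout": 38, "core": 5},
    "v1.2": {"holdout": 41, "core": 2},
}

for name, r in runs.items():
    print(
        f"{name:<15} holdout {pct(r['holdout'], HOLDOUT_ROWS):.1f}%  "
        f"core {pct(r['core'], CORE_ROWS):.0f}%  "
        f"combined {combined(r['holdout'], r['core']):.0f}%"
    )
```

Three extra holdout passes are offset by three lost core passes, so both runs land on 43/50.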

The GGUF was not published because two shipping gates failed: combined expect pass (86%, target ≥90%) and holdout quote validity (90.9%, target ≥98%).
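The publish decision can be expressed as a simple gate check over the targets in the evaluation table. This is a minimal sketch, not the project's release tooling; the gate names and the `failed_gates` helper are made up for illustration, while the values and thresholds are the v1.2 numbers reported above.

```python
import operator

# (gate name, v1.2 value in %, comparison, target in %), taken from the evaluation table.
GATES = [
    ("expect_pass_combined", 86.0, operator.ge, 90.0),
    ("quote_validity_holdout", 90.9, operator.ge, 98.0),
    ("false_supported_holdout", 0.0, operator.le, 5.0),
]


def failed_gates(gates):
    """Return the names of gates whose value does not satisfy its target."""
    return [name for name, value, compare, target in gates if not compare(value, target)]


failures = failed_gates(GATES)
print("publish GGUF:", not failures)
print("failed gates:", failures)
```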

vs prior adapters

| Version | Supported holdout | Combined expect | Quote validity (holdout) |
| --- | --- | --- | --- |
| v1 | ~0% | ~62% | ~0% |
| v1.1 | 10% (1/10) | 66% | 9.1% |
| v1.2 | 90% (9/10) | 86% | 90.9% |

v1.2 fixes the v1.1 paraphrase-weak failure mode but does not clear shipping gates.

How to use (merge → GGUF)

This repo is a LoRA adapter only. Merge with base, then convert for LM Studio:

```bash
# Merge the adapter into the base model (see NassilaT training/scripts/merge_adapter_gemma4.py)
python merge_adapter_gemma4.py \
  --adapter-dir ./lora_adapter \
  --out-dir ./hf-merged-v1.2-bf16
# Then: llama.cpp → Q6_K GGUF, then llama-server / LM Studio
```
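The README leaves the llama.cpp step as a one-line comment. As a rough sketch of what it might involve, the snippet below builds (but does not run) the usual llama.cpp command sequence: `convert_hf_to_gguf.py` to produce a bf16 GGUF, `llama-quantize` to produce Q6_K, and `llama-server` to serve the result. The GGUF file names are hypothetical, the tools are assumed to be on the path from a local llama.cpp checkout or build, and whether a given llama.cpp version can convert this base model is not confirmed here.

```python
import shlex

MERGED_DIR = "./hf-merged-v1.2-bf16"   # --out-dir used by the merge command above
BF16_GGUF = "./model-v1.2-bf16.gguf"   # hypothetical intermediate file name
Q6K_GGUF = "./model-v1.2-Q6_K.gguf"    # hypothetical final file name


def gguf_commands(merged_dir: str, bf16_gguf: str, q6k_gguf: str):
    """Return the argv lists for convert, quantize, and serve, in that order."""
    return [
        # llama.cpp's HF-to-GGUF converter script.
        ["python", "convert_hf_to_gguf.py", merged_dir,
         "--outfile", bf16_gguf, "--outtype", "bf16"],
        # Quantize the bf16 GGUF down to Q6_K.
        ["llama-quantize", bf16_gguf, q6k_gguf, "Q6_K"],
        # Serve the quantized model locally.
        ["llama-server", "-m", q6k_gguf],
    ]


for argv in gguf_commands(MERGED_DIR, BF16_GGUF, Q6K_GGUF):
    print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each list can be passed to `subprocess.run(argv, check=True)` once the paths are confirmed.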

Model provider: QinEmPeRoR93

Model tree: base google/gemma-4-E4B-it; adapter: this model

Modalities: input Text, Image; output Text
