
License: apache-2.0

Why a generative detector

Most hallucination-detection systems are encoder token-classifiers (e.g. LettuceDetect). A generative LoRA is a different paradigm:

  • Preserves the full answer structure with explicit per-span typing
  • Extensible to chain-of-thought rationales after the closing tag
  • Slightly stronger on the hardest type, contradiction, where world-model understanding matters more than per-token lexical features

Training

  • 2 epochs on combined/train (1,955 records)
  • Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
  • attn_implementation="eager" (Qwen2 + SDPA + bf16 has known NaN issues)
  • LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
  • Single H200, 7 min training + 13 min inference on all 4 configs
| Epoch | Val loss |
|---|---|
| 1 | 0.0216 |
| 2 | 0.0179 |
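
As a rough cross-check of the hyperparameters listed above, the following sketch derives the effective batch size and the approximate number of optimizer and warmup steps. The rounding rules (ceiling on partial batches and on the warmup count) are assumptions; the actual trainer may round differently by a step or two.

```python
import math

# Figures taken from the training notes above.
NUM_RECORDS = 1955
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 2
EPOCHS = 2
WARMUP_RATIO = 0.06


def schedule(num_records, batch, grad_accum, epochs, warmup_ratio):
    """Approximate step counts; the ceil() rounding is an assumption."""
    micro_batches = math.ceil(num_records / batch)           # forward/backward passes per epoch
    steps_per_epoch = math.ceil(micro_batches / grad_accum)  # optimizer updates per epoch
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": batch * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


plan = schedule(NUM_RECORDS, PER_DEVICE_BATCH, GRAD_ACCUM, EPOCHS, WARMUP_RATIO)
print(plan)
```

With only a few hundred optimizer updates in total, the run is short enough to fit the 7-minute training time quoted above.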

Test-set results (sentence-level F1 — the leaderboard metric)

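| Config | Lexical floor | LettuceDetect-large (zero-shot) | LookBackLens (in-domain) | ModernBERT-ft | This model | + Ensemble |
|---|---|---|---|---|---|---|
| combined | 0.302 | 0.361 | 0.489 | 0.798 | 0.771 | 0.871 |
| contradiction | 0.231 | 0.315 | 0.377 | 0.763 | 0.800 | 0.877 |
| missing_tool | 0.218 | 0.330 | 0.406 | 0.966 | 0.927 | 0.993 |
| overgeneration | 0.319 | 0.335 | 0.508 | 0.697 | 0.672 | 0.824 |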

Best single model on contradiction: 0.800 vs 0.763 for the encoder fine-tune (+3.7 pp). This supports the hypothesis that LLM world-modelling beats per-token lexical features for value-swap hallucinations. On the other three configs the encoder fine-tune stays ahead by 2.5 to 3.9 pp.
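
The comparison can be reproduced from the table with a few lines of arithmetic. This is a minimal sketch: the dictionary copies the ModernBERT-ft, This model and + Ensemble columns, and the key and helper names are illustrative rather than taken from the project code.

```python
# Sentence-level F1 copied from the results table above.
F1 = {
    "combined":       {"modernbert_ft": 0.798, "this_model": 0.771, "ensemble": 0.871},
    "contradiction":  {"modernbert_ft": 0.763, "this_model": 0.800, "ensemble": 0.877},
    "missing_tool":   {"modernbert_ft": 0.966, "this_model": 0.927, "ensemble": 0.993},
    "overgeneration": {"modernbert_ft": 0.697, "this_model": 0.672, "ensemble": 0.824},
}


def delta_pp(a, b):
    """Difference a - b in percentage points, rounded to one decimal."""
    return round((a - b) * 100, 1)


# Gap between this model and the encoder fine-tune, per config.
vs_encoder = {
    cfg: delta_pp(row["this_model"], row["modernbert_ft"]) for cfg, row in F1.items()
}
# Gain of the ensemble over the better of the two single models, per config.
ensemble_gain = {
    cfg: delta_pp(row["ensemble"], max(row["modernbert_ft"], row["this_model"]))
    for cfg, row in F1.items()
}

for cfg in F1:
    print(f"{cfg:15s} vs encoder {vs_encoder[cfg]:+.1f} pp | "
          f"ensemble over best single {ensemble_gain[cfg]:+.1f} pp")
```

The ensemble improves on the best single model in every config, which is consistent with the two detectors making partly different errors.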

Companion ModernBERT model

For per-token encoder-based detection see ArsenyIvanov/toolace-halu-modernbert-large. A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on combined. See the project repo and notebooks/improve_baselines.ipynb for full code, training curves and analytics.

Usage

python

import re

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

# Matches one tagged span and captures its type and inner text.
HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL
)

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"  # left-pad for decoder-only generation

# Eager attention matches the training setup (see the NaN note under Training).
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)


def detect(query, tool_context, tool_names, answer):
    """Return the tagged answer plus character spans located in `answer`."""
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": user}]
    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")
    with torch.no_grad():
        gen = model.generate(**enc, max_new_tokens=512, do_sample=False,
                             pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to character offsets in the original answer.
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)
        if idx == -1:
            idx = answer.find(inner)  # fall back to searching from the start
        if idx == -1:
            continue  # the model altered the span text, so it cannot be located
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}
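
The span-recovery step at the end of `detect` can be tried without a GPU or the model weights. The sketch below repeats that logic in a standalone helper (`extract_spans` is a hypothetical name, not part of the released code) and runs it on a made-up answer and completion that are not drawn from the dataset.

```python
import re

HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL
)


def extract_spans(answer, completion):
    """Locate each tagged span of `completion` inside the untagged `answer`."""
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        label, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)
        if idx == -1:
            idx = answer.find(inner)  # fall back to searching from the start
        if idx == -1:
            continue  # span text not present verbatim in the answer
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": label})
        cursor = idx + len(inner)
    return spans


# Hypothetical example, invented for illustration only.
answer = "Your flight departs at 9:40 AM. I can also book a hotel for you."
completion = (
    'Your flight departs at <halu type="contradiction">9:40 AM</halu>. '
    '<halu type="missing_tool">I can also book a hotel for you.</halu>'
)

spans = extract_spans(answer, completion)
for s in spans:
    print(s["label"], s["start"], s["end"], repr(s["text"]))

# If the model altered no other characters, removing the tags restores the answer.
stripped = HALU_RE.sub(lambda m: m.group(2), completion)
print("round-trips:", stripped == answer)
```

Checking that the stripped completion equals the input is a cheap way to detect outputs where the model paraphrased the answer instead of only inserting tags.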

Hallucination types

| Label | What it captures |
|---|---|
| contradiction | grounded value replaced by a plausible-but-wrong alternative |
| missing_tool | offers an action that requires a tool not in the available list |
| overgeneration | inserted sentence with claims not supported by the tool output |

Limitations

  • Synthetic corruptions only — no naturally occurring cascading errors
  • Much slower than ModernBERT-ft at inference: ~5 sec/record vs ~50 ms (quoted as ~20× slower, although these per-record figures imply a gap closer to 100×)
  • LoRA adapter only — needs Qwen/Qwen2.5-7B-Instruct base model at runtime
  • Single corruption type per record (RAGTruth strict schema)
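
To make the latency limitation concrete, the per-record figures quoted above can be turned into rough wall-clock estimates. This is a back-of-the-envelope sketch: the workload size is hypothetical, and it assumes strictly sequential processing with no batching or model-load time.

```python
# Per-record latencies quoted in the limitations above, in milliseconds.
LORA_MS = 5_000    # ~5 sec/record for this adapter
ENCODER_MS = 50    # ~50 ms/record for ModernBERT-ft


def wall_clock_minutes(num_records, ms_per_record):
    """Sequential processing time in minutes, ignoring batching and model load."""
    return num_records * ms_per_record / 60_000


n = 1_000  # hypothetical workload size
ratio = LORA_MS / ENCODER_MS
print(f"per-record ratio: {ratio:.0f}x")
print(f"LoRA adapter:  {wall_clock_minutes(n, LORA_MS):.1f} min")
print(f"ModernBERT-ft: {wall_clock_minutes(n, ENCODER_MS):.2f} min")
```

Batched generation would narrow the gap in practice, which is one possible reason a measured slowdown could be smaller than the per-record ratio.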

License

Apache 2.0 — matches the base model and the training dataset license.

Model provider

ArsenyIvanov

Model tree

  • Base: Qwen/Qwen2.5-7B-Instruct
  • Adapter: this model

Modalities

  • Input: text
  • Output: text
