License: apache-2.0
Why a generative detector
Most hallucination-detection systems are encoder token-classifiers (e.g. LettuceDetect). A generative LoRA is a different paradigm:
- Preserves the full answer structure with explicit per-span typing
- Extensible to chain-of-thought rationales after the closing tag
- Slightly stronger on the hardest type, `contradiction`, where world-model understanding matters more than per-token lexical features
Training
- 2 epochs on `combined/train` (1,955 records)
- Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
- `attn_implementation="eager"` (Qwen2 + SDPA + bf16 has known NaN issues)
- LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
- Single H200, 7 min training + 13 min inference on all 4 configs
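For a sense of scale, the schedule implied by these settings can be worked out directly. The sketch below assumes the last partial batch is kept and that warmup is rounded up to a whole step; the actual trainer may round differently, so treat the counts as approximate.

```python
import math

# Figures taken from the training settings listed above.
records = 1955
per_device_batch = 4
grad_accum = 2
epochs = 2
warmup_ratio = 0.06

effective_batch = per_device_batch * grad_accum            # examples per optimizer update
micro_batches = math.ceil(records / per_device_batch)      # forward/backward passes per epoch
updates_per_epoch = math.ceil(micro_batches / grad_accum)  # optimizer updates per epoch
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(effective_batch, updates_per_epoch, total_updates, warmup_updates)
```

Under these assumptions the run is only a few hundred optimizer updates long, which fits the short training time reported.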
| epoch | val loss |
|---|---|
| 1 | 0.0216 |
| 2 | 0.0179 |
Test-set results (sentence-level F1 — the leaderboard metric)
| Config | Lexical floor | LettuceDetect-large (zero-shot) | LookBackLens (in-domain) | ModernBERT-ft | This model | + Ensemble |
|---|---|---|---|---|---|---|
| combined | 0.302 | 0.361 | 0.489 | 0.798 | 0.771 | 0.871 |
| contradiction | 0.231 | 0.315 | 0.377 | 0.763 | 0.800 ⭐ | 0.877 |
| missing_tool | 0.218 | 0.330 | 0.406 | 0.966 | 0.927 | 0.993 |
| overgeneration | 0.319 | 0.335 | 0.508 | 0.697 | 0.672 | 0.824 |
⭐ Best single model on contradiction: beats the encoder fine-tune
by +3.7 pp (0.800 vs 0.763). This is consistent with the hypothesis that LLM world-modelling beats
per-token lexical features for value-swap hallucinations; on the other three configs the encoder fine-tune remains ahead.
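The comparison against the encoder fine-tune can be recomputed from the table. In the small sketch below, the dictionary simply restates the ModernBERT-ft, This model and + Ensemble columns; the variable and function names are illustrative.

```python
# Sentence-level F1 copied from the results table:
# (ModernBERT-ft, this model, ensemble) per config.
results = {
    "combined":       (0.798, 0.771, 0.871),
    "contradiction":  (0.763, 0.800, 0.877),
    "missing_tool":   (0.966, 0.927, 0.993),
    "overgeneration": (0.697, 0.672, 0.824),
}

def gaps_pp(scores):
    """Return (this model minus encoder, ensemble minus best single model) in percentage points."""
    encoder, lora, ensemble = scores
    vs_encoder = round((lora - encoder) * 100, 1)
    ensemble_gain = round((ensemble - max(encoder, lora)) * 100, 1)
    return vs_encoder, ensemble_gain

for config, scores in results.items():
    vs_encoder, ensemble_gain = gaps_pp(scores)
    print(f"{config:15s} LoRA vs encoder: {vs_encoder:+.1f} pp, ensemble gain: {ensemble_gain:+.1f} pp")
```

Only `contradiction` shows a positive gap for the LoRA, while the ensemble improves on the better single model in every config.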
Companion ModernBERT model
For per-token encoder-based detection see
ArsenyIvanov/toolace-halu-modernbert-large.
A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on
combined. See the project repo and
notebooks/improve_baselines.ipynb for full code, training curves and analytics.
Usage
```python
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)


def detect(query, tool_context, tool_names, answer):
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user},
    ]
    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")
    with torch.no_grad():
        gen = model.generate(
            **enc, max_new_tokens=512, do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to character offsets in the original answer.
    spans, cursor = [], 0
    HALU_RE = re.compile(
        r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL
    )
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)
        if idx == -1:
            idx = answer.find(inner)
        if idx == -1:
            continue
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}
```
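The span-recovery step at the end of `detect` can be exercised without loading the model. The sketch below lifts that logic into a standalone helper to show how character offsets into the original answer are obtained. `parse_spans` is a name introduced here, and the answer and marked completion are made-up inputs.

```python
import re

HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>',
    re.DOTALL,
)

def parse_spans(answer, completion):
    """Map each tagged span in the completion back to offsets in the original answer."""
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        label, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)   # prefer a match after the previous span
        if idx == -1:
            idx = answer.find(inner)       # fall back to the first match anywhere
        if idx == -1:
            continue                       # span text is not in the answer; skip it
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": label})
        cursor = idx + len(inner)
    return spans

# Hypothetical example, not taken from the dataset.
answer = "Your flight departs at 9:40. I can also book a hotel for you."
completion = (
    'Your flight departs at <halu type="contradiction">9:40</halu>. '
    '<halu type="missing_tool">I can also book a hotel for you.</halu>'
)
for span in parse_spans(answer, completion):
    print(span)
```

Searching from `cursor` first keeps repeated strings aligned with the order in which the model tagged them; spans whose text cannot be found in the answer are silently dropped.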
Hallucination types
| Label | What it captures |
|---|---|
| `contradiction` | grounded value replaced by a plausible-but-wrong alternative |
| `missing_tool` | offers an action that requires a tool not in the available list |
| `overgeneration` | inserted sentence with claims not supported by the tool output |
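Because the prompt asks the model to leave every other character untouched, one cheap sanity check is to strip the `<halu>` tags for the three labels above and compare the result with the original answer. This is a suggestion rather than part of the original pipeline, and `strip_halu_tags`, `is_faithful_rewrite` and the sample strings are hypothetical.

```python
import re

LABELS = ("contradiction", "missing_tool", "overgeneration")
OPEN_TAG = re.compile(r'<halu type="(?:%s)">' % "|".join(LABELS))
CLOSE_TAG = "</halu>"

def strip_halu_tags(marked):
    """Remove <halu ...> and </halu> markers, keeping the text they wrap."""
    return OPEN_TAG.sub("", marked).replace(CLOSE_TAG, "")

def is_faithful_rewrite(answer, marked):
    """True if the marked completion differs from the answer only by the tags."""
    return strip_halu_tags(marked) == answer

# Made-up strings for illustration.
answer = "The order ships on Monday."
good = 'The order ships on <halu type="contradiction">Monday</halu>.'
bad = 'The order will ship on <halu type="contradiction">Monday</halu>.'

print(is_faithful_rewrite(answer, good))
print(is_faithful_rewrite(answer, bad))
```

A failed check suggests the model paraphrased the answer, in which case span offsets recovered by string search may be unreliable.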
Limitations
- Synthetic corruptions only — no naturally occurring cascading errors
- ~100× slower than ModernBERT-ft at inference by per-record latency (~5 sec/record vs ~50 ms)
- LoRA adapter only: needs the `Qwen/Qwen2.5-7B-Instruct` base model at runtime
- Single corruption type per record (RAGTruth strict schema)
License
Apache 2.0 — matches the base model and the training dataset license.
Model provider: ArsenyIvanov