ArsenyIvanov

toolace-halu-qwen-lora

License: apache-2.0

Why a generative detector

Most hallucination-detection systems are encoder token classifiers (e.g. LettuceDetect). A generative LoRA adapter is a different paradigm with several advantages:

Preserves the full answer structure with explicit per-span typing
Extensible to chain-of-thought rationales after the closing tag
Slightly stronger on the hardest type, contradiction, where world-model understanding matters more than per-token lexical features

Training

2 epochs on combined/train (1,955 records)
Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
attn_implementation="eager" (Qwen2 + SDPA + bf16 has known NaN issues)
LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
Single H200, 7 min training + 13 min inference on all 4 configs

epoch	val loss
1	0.0216
2	0.0179
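
The batch, epoch, and warmup settings above imply a rough optimizer-step budget. The sketch below works through that arithmetic. The rounding rules (keeping the final partial batch, rounding warmup up to a whole step) are assumptions rather than details confirmed by the training code, and `step_budget` is a hypothetical helper.

```python
import math

def step_budget(num_records, micro_batch, grad_accum, epochs, warmup_ratio):
    """Estimate optimizer steps and warmup steps for a training run."""
    # Assumption: the last partial micro-batch and accumulation group are kept.
    micro_batches = math.ceil(num_records / micro_batch)
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    total_steps = steps_per_epoch * epochs
    # Assumption: the warmup length is rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": micro_batch * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# Values taken from the Training list: 1,955 records, batch 4, grad_accum 2,
# 2 epochs, warmup 6%.
budget = step_budget(num_records=1955, micro_batch=4, grad_accum=2,
                     epochs=2, warmup_ratio=0.06)
print(budget)
```

Under these assumptions the run comes to about 490 optimizer steps, of which roughly 30 are warmup.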

Test-set results (sentence-level F1 — the leaderboard metric)

Config	Lexical floor	LettuceDetect-large (zero-shot)	LookBackLens (in-domain)	ModernBERT-ft	This model	+ Ensemble
combined	0.302	0.361	0.489	0.798	0.771	0.871
contradiction	0.231	0.315	0.377	0.763

⭐ Best single model on contradiction: it beats the encoder fine-tune by +4 pp. This supports the hypothesis that LLM world-modelling beats per-token lexical features for value-swap hallucinations.
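
This card does not spell out how sentence-level F1 is computed. One plausible reading, assumed here, is that a sentence counts as positive when it overlaps at least one hallucinated span, and F1 is then taken over those per-sentence labels. The sketch below illustrates that reading with a deliberately naive regex sentence splitter; `sentence_labels` and `f1_score` are hypothetical helpers, not the project's evaluation code.

```python
import re

def sentence_labels(text, spans):
    """Return one 0/1 label per sentence: 1 if it overlaps any span."""
    labels = []
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'.
    for m in re.finditer(r"[^.!?]+[.!?]*\s*", text):
        start, end = m.start(), m.end()
        hit = any(sp["start"] < end and sp["end"] > start for sp in spans)
        labels.append(int(hit))
    return labels

def f1_score(gold, pred):
    """Binary F1 over aligned label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up example: one gold span, one correct and one spurious prediction.
text = "Price is 300 USD. It departs at 9am. Enjoy!"

def span_of(sub):
    start = text.find(sub)
    return {"start": start, "end": start + len(sub)}

gold = sentence_labels(text, [span_of("300 USD")])
pred = sentence_labels(text, [span_of("300 USD"), span_of("9am")])
score = f1_score(gold, pred)
print(gold, pred, round(score, 3))
```

Scoring at sentence level means a prediction with slightly wrong span boundaries still counts as correct as long as it falls in the right sentence.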

Companion ModernBERT model

For per-token encoder-based detection see ArsenyIvanov/toolace-halu-modernbert-large. A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on combined. See the project repo and notebooks/improve_baselines.ipynb for full code, training curves and analytics.

Usage

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

# The tokenizer is loaded from the adapter repo; left padding is the usual
# choice for decoder-only generation.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"

# Eager attention mirrors the training setup (see Training).
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)

# Matches one tagged span, capturing its type and inner text.
HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL
)

def detect(query, tool_context, tool_names, answer):
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user",   "content": user}]
    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")
    with torch.no_grad():
        # Greedy decoding keeps the rewrite deterministic.
        gen = model.generate(**enc, max_new_tokens=512, do_sample=False,
                             pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to character offsets in the original answer.
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)   # search forward from the last match first
        if idx == -1:
            idx = answer.find(inner)       # fall back to searching the whole answer
        if idx == -1:
            continue                       # span text not present in the answer; skip it
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}
```
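
The system prompt asks the model to leave every untagged character unchanged, but a generative model can still paraphrase. A minimal sanity check, sketched below, strips the tags from the output and compares the result with the original answer. The helper names are hypothetical and this check is not part of the original usage code; ignoring leading and trailing whitespace is also an assumption.

```python
import re

# Matches an opening tag with any lowercase type attribute, or a closing tag.
TAG_RE = re.compile(r'<halu type="[a-z_]+">|</halu>')

def strip_halu_tags(marked):
    """Remove <halu ...> and </halu> tags, keeping the text between them."""
    return TAG_RE.sub("", marked)

def is_faithful_rewrite(answer, marked):
    """True if the marked output equals the answer once tags are removed."""
    # Assumption: whitespace at either end is not significant.
    return strip_halu_tags(marked).strip() == answer.strip()

# Made-up example strings for illustration.
answer = "The fare is 450 USD."
marked = 'The fare is <halu type="contradiction">450 USD</halu>.'
ok = is_faithful_rewrite(answer, marked)

# The model changed "fare" to "ticket", so the check should fail.
altered = is_faithful_rewrite(
    answer, 'The ticket is <halu type="contradiction">450 USD</halu>.'
)
print(ok, altered)
```

When the check fails, offsets derived from the marked text may not line up with the original answer, which is why `detect` locates each span with `answer.find` and skips spans it cannot find.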

Hallucination types

Label	What it captures
`contradiction`	grounded value replaced by a plausible-but-wrong alternative
`missing_tool`	offers an action that requires a tool not in the available list
`overgeneration`	inserted sentence with claims not supported by the tool output
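
Because the three labels form a closed set, downstream code can validate and tally spans by type. The sketch below assumes spans shaped like the `spans` list returned by `detect` in the Usage section; `count_by_label` is a hypothetical helper and the example spans are made up for illustration.

```python
from collections import Counter

LABELS = ("contradiction", "missing_tool", "overgeneration")

def count_by_label(spans):
    """Count spans per hallucination type, rejecting unknown labels."""
    unknown = {s["label"] for s in spans} - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(s["label"] for s in spans)
    # Report every label, including those with zero spans.
    return {label: counts.get(label, 0) for label in LABELS}

# Hypothetical spans in the same shape as detect(...)["spans"].
example_spans = [
    {"start": 12, "end": 19, "text": "450 USD", "label": "contradiction"},
    {"start": 40, "end": 71, "text": "I can also book a hotel for you",
     "label": "missing_tool"},
]
counts = count_by_label(example_spans)
print(counts)
```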

Limitations

Synthetic corruptions only — no naturally occurring cascading errors
~100× slower than ModernBERT-ft at inference (~5 sec/record vs ~50 ms)
LoRA adapter only — needs Qwen/Qwen2.5-7B-Instruct base model at runtime
Single corruption type per record (RAGTruth strict schema)

License

Apache 2.0 — matches the base model and the training dataset license.
