README

License: apache-2.0

Features and capabilities

  • Claim extraction (explicit + implicit), compound-sentence decomposition, brief paraphrased claims.
  • Typing (fact/statistic/opinion/prediction/speculation/rhetoric/other), stance (asserted/denied/hedged/attributed/ironic), sentiment (positive/negative/neutral/mixed).
  • Verbatim evidence anchoring; contradiction & statistical-consistency detection (contradiction/tension).
  • Sarcasm/irony handling (restates real meaning, ironic stance + tension link).
  • 3–6 investigative verification questions per claim; metadata (summary, date-if-present, keywords).

Repository layout

```text
README.md                          this file
config.json / *.safetensors        merged fp16 model (HF format, at repo root)
generation_config.json
tokenizer.json / tokenizer_config.json / vocab.json / merges.txt / special_tokens_map.json
Qwen2.5-3B-Instruct.Q4_K_M.gguf    quantized model for llama.cpp (4 GB GPU / CPU)
claim.gbnf                         grammar that guarantees valid JSON
prompt.txt                         system prompt / task instruction
schema.json                        output schema + label mappings (enums)
requirements.txt                   dependencies
LICENSE
lora_adapter/                      LoRA adapter only
scripts/                           inference.py  inference_hf.py  evaluate.py
benchmarks/                        benchmarks.json  benchmark_comparison.md  (base/teacher/finetuned scores)
corpus/                            labeled.jsonl  converted.jsonl  DATASET_MANIFEST.json  CORPUS.md
training/                          train_config.json  RUN_SUMMARY.json
```

Installation

```bash
pip install -r requirements.txt
# GGUF path needs only: pip install llama-cpp-python (add a CUDA wheel index for GPU)
```

Quick start (grammar-constrained → always-valid JSON)

```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download('Luimas/claim-extractor-detective-qwen3b', local_dir='claimx')"
cd claimx
python scripts/inference.py --text "The mayor said crime fell; hours later the chief said it rose."
```

Usage examples

llama.cpp (Python):

```python
import json, glob
from llama_cpp import Llama, LlamaGrammar

# Load the bundled GGUF file; n_gpu_layers=-1 offloads every layer to the GPU when one is available.
llm = Llama(model_path=glob.glob("*.gguf")[0], n_ctx=4096, n_gpu_layers=-1, verbose=False)
with open("prompt.txt", encoding="utf-8") as f:
    prompt = f.read()
with open("claim.gbnf", encoding="utf-8") as f:
    grammar = LlamaGrammar.from_string(f.read())
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt + "YOUR TEXT"}],
    grammar=grammar, temperature=0.0, max_tokens=768,
)
print(json.loads(out["choices"][0]["message"]["content"]))
```

Transformers (merged fp16): python scripts/inference_hf.py --text "..." (loads this repo directly).
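Once the output is parsed, each entry in contradictions refers to claims by id. A minimal sketch of resolving those links into readable lines, assuming the output follows the schema documented in this README. The helper name describe_contradictions and the sample dict are illustrative only and are not part of the repo:

```python
def describe_contradictions(result: dict) -> list[str]:
    """Render each contradiction link as one readable line using the claim texts."""
    # Index claims by id so each link can be resolved to its paraphrased text.
    claims_by_id = {c["id"]: c["claim"] for c in result.get("claims", [])}
    lines = []
    for link in result.get("contradictions", []):
        a = claims_by_id[link["claim_a"]]
        b = claims_by_id[link["claim_b"]]
        lines.append(f'[{link["relation"]}] "{a}" vs "{b}": {link["explanation"]}')
    return lines


# Hand-written sample in the documented shape (not real model output).
sample = {
    "claims": [
        {"id": 0, "claim": "Crime fell"},
        {"id": 1, "claim": "Crime rose"},
    ],
    "contradictions": [
        {"claim_a": 0, "claim_b": 1, "relation": "contradiction",
         "explanation": "The two officials report opposite trends."}
    ],
}
for line in describe_contradictions(sample):
    print(line)
```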

Input and output formats

  • Input: one block of English text (news, social post, review, press release, sarcastic/adversarial prose), with the contents of prompt.txt prepended. The input is truncated to ~4000 characters.
  • Output: exactly one JSON object (no prose), schema below.
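The input preparation described above can be sketched as follows. This is an illustration under assumptions: the README does not say whether truncation happens before or after the prompt is prepended, so this sketch truncates only the user text, and build_user_message and MAX_CHARS are hypothetical names rather than the logic in scripts/inference.py:

```python
MAX_CHARS = 4000  # approximate limit quoted in the README; the exact cutoff is an assumption


def build_user_message(system_prompt: str, text: str, max_chars: int = MAX_CHARS) -> str:
    """Prepend the task prompt to one block of input text, truncating the text first."""
    body = text.strip()[:max_chars]
    return system_prompt + body


# In practice system_prompt would be the contents of prompt.txt.
msg = build_user_message("PROMPT\n", "a" * 5000)
print(len(msg))
```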

Output schema

```json
{
  "summary": "<1-3 sentence neutral summary>",
  "publication_date": "<ISO date if present, else null>",
  "keywords": ["<3-12 terms>"],
  "claims": [{
    "id": 0, "claim": "<brief paraphrase>",
    "claim_type": "fact|statistic|opinion|prediction|speculation|rhetoric|other",
    "category": "<topic>", "importance": "high|medium|low",
    "stance": "asserted|denied|hedged|attributed|ironic",
    "sentiment": "positive|negative|neutral|mixed",
    "evidence_span": "<verbatim substring>", "confidence": 0.0,
    "verification_questions": ["<3-6 investigative questions>"]
  }],
  "contradictions": [{"claim_a": 0, "claim_b": 1, "relation": "contradiction|tension", "explanation": "<why>"}]
}
```

Full enum/label mappings are in schema.json. Guarantees: always-valid JSON; keywords/claims non-empty; ids 0..n-1; no duplicate claims; evidence_span verbatim; ≥3 verification questions/claim; contradictions reference real ids.
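For callers who want to re-check the listed guarantees on their side, a minimal sketch of a validator over a parsed output. It is not the repo's own checker: check_guarantees is a hypothetical helper, and treating duplicates as case-insensitive matches on the claim text is an assumption:

```python
def check_guarantees(result: dict, source_text: str) -> list[str]:
    """Return the violated guarantees; an empty list means every check passed."""
    problems = []
    claims = result.get("claims", [])
    if not result.get("keywords"):
        problems.append("keywords is empty")
    if not claims:
        problems.append("claims is empty")
    # ids must be exactly 0..n-1, in order.
    if [c.get("id") for c in claims] != list(range(len(claims))):
        problems.append("ids are not 0..n-1")
    # Assumption: two claims count as duplicates if their text matches ignoring case.
    texts = [c.get("claim", "").strip().lower() for c in claims]
    if len(set(texts)) != len(texts):
        problems.append("duplicate claims")
    for c in claims:
        span = c.get("evidence_span")
        if not span or span not in source_text:
            problems.append(f"claim {c.get('id')}: evidence_span not verbatim")
        if len(c.get("verification_questions", [])) < 3:
            problems.append(f"claim {c.get('id')}: fewer than 3 verification questions")
    valid_ids = {c.get("id") for c in claims}
    for link in result.get("contradictions", []):
        if link.get("claim_a") not in valid_ids or link.get("claim_b") not in valid_ids:
            problems.append("contradiction references an unknown claim id")
    return problems


# Hand-written sample (not real model output).
text = "The mayor said crime fell; hours later the chief said it rose."
sample = {
    "keywords": ["crime", "mayor", "chief"],
    "claims": [
        {"id": 0, "claim": "Crime fell", "evidence_span": "crime fell",
         "verification_questions": ["q1", "q2", "q3"]},
        {"id": 1, "claim": "Crime rose", "evidence_span": "it rose",
         "verification_questions": ["q1", "q2", "q3"]},
    ],
    "contradictions": [{"claim_a": 0, "claim_b": 1, "relation": "contradiction",
                        "explanation": "opposite trends"}],
}
print(check_guarantees(sample, text))
```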

Fine-tuning details

Knowledge distillation + QLoRA (4-bit base, fp16 adapters) with Unsloth on Kaggle (2× T4). The Qwen/Qwen2.5-14B-Instruct teacher labels passages into the schema; the unsloth/Qwen2.5-3B-Instruct-bnb-4bit student learns to reproduce it. Best checkpoint kept by eval-loss; data balanced per source with hand-authored gold examples upweighted. Full hyper-parameters in training/train_config.json; run details in training/RUN_SUMMARY.json.
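One way to picture "balanced per source with gold examples upweighted" is as per-example sampling weights. The sketch below is only an illustration: the field names source and is_gold and the gold_factor value are hypothetical, and the actual settings live in training/train_config.json:

```python
from collections import Counter


def sampling_weights(examples: list[dict], gold_factor: float = 3.0) -> list[float]:
    """Per-example weights: equal total mass per source, then gold examples scaled up."""
    per_source = Counter(ex["source"] for ex in examples)
    weights = []
    for ex in examples:
        w = 1.0 / per_source[ex["source"]]  # each source sums to 1.0 before upweighting
        if ex.get("is_gold"):
            w *= gold_factor  # hypothetical multiplier for hand-authored examples
        weights.append(w)
    return weights


examples = [
    {"source": "news", "is_gold": False},
    {"source": "news", "is_gold": False},
    {"source": "gold", "is_gold": True},
]
print(sampling_weights(examples))
```

Weights like these could be passed to a weighted sampler; the README does not state which mechanism the training run used.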

Training dataset

Bundled under corpus/ (self-contained): labeled.jsonl (teacher-labeled + hand-authored gold examples) + converted.jsonl (SNLI/MNLI/ANLI/FEVER/LIAR templated). See corpus/CORPUS.md and corpus/DATASET_MANIFEST.json. Trained on ~1471 examples (val ~127).

Benchmarks and evaluation

Base vs teacher vs fine-tuned on a fixed diverse test set (benchmarks/benchmarks.json, benchmark_comparison.md). Fine-tuned highlights:

| Metric | Base | Fine-tuned |
|---|---|---|
| JSON validity | 1.0 | 1.0 |
| Verification-questions / claim | | 3 |
| Contradiction recall | | 0.75 |
| Sarcasm handling | | 1.0 |
| Evidence-verbatim rate | | 1.0 |
| Avg claim length (words) | | 7.806 |
Held-out validity: 1.0. Re-run locally: python scripts/evaluate.py.
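To make two of the table's metrics concrete, here is a rough sketch of how an evidence-verbatim rate and an average claim length could be computed from (input text, parsed output) pairs. These definitions are assumptions for illustration and are not taken from scripts/evaluate.py:

```python
def evidence_verbatim_rate(results: list[tuple[str, dict]]) -> float:
    """Fraction of claims whose evidence_span is an exact substring of its input text."""
    total = hits = 0
    for source_text, result in results:
        for claim in result.get("claims", []):
            total += 1
            span = claim.get("evidence_span", "")
            if span and span in source_text:
                hits += 1
    return hits / total if total else 0.0


def avg_claim_length(results: list[tuple[str, dict]]) -> float:
    """Mean number of whitespace-separated words per claim paraphrase."""
    lengths = [len(c["claim"].split()) for _, r in results for c in r.get("claims", [])]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hand-written sample (not real model output); the second span is deliberately not verbatim.
text = "The mayor said crime fell; hours later the chief said it rose."
results = [(text, {"claims": [
    {"claim": "Crime fell", "evidence_span": "crime fell"},
    {"claim": "Crime rose according to the chief", "evidence_span": "crime rose"},
]})]
print(evidence_verbatim_rate(results), avg_claim_length(results))
```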

Deployment (RTX 3050 4 GB or CPU, offline)

The three files needed are Qwen2.5-3B-Instruct.Q4_K_M.gguf + claim.gbnf + prompt.txt.

```bash
pip install llama-cpp-python # CUDA: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
python scripts/inference.py --text "Paste any English paragraph."
```

Grammar-constrained decoding guarantees valid JSON on every call.
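Before running offline, it can help to confirm that the three deployment files are present. A small sketch, assuming the files sit together in one directory; missing_files is a hypothetical helper and not part of the repo:

```python
from pathlib import Path

# The three files named in the deployment section of this README.
REQUIRED = ("Qwen2.5-3B-Instruct.Q4_K_M.gguf", "claim.gbnf", "prompt.txt")


def missing_files(model_dir: str) -> list[str]:
    """Return the names of required deployment files not found in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]


# An empty list means the directory is ready for scripts/inference.py.
print(missing_files("."))
```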

Limitations

English only. No truth/veracity verdicts (surfaces what to verify, not whether it is true). It is a structured extractor, not a chat assistant. Evidence spans are verbatim from the input; if the input is wrong, the extracted claim reflects that. Distilled from a 14B teacher, so quality is bounded by the teacher.

Citation

```bibtex
@misc{claim_extractor_qwen3b,
  title  = {Claim Extractor: a local, grammar-constrained claim-extraction model (Qwen2.5-3B, QLoRA)},
  author = {Luimas},
  year   = {2026},
  note   = {Hugging Face: Luimas/claim-extractor-detective-qwen3b}
}
```

License

Apache-2.0 (see LICENSE). Inherits the license terms of the base model unsloth/Qwen2.5-3B-Instruct-bnb-4bit.

Model provider

Luimas

Model tree

Base

unsloth/Qwen2.5-3B-Instruct-bnb-4bit

Quantized

this model

Modalities

Input

Text

Output

Text
