License: apache-2.0
Features and capabilities
- Claim extraction (explicit + implicit), compound-sentence decomposition, brief paraphrased claims.
- Typing (fact/statistic/opinion/prediction/speculation/rhetoric/other), stance (asserted/denied/hedged/attributed/ironic), sentiment (positive/negative/neutral/mixed).
- Verbatim evidence anchoring; contradiction & statistical-consistency detection (contradiction/tension).
- Sarcasm/irony handling (restates real meaning, ironic stance + tension link).
- 3–6 investigative verification questions per claim; metadata (summary, date-if-present, keywords).
Repository layout
```text
README.md                                 this file
config.json / *.safetensors               merged fp16 model (HF format, at repo root)
generation_config.json
tokenizer.json / tokenizer_config.json / vocab.json / merges.txt / special_tokens_map.json
Qwen2.5-3B-Instruct.Q4_K_M.gguf           quantized model for llama.cpp (4 GB GPU / CPU)
claim.gbnf                                grammar that guarantees valid JSON
prompt.txt                                system prompt / task instruction
schema.json                               output schema + label mappings (enums)
requirements.txt                          dependencies
LICENSE
lora_adapter/                             LoRA adapter only
scripts/      inference.py  inference_hf.py  evaluate.py
benchmarks/   benchmarks.json  benchmark_comparison.md   base/teacher/finetuned scores
corpus/       labeled.jsonl  converted.jsonl  DATASET_MANIFEST.json  CORPUS.md
training/     train_config.json  RUN_SUMMARY.json
```
Installation
```bash
pip install -r requirements.txt
# GGUF path needs only: pip install llama-cpp-python (add a CUDA wheel index for GPU)
```
Quick start (grammar-constrained → always-valid JSON)
```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download('Luimas/claim-extractor-detective-qwen3b', local_dir='claimx')"
cd claimx
python scripts/inference.py --text "The mayor said crime fell; hours later the chief said it rose."
```
Usage examples
llama.cpp (Python):
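```python
import glob
import json

from llama_cpp import Llama, LlamaGrammar

# Load the first GGUF file in the current directory; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path=glob.glob("*.gguf")[0], n_ctx=4096, n_gpu_layers=-1, verbose=False)

# The task instruction and the JSON grammar both ship with the repository.
prompt = open("prompt.txt").read()
grammar = LlamaGrammar.from_string(open("claim.gbnf").read())

# Prepend the prompt to the input text and decode under the grammar.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt + "YOUR TEXT"}],
    grammar=grammar,
    temperature=0.0,
    max_tokens=768,
)
print(json.loads(out["choices"][0]["message"]["content"]))
```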
Transformers (merged fp16): python scripts/inference_hf.py --text "..." (loads this repo directly).
Input and output formats
- Input: one block of English text (news, social post, review, press release, sarcastic/adversarial prose); prepend prompt.txt. Truncated to ~4000 chars.
- Output: exactly one JSON object (no prose), schema below.
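As a rough illustration of the input preparation described above, the sketch below prepends the task prompt and truncates the text. It assumes the ~4000-character limit applies to the user text before prompt.txt is prepended, which the README does not specify. The names `build_input` and `MAX_CHARS` are hypothetical and are not taken from the repository's scripts.

```python
MAX_CHARS = 4000  # assumption: stands in for the "~4000 chars" limit stated above


def build_input(system_prompt: str, text: str, max_chars: int = MAX_CHARS) -> str:
    """Prepend the task prompt to one block of text, truncating the text only."""
    # Strip surrounding whitespace, then cut at max_chars.
    # The repository's own scripts may truncate differently.
    block = text.strip()[:max_chars]
    return system_prompt + block


if __name__ == "__main__":
    # Stand-in for open("prompt.txt").read(), so the sketch runs without the repo files.
    demo_prompt = "Extract claims as JSON.\n\n"
    message = build_input(demo_prompt, "x" * 5000)
    print(len(message) - len(demo_prompt))
```

The returned string would be passed as the user message content in place of `prompt + "YOUR TEXT"`.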
Output schema
```json
{
  "summary": "<1-3 sentence neutral summary>",
  "publication_date": "<ISO date if present, else null>",
  "keywords": ["<3-12 terms>"],
  "claims": [
    {
      "id": 0,
      "claim": "<brief paraphrase>",
      "claim_type": "fact|statistic|opinion|prediction|speculation|rhetoric|other",
      "category": "<topic>",
      "importance": "high|medium|low",
      "stance": "asserted|denied|hedged|attributed|ironic",
      "sentiment": "positive|negative|neutral|mixed",
      "evidence_span": "<verbatim substring>",
      "confidence": 0.0,
      "verification_questions": ["<3-6 investigative questions>"]
    }
  ],
  "contradictions": [
    {"claim_a": 0, "claim_b": 1, "relation": "contradiction|tension", "explanation": "<why>"}
  ]
}
```
Full enum/label mappings are in schema.json. Guarantees: always-valid JSON; keywords/claims
non-empty; ids 0..n-1; no duplicate claims; evidence_span verbatim; ≥3 verification questions/claim;
contradictions reference real ids.
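The guarantees listed above can also be checked on the caller's side. The following is a minimal sketch, not code from the repository: `check_guarantees` is a hypothetical helper, and treating "duplicate" as an exact match after lower-casing is an assumption, since the README does not define how duplicates are detected.

```python
def check_guarantees(result: dict, source_text: str) -> list:
    """Return the list of violated guarantees; an empty list means all checks pass."""
    problems = []
    claims = result.get("claims", [])

    if not result.get("keywords"):
        problems.append("keywords is empty")
    if not claims:
        problems.append("claims is empty")

    # ids must be exactly 0..n-1, in order.
    if [c.get("id") for c in claims] != list(range(len(claims))):
        problems.append("ids are not 0..n-1")

    # Assumption: two claims are duplicates if their text matches after lower-casing.
    texts = [c.get("claim", "").strip().lower() for c in claims]
    if len(set(texts)) != len(texts):
        problems.append("duplicate claims")

    for c in claims:
        span = c.get("evidence_span")
        # An empty span is rejected explicitly, because "" is a substring of any text.
        if not span or span not in source_text:
            problems.append(f"claim {c.get('id')}: evidence_span is not verbatim")
        if len(c.get("verification_questions", [])) < 3:
            problems.append(f"claim {c.get('id')}: fewer than 3 verification questions")

    valid_ids = set(range(len(claims)))
    for link in result.get("contradictions", []):
        for key in ("claim_a", "claim_b"):
            if link.get(key) not in valid_ids:
                problems.append(f"contradiction references unknown id in {key}")

    return problems
```

Calling `check_guarantees(parsed_output, original_text)` after `json.loads` gives a quick sanity check when the model is run without the grammar, for example through a different runtime.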
Fine-tuning details
Knowledge distillation + QLoRA (4-bit base, fp16 adapters) with Unsloth on Kaggle (2× T4). The
Qwen/Qwen2.5-14B-Instruct teacher labels passages into the schema; the unsloth/Qwen2.5-3B-Instruct-bnb-4bit student learns to reproduce it.
Best checkpoint kept by eval-loss; data balanced per source with hand-authored gold examples upweighted.
Full hyper-parameters in training/train_config.json; run details in training/RUN_SUMMARY.json.
Training dataset
Bundled under corpus/ (self-contained): labeled.jsonl (teacher-labeled + hand-authored gold
examples) + converted.jsonl (SNLI/MNLI/ANLI/FEVER/LIAR templated). See corpus/CORPUS.md and
corpus/DATASET_MANIFEST.json. Trained on ~1471 examples (val ~127).
Benchmarks and evaluation
Base vs teacher vs fine-tuned on a fixed diverse test set (benchmarks/benchmarks.json,
benchmark_comparison.md). Fine-tuned highlights:
| Metric | Base | Fine-tuned |
|---|---|---|
| JSON validity | 1.0 | 1.0 |
| Verification-questions / claim | — | 3 |
| Contradiction recall | — | 0.75 |
| Sarcasm handling | — | 1.0 |
| Evidence-verbatim rate | — | 1.0 |
| Avg claim length (words) | — | 7.806 |
Held-out validity: 1.0. Re-run locally: python scripts/evaluate.py.
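For readers who want to compute similar numbers on their own outputs, here is a hedged sketch of three of the per-claim metrics from the table. The definitions used (substring match for evidence, whitespace word count, mean questions per claim) are assumptions; the authoritative definitions live in scripts/evaluate.py, which is not reproduced here. `summarize_outputs` is a hypothetical name.

```python
def summarize_outputs(pairs):
    """Aggregate per-claim metrics over (source_text, parsed_output) pairs."""
    # Flatten to one (source_text, claim) entry per extracted claim.
    claims = [(src, c) for src, out in pairs for c in out.get("claims", [])]
    n = len(claims)
    if n == 0:
        return {"claims": 0}

    # Assumption: a span counts as verbatim if it is a non-empty substring of the source.
    verbatim = sum(
        1 for src, c in claims
        if c.get("evidence_span") and c["evidence_span"] in src
    )
    # Assumption: claim length is the whitespace-separated word count.
    words = sum(len(c.get("claim", "").split()) for _, c in claims)
    questions = sum(len(c.get("verification_questions", [])) for _, c in claims)

    return {
        "claims": n,
        "evidence_verbatim_rate": verbatim / n,
        "avg_claim_length_words": words / n,
        "verification_questions_per_claim": questions / n,
    }
```

Contradiction recall and sarcasm handling need gold labels and are therefore left out of this sketch.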
Deployment (RTX 3050 4 GB or CPU, offline)
The three files needed are Qwen2.5-3B-Instruct.Q4_K_M.gguf + claim.gbnf + prompt.txt.
```bash
pip install llama-cpp-python   # CUDA: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
python scripts/inference.py --text "Paste any English paragraph."
```
Grammar-constrained decoding guarantees valid JSON on every call.
Limitations
English only. No truth/veracity verdicts (surfaces what to verify, not whether it is true). It is a structured extractor, not a chat assistant. Evidence spans are verbatim from the input; if the input is wrong, the extracted claim reflects that. Distilled from a 14B teacher — quality is bounded by it.
Citation
```bibtex
@misc{claim_extractor_qwen3b,
  title  = {Claim Extractor: a local, grammar-constrained claim-extraction model (Qwen2.5-3B, QLoRA)},
  author = {Luimas},
  year   = {2026},
  note   = {Hugging Face: Luimas/claim-extractor-detective-qwen3b}
}
```
License
Apache-2.0 (see LICENSE). Inherits the license terms of the base model unsloth/Qwen2.5-3B-Instruct-bnb-4bit.