Introducing: Veritas-0.6B-Fact-Checker-Non-Thinking-1.0

Veritas-0.6B-Fact-Checker-Non-Thinking-1.0 is built on the Qwen3 architecture, starting from Qwen/Qwen3-0.6B. Resect Research Labs has specialized, fine-tuned, and optimized this model for fact-checking and factual consistency verification.

Model Performance

The model is evaluated on LLM-AggreFact, which it did not see during training. The benchmark aggregates 11 human-annotated datasets on fact-checking and grounding.

Overall Performance

Veritas-0.6B-Fact-Checker-Non-Thinking-1.0 achieves an average score of 72.30%, an improvement of 7.37 percentage points over Qwen3-0.6B in non-thinking mode.

Benchmark Details (LLM-AggreFact)

Balanced Accuracy Scores

Model	Size	Avg	CNN	XSum	MediaS	MeetB	WiCE	REVEAL	Claim Verify	Fact Check	Expert QA	LFQA	RAG Truth
Qwen3-0.6B (non-thinking)	0.6B	64.93	57.93	66.71	57.13	62.60	67.23	86.99	59.85	72.63	56.44	70.18	56.56
Veritas-0.6B-Fact-Checker-Non-Thinking-1.0	0.6B	72.30	65.84	66.75	68.45	73.47	73.43	83.32	71.89	73.45	58.66	82.57	77.49
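
As a quick sanity check, the averages and the headline gain can be recomputed from the two table rows. The sketch below copies the per-dataset scores from the table; the variable names are illustrative only.

```python
# Per-dataset balanced accuracy (%) copied from the table above.
datasets = ["CNN", "XSum", "MediaS", "MeetB", "WiCE", "REVEAL",
            "Claim Verify", "Fact Check", "Expert QA", "LFQA", "RAG Truth"]
qwen3 = [57.93, 66.71, 57.13, 62.60, 67.23, 86.99, 59.85, 72.63, 56.44, 70.18, 56.56]
veritas = [65.84, 66.75, 68.45, 73.47, 73.43, 83.32, 71.89, 73.45, 58.66, 82.57, 77.49]

# Unweighted mean over the 11 datasets, as in the "Avg" column.
qwen3_avg = sum(qwen3) / len(qwen3)
veritas_avg = sum(veritas) / len(veritas)
gain = veritas_avg - qwen3_avg

# Per-dataset change in percentage points (positive = Veritas is higher).
deltas = {name: round(v - q, 2) for name, q, v in zip(datasets, qwen3, veritas)}

print(f"Qwen3-0.6B average:   {qwen3_avg:.2f}")
print(f"Veritas-0.6B average: {veritas_avg:.2f}")
print(f"Gain: {gain:.2f} percentage points")
print(deltas)
```

The gain is a difference in percentage points, not a relative change. Per the table, REVEAL is the only dataset on which the base model scores higher.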

The benchmark scores reported here for Veritas-0.6B-Fact-Checker-Non-Thinking-1.0 were computed on the test set. A pull request has been submitted to the MiniCheck library to support additional operating modes, including this model. Note: Performance may vary slightly depending on hardware configuration and vLLM version.
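
Because scores can shift with the software stack, it may help to record the environment alongside any reported numbers. The following is a minimal sketch using only the Python standard library; `report_environment` is a hypothetical helper, and the package list (`vllm`, `torch`, `transformers`) is an assumption about what is relevant, not a requirement stated by this model card.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def report_environment(packages=("vllm", "torch", "transformers")):
    """Return the Python/platform info and installed versions of `packages`.

    Hypothetical helper: the default package list is an assumption.
    """
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            info[name] = "not installed"
    return info


for key, value in report_environment().items():
    print(f"{key}: {value}")
```

Saving this output next to the benchmark results makes it easier to explain small differences between runs.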

Model Usage

Scope of Use

The Veritas-0.6B-Fact-Checker-Non-Thinking model must be used only in the prescribed scoring mode, which generates a binary classification based on the specified template. Any deviation from this intended use may lead to unexpected outputs.

Using MiniCheck's library ¹

This requires the changes from our pull request to be merged; see footnote ¹.

Please run the following command to install the MiniCheck package and all necessary dependencies.

sh
pip install "minicheck[llm] @ git+https://github.com/Liyan06/MiniCheck.git@main"

Below is a simple use case

python
from minicheck.minicheck import MiniCheck
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-0.6B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)
# Optionally pass `chunk_size=your-specified-value` to `score`; the default chunk size is 32K.
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label) # [1, 0]
print(raw_prob)   # [0.9796443054985795, 0.008577403129593576]
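
The predicted labels in the example above are consistent with thresholding the raw probability at 0.5. The sketch below illustrates that mapping on the example values; the 0.5 cutoff, the reading of 1 as "supported", and the `to_label` helper are assumptions for illustration, not a description of MiniCheck's internals.

```python
def to_label(prob, threshold=0.5):
    """Map a support probability to a binary label (assumed: 1 = supported, 0 = unsupported)."""
    return int(prob > threshold)


# Raw probabilities taken from the example output above.
raw_prob = [0.9796443054985795, 0.008577403129593576]
print([to_label(p) for p in raw_prob])  # [1, 0]
```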

Test on LLM-AggreFact Benchmark ¹

python
import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# load 30K test data
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

chat_kwargs = {'enable_thinking': False}

scorer = MiniCheck(
    model_name='resect-ai/veritas-0.6B-fact-checker-non-thinking-1.0',
    enable_prefix_caching=False,
    extra_chat_template_kwargs=chat_kwargs,
    operating_mode="bespoke",
    max_tokens=1,
    cache_dir='./ckpts',
    bypass_model_check=True,
)
# Score every (document, claim) pair in the test set.
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)

To evaluate the result on the benchmark

python
from sklearn.metrics import balanced_accuracy_score

df['preds'] = pred_label
result_df = pd.DataFrame(columns=['Dataset', 'BAcc'])
for dataset in df.dataset.unique():
    sub_df = df[df.dataset == dataset]
    bacc = balanced_accuracy_score(sub_df.label, sub_df.preds) * 100
    result_df.loc[len(result_df)] = [dataset, bacc]

result_df.loc[len(result_df)] = ['Average', result_df.BAcc.mean()]
print(result_df.round(1))
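
For readers unfamiliar with the metric, balanced accuracy for binary labels is the mean of the recall on each class, which keeps a skewed label distribution from inflating the score. A minimal pure-Python sketch follows; the function name and toy data are illustrative, and the evaluation snippet above relies on scikit-learn's `balanced_accuracy_score` for the actual numbers.

```python
def balanced_accuracy(labels, preds):
    """Mean of per-class recall over the classes present in `labels`."""
    recalls = []
    for cls in sorted(set(labels)):
        # Indices of all examples whose true label is `cls`.
        idx = [i for i, y in enumerate(labels) if y == cls]
        correct = sum(1 for i in idx if preds[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy example: 4 supported claims (1) and 2 unsupported claims (0).
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]
print(balanced_accuracy(labels, preds) * 100)  # 62.5
```

In this toy case, recall is 0.75 on class 1 and 0.5 on class 0, so balanced accuracy is 62.5%, while plain accuracy would be 4 of 6 correct, about 66.7%.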

License

This model, Veritas-0.6B-Fact-Checker-Non-Thinking-1.0, is released under the Apache 2.0 license, available at https://choosealicense.com/licenses/apache-2.0. By downloading and using this model, you agree to the license terms.

Acknowledgements

Model perfected by Resect Research Labs.

Footnotes

¹ Pull request to the MiniCheck library submitted and awaiting review.
