tjarvis91/Q-Triage-1B-LoRA

Why this exists

Big general models are good at everything and great at nothing. They burn hundreds of watts to do work that fits in a 36 MB adapter.

This is one specialist from the Qovaryx compact-intelligence release. It does one job — support ticket triage — and it does it at 100.0% mean accuracy on a 60-row held-out evaluation, with a 95% bootstrap-CI lower bound of 100.0% against a strict gate of 90.0%.

That's the bar.

What it's good for

Help-desk ticket triage (incident sev1-3, billing, IT)
Customer support queue routing
On-device CRM intake classifier
Email-to-ticket categorization
Privacy-preserving ticket triage (no cloud)

Headline result

| Metric | Value |
|---|---|
| Task | support ticket triage |
| Mean accuracy (n=60 holdout) | 100.0% |
| Bootstrap-CI lower bound (95% conf.) | 100.0% |
| Strict gate | 90.0% |
| Status | PASS at strict CI |
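Because 100.0% on n=60 means all 60 holdout rows were scored correct, every bootstrap resample is also all-correct, so a percentile bootstrap interval collapses to [100%, 100%] whatever the sample size. An exact (Clopper-Pearson) binomial interval does not collapse: for 60/60 its two-sided 95% lower bound is 0.025^(1/60), about 94.0%, which still clears the 90.0% gate. The sketch below shows both. It assumes a two-sided 95% interval and 2,000 resamples (the release states neither), and `bootstrap_lower_bound` and `exact_lower_bound_all_correct` are hypothetical helpers, not part of the release.

```python
import random

def bootstrap_lower_bound(outcomes, n_resamples=2000, conf=0.95, seed=0):
    """Percentile-bootstrap lower bound on mean accuracy (two-sided interval assumed)."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample n outcomes with replacement, record each resample's accuracy, sort.
    means = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    alpha = 1.0 - conf
    # Approximate (alpha/2) percentile, e.g. the 2.5th percentile for conf=0.95.
    return means[int((alpha / 2) * n_resamples)]

def exact_lower_bound_all_correct(n, conf=0.95):
    """Clopper-Pearson two-sided lower bound when all n trials succeed: (alpha/2)**(1/n)."""
    alpha = 1.0 - conf
    return (alpha / 2) ** (1.0 / n)

outcomes = [1] * 60  # 60/60 correct, as implied by 100.0% mean accuracy on n=60
print(bootstrap_lower_bound(outcomes))                   # prints 1.0
print(round(exact_lower_bound_all_correct(60), 3))       # prints 0.94
```

The exact bound is the more informative number to quote alongside the gate, since it reflects how much a 60-row sample can actually certify.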

Quickstart

```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_ID = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
ADAPTER_ID = "tjarvis91/Q-Triage-1B-LoRA"

# Load the base model in bf16; device_map="auto" needs the `accelerate` package.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(BASE_ID)

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

chat = [{"role": "user", "content": "Triage. Return JSON {category, priority}.\nSubject: Cannot login after deploy\nDesc: 502 errors since 14:00"}]
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; fall back to EOS if the tokenizer defines no pad token.
pad_id = tok.pad_token_id if tok.pad_token_id is not None else tok.eos_token_id
out = model.generate(**inputs, max_new_tokens=120, do_sample=False, pad_token_id=pad_id)

# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expected output:

```json
{"category": "incident/sev2", "priority": "high"}
```
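Downstream routing code usually needs the reply as a structured record rather than free text. Below is a minimal stdlib sketch of a tolerant parser. `parse_triage` is a hypothetical helper; it assumes the model emits one flat JSON object like the expected output above, and it checks only that `category` and `priority` are present, because the full label set is not published.

```python
import json
import re

REQUIRED_KEYS = ("category", "priority")

def parse_triage(text: str) -> dict:
    """Extract the first flat {...} object from model output and check the expected keys.

    Raises ValueError (json.JSONDecodeError is a subclass) if no valid object is found
    or a required key is missing.
    """
    # Non-greedy match: assumes no nested braces inside the object.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    record = json.loads(match.group(0))
    missing = [k for k in REQUIRED_KEYS if k not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Keep only the two expected fields, normalized to strings.
    return {k: str(record[k]) for k in REQUIRED_KEYS}

raw = 'Sure: {"category": "incident/sev2", "priority": "high"}'
print(parse_triage(raw))  # {'category': 'incident/sev2', 'priority': 'high'}
```

Raising instead of guessing keeps malformed replies visible, so they can be sent to a human queue rather than silently misrouted.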

Compact intelligence is not small intelligence

This model has about 18.1 million trainable parameters (LoRA rank 16 on a 1.7B base). It runs in bf16 on CPU in a few hundred milliseconds per call. It reaches 100.0% accuracy on the holdout, a level that most large general models miss because they optimize for breadth, not depth.

Intelligence per watt > parameter count.

Intelligence per watt

| Property | Value |
|---|---|
| Base model | SmolLM2-1.7B-Instruct |
| Adapter size | ~36 MB |
| Trainable params | 18,087,936 |
| Inference | bf16 on CPU; 4-bit QLoRA-friendly |
| VRAM target | 4 GB (Q4) / 8 GB (bf16) |
| Runs offline | yes |
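The trainable-parameter count and adapter size can be reproduced by hand. This sketch assumes the public SmolLM2-1.7B shape (hidden size 2048, MLP size 8192, 24 layers, 32 attention heads with 32 key/value heads) and LoRA on all seven linear projections per layer. The release does not list its target modules, so read this as one configuration consistent with the published 18,087,936, not as the in-house recipe. The weight-memory line uses the nominal 1.7B size and ignores activations, KV cache, and quantization overhead.

```python
# Assumed SmolLM2-1.7B shape; q/k/v/o are all HIDDEN x HIDDEN (no grouped-query attention).
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 24
RANK = 16
BASE_PARAMS = 1.7e9  # nominal size from the model name

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer."""
    return r * (d_in + d_out)

# Assumed target modules: all seven linear projections in each decoder layer.
per_layer = (
    4 * lora_params(HIDDEN, HIDDEN, RANK)            # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(HIDDEN, INTERMEDIATE, RANK)    # gate_proj, up_proj
    + lora_params(INTERMEDIATE, HIDDEN, RANK)        # down_proj
)
trainable = per_layer * LAYERS

print(f"{trainable:,}")                          # 18,087,936
print(f"{trainable * 2 / 1e6:.1f} MB in bf16")   # 36.2 MB in bf16 (2 bytes per param)
# Base weights only: 2 bytes/param in bf16, ~0.5 bytes/param at 4 bits.
print(f"base weights: {BASE_PARAMS * 2 / 1e9:.1f} GB bf16, {BASE_PARAMS * 0.5 / 1e9:.2f} GB 4-bit")
```

The base-weight estimates (3.4 GB bf16, 0.85 GB 4-bit) sit under the 8 GB and 4 GB targets, leaving headroom for the adapter, activations, and cache.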

Local AI, no cloud

This adapter ships as part of a local-first AI thesis. No telemetry. No data leaves the machine. The base model is open. The adapter is signed and watermarked. The runtime is yours.

The story

Qovaryx is a research line on local-first AI for the constraint-aware operator. The original Qovaryx Options Decoder passed all 15 of 15 internal benchmark cells at the strict bootstrap-CI lower bound, then shipped as a public CPU runtime at Qovaryx/qovaryx-options-decoder-full-community.

This adapter applies the same compact-intelligence discipline to office work: single-task LoRA, strict-CI-gated, on-device. The training recipe stays in-house — the same posture we used for the Options Decoder. What's published is the artifact and the headline metric.

Limitations

One job, one specialist. Out-of-domain prompts will get out-of-domain answers.
This is a LoRA adapter, not a standalone model — you need HuggingFaceTB/SmolLM2-1.7B-Instruct as the base.
Holdout is n=60: enough to pass the strict CI gate, but not a production certification. Validate on your own data.
Not financial, medical, legal, or employment advice. Human review for high-stakes use.
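Since the 60-row holdout is not public, the practical check is to score the adapter on your own labeled tickets. A minimal sketch, assuming you have (ticket text, gold label) pairs and a `predict` function that wraps the Quickstart generation and returns a dict. `evaluate` and its return fields are hypothetical; the metric behind the published 100.0% is not documented, so per-field exact match is an assumption here.

```python
from typing import Callable, Dict, Iterable, Tuple

def evaluate(
    examples: Iterable[Tuple[str, Dict[str, str]]],
    predict: Callable[[str], Dict[str, str]],
) -> Dict[str, float]:
    """Exact-match accuracy for category, priority, and both jointly."""
    n = cat_ok = pri_ok = both_ok = 0
    for text, gold in examples:
        pred = predict(text)
        # A missing field in the prediction counts as wrong.
        cat_hit = pred.get("category") == gold["category"]
        pri_hit = pred.get("priority") == gold["priority"]
        n += 1
        cat_ok += cat_hit
        pri_ok += pri_hit
        both_ok += cat_hit and pri_hit
    if n == 0:
        raise ValueError("no examples to evaluate")
    return {"category": cat_ok / n, "priority": pri_ok / n, "joint": both_ok / n}
```

Reporting the joint score next to the per-field scores matters for routing: a ticket with the right category but the wrong priority can still land in the wrong queue.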

Watermark

Each released adapter carries a unique fingerprint in adapter_config.json (_qovaryx_watermark.fingerprint) for attribution and tamper-detection. This adapter's fingerprint: 89809280bb22957c85fde0455167ef144d2c3bc04a553947698cc91cbab613cf.
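A few lines of stdlib Python can compare a downloaded config against the published fingerprint. This sketch assumes the dotted name means a nested `fingerprint` key inside a `_qovaryx_watermark` object; `read_fingerprint` and `fingerprint_matches` are hypothetical helpers. A match only shows that this config field is intact; on its own it does not verify the weight files.

```python
import json
from pathlib import Path
from typing import Optional

PUBLISHED_FINGERPRINT = "89809280bb22957c85fde0455167ef144d2c3bc04a553947698cc91cbab613cf"

def read_fingerprint(config_path: str) -> Optional[str]:
    """Return _qovaryx_watermark.fingerprint from adapter_config.json, or None if absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    watermark = config.get("_qovaryx_watermark")
    if not isinstance(watermark, dict):
        return None
    return watermark.get("fingerprint")

def fingerprint_matches(config_path: str, expected: str = PUBLISHED_FINGERPRINT) -> bool:
    """True if the config's fingerprint field equals the expected value."""
    return read_fingerprint(config_path) == expected
```

To check the hosted copy, fetch the config first, for example with `huggingface_hub.hf_hub_download(repo_id="tjarvis91/Q-Triage-1B-LoRA", filename="adapter_config.json")`, which returns a local file path to pass to `fingerprint_matches`.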

Community + support

Discord: https://discord.gg/PtuHZDv5ju — builders, install help, model questions
Ko-fi: https://ko-fi.com/tjarvis91 — every coffee literally buys GPU time for the next training cycle
Research devlog: https://github.com/thron-j/qovaryx-ai-research
Companion runtime (options decoder): https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community

Citation

If you use this in research or product work, cite:

```bibtex
@misc{qovaryx_q_triage_2026,
  author = {Jarvis, Thomas},
  title  = {Q-Triage-1B-LoRA: Qovaryx Compact Intelligence specialist for support ticket triage},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/tjarvis91/Q-Triage-1B-LoRA},
}
```

License

Apache-2.0 for the adapter weights. The base model, HuggingFaceTB/SmolLM2-1.7B-Instruct, is released by HuggingFaceTB under Apache-2.0.

The training corpus, the recipe, and the cluster-shell routing logic are not part of this release.
