License: apache-2.0

Why this exists

Big general models are good at many things and great at few. They burn hundreds of watts on work that fits in a ~36 MB adapter on top of a 1.7B base model.

This is one specialist from the Qovaryx compact-intelligence release. It does one job, support ticket triage, and reaches 100.0% mean accuracy on a 60-row held-out evaluation. The 95% bootstrap-CI lower bound is also 100.0%, against a strict gate of 90.0%.

That's the bar.

What it's good for

  • Help-desk ticket triage (incident sev1-3, billing, IT)
  • Customer support queue routing
  • On-device CRM intake classifier
  • Email-to-ticket categorization
  • Privacy-preserving ticket triage (no cloud)

Headline result

| Metric | Value |
| --- | --- |
| Task | support ticket triage |
| Mean accuracy (n=60 holdout) | 100.0% |
| Bootstrap-CI lower bound (95% conf) | 100.0% |
| Strict gate | 90.0% |
| Status | PASS at strict CI |
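The evaluation code is not part of this release, so the following is only a sketch of how a percentile-bootstrap lower bound like the one in the table could be computed from per-row correctness flags. The function name, the resample count, the seed, and the use of the 2.5th percentile are all assumptions, not the release's actual procedure.

```python
import random


def bootstrap_lower_bound(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound on mean accuracy.

    `correct` is a list of 0/1 flags, one per held-out row.
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Resample the rows with replacement and record the accuracy.
        sample = rng.choices(correct, k=n)
        means.append(sum(sample) / n)
    means.sort()
    # Lower end of a two-sided (1 - alpha) percentile interval.
    return means[int((alpha / 2) * n_resamples)]


flags = [1] * 60  # 60 held-out rows, all classified correctly
lower = bootstrap_lower_bound(flags)
passes_gate = lower >= 0.90
print(lower, passes_gate)
```

When every row is correct, every resample is also fully correct, so the bootstrap lower bound equals the mean. A single miss among the 60 rows would pull the bound below 100%.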

Quickstart

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

# Load the base model first; device_map="auto" needs the accelerate package.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(BASE_ID)

# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "tjarvis91/Q-Triage-1B-LoRA")
model.eval()

chat = [{"role": "user", "content": "Triage. Return JSON {category, priority}.\nSubject: Cannot login after deploy\nDesc: 502 errors since 14:00"}]
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens, not the prompt.
out = model.generate(**inputs, max_new_tokens=120, do_sample=False, pad_token_id=tok.pad_token_id)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expected output:

```json
{"category": "incident/sev2", "priority": "high"}
```
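Downstream code usually needs the result as a dictionary rather than raw text. The sketch below pulls the first JSON object with the two expected keys out of a generated string. The helper name `parse_triage` and the tolerance for surrounding text are assumptions; the card only shows the clean output above.

```python
import json

REQUIRED_KEYS = {"category", "priority"}


def parse_triage(text):
    """Return the first JSON object in `text` with the required keys, else None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos`.
            obj, _ = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj):
            return obj
        pos = text.find("{", pos + 1)
    return None


result = parse_triage('{"category": "incident/sev2", "priority": "high"}')
print(result)
```

Returning `None` instead of raising lets a caller send unparseable generations to human review.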

Compact intelligence is not small intelligence

This adapter has about 18 million trainable parameters (LoRA rank 16 on a 1.7B base). It runs in bf16 on CPU in a few hundred milliseconds per call. On the 60-row holdout it reaches 100.0% accuracy, a bar that large general models, optimized for breadth rather than depth, can miss.
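As a rough check on the parameter count, the arithmetic below shows one configuration that gives 18,087,936 trainable parameters at rank 16. The base-model dimensions (hidden size 2048, MLP width 8192, 24 layers, full-width key/value projections) and the choice of all seven linear projections per layer as LoRA targets are assumptions; the card does not list the target modules.

```python
RANK = 16
HIDDEN = 2048        # assumed hidden size of the base model
INTERMEDIATE = 8192  # assumed MLP width
LAYERS = 24          # assumed number of transformer layers


def lora_params(d_in, d_out, r=RANK):
    # LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer.
    return r * (d_in + d_out)


# Assumed targets: q, k, v, o projections plus gate, up, down in the MLP.
attention = 4 * lora_params(HIDDEN, HIDDEN)
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)
total = LAYERS * (attention + mlp)
print(total)  # 18087936
```

That is roughly 1% of the base model's 1.7B weights.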

Intelligence per watt > parameter count.

Intelligence per watt

| Property | Value |
| --- | --- |
| Base model | SmolLM2-1.7B-Instruct |
| Adapter size | ~36 MB |
| Trainable params | 18,087,936 |
| Inference | bf16 on CPU; 4-bit QLoRA-friendly |
| VRAM target | 4 GB (Q4) / 8 GB (bf16) |
| Runs offline | yes |
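The size figures in the table can be sanity-checked with weights-only arithmetic. This sketch assumes the adapter is stored at 2 bytes per parameter and uses the nominal 1.7B figure from the base model's name. It ignores activations, the KV cache, and quantization metadata, so it is a lower bound on real memory use rather than a measurement.

```python
TRAINABLE_PARAMS = 18_087_936  # from the table
BASE_PARAMS = 1.7e9            # nominal, taken from the base model's name

BYTES_BF16 = 2.0  # 16 bits per weight
BYTES_Q4 = 0.5    # 4 bits per weight, ignoring quantization overhead

adapter_mb = TRAINABLE_PARAMS * BYTES_BF16 / 1e6
base_bf16_gb = BASE_PARAMS * BYTES_BF16 / 1e9
base_q4_gb = BASE_PARAMS * BYTES_Q4 / 1e9

print(f"adapter:   {adapter_mb:.1f} MB")
print(f"base bf16: {base_bf16_gb:.2f} GB")
print(f"base Q4:   {base_q4_gb:.2f} GB")
```

Under these assumptions the adapter comes to about 36.2 MB, in line with the ~36 MB listed. The base weights come to about 3.4 GB in bf16 and 0.85 GB at 4 bits, which leaves headroom inside the 8 GB and 4 GB targets.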

Local AI, no cloud

This adapter ships as part of a local-first AI thesis. No telemetry. No data leaves the machine. The base model is open. The adapter is signed and watermarked. The runtime is yours.

The story

Qovaryx is a research line on local-first AI for the constraint-aware operator. The original Qovaryx Options Decoder closed 15-of-15 internal benchmark cells at strict bootstrap-CI lower bound, then shipped as a public CPU runtime at Qovaryx/qovaryx-options-decoder-full-community.

This adapter applies the same compact-intelligence discipline to office work: single-task LoRA, strict-CI-gated, on-device. The training recipe stays in-house — the same posture we used for the Options Decoder. What's published is the artifact and the headline metric.

Limitations

  • One job, one specialist. Out-of-domain prompts will get out-of-domain answers.
  • This is a LoRA adapter, not a standalone model — you need HuggingFaceTB/SmolLM2-1.7B-Instruct as the base.
  • Holdout is n=60 — a strong CI but not a production cert. Validate on your own data.
  • Not financial, medical, legal, or employment advice. Human review for high-stakes use.
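When validating on your own labeled tickets, a closed-form interval is a convenient companion to the raw accuracy. The sketch below uses the Wilson score lower bound, which is a different interval from the bootstrap reported in this card and is swapped in here only because it needs no resampling. The function name and the z value of 1.96 are assumptions.

```python
import math


def wilson_lower_bound(successes, n, z=1.96):
    """Wilson score lower bound on a binomial proportion (z=1.96 is about 95%, two-sided)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


bound_60 = wilson_lower_bound(60, 60)
print(f"{bound_60:.3f}")
```

For 60 correct out of 60 this gives roughly 0.94. That is still above the 90.0% gate, but it shows why a 60-row holdout cannot rule out an error rate of a few percent on new data.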

Watermark

Each released adapter carries a unique fingerprint in adapter_config.json (_qovaryx_watermark.fingerprint) for attribution and tamper-detection. This adapter's fingerprint: 89809280bb22957c85fde0455167ef144d2c3bc04a553947698cc91cbab613cf.
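A downloaded copy can be compared against the fingerprint above with a few lines of standard-library code. This sketch assumes `_qovaryx_watermark` is a top-level object in adapter_config.json with a `fingerprint` string inside it. The card does not say how the fingerprint is derived, so this only checks the recorded value and is not a cryptographic verification. The demo writes a stand-in config to a temporary directory instead of reading the real file.

```python
import json
import tempfile
from pathlib import Path

EXPECTED = "89809280bb22957c85fde0455167ef144d2c3bc04a553947698cc91cbab613cf"


def read_fingerprint(config_path):
    """Return the recorded fingerprint, or None if the (assumed) key is absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    watermark = config.get("_qovaryx_watermark")
    if not isinstance(watermark, dict):
        return None
    return watermark.get("fingerprint")


def fingerprint_matches(config_path, expected=EXPECTED):
    found = read_fingerprint(config_path)
    return isinstance(found, str) and found.lower() == expected.lower()


# Demo with stand-in files; point config_path at your local adapter_config.json instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "adapter_config.json"
    demo.write_text(json.dumps({"_qovaryx_watermark": {"fingerprint": EXPECTED}}), encoding="utf-8")
    demo_ok = fingerprint_matches(demo)
    demo.write_text(json.dumps({"r": 16}), encoding="utf-8")
    demo_missing = fingerprint_matches(demo)

print(demo_ok, demo_missing)
```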

Citation

If you use this in research or product work, cite:

```bibtex
@misc{qovaryx_q_triage_2026,
  author    = {Jarvis, Thomas},
  title     = {Q-Triage-1B-LoRA: Qovaryx Compact Intelligence specialist for support ticket triage},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tjarvis91/Q-Triage-1B-LoRA},
}
```

License

The adapter weights are released under Apache-2.0. The base model, HuggingFaceTB/SmolLM2-1.7B-Instruct, is also licensed under Apache-2.0 by HuggingFaceTB.

The training corpus, the recipe, and the cluster-shell routing logic are not part of this release.

Model provider

tjarvis91

Model tree

Base

HuggingFaceTB/SmolLM2-1.7B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text
