Why this exists
Big general models are good at everything and great at nothing. They burn
hundreds of watts to do work that fits in a 36 MB adapter.
This is one specialist from the Qovaryx compact-intelligence release. It does
one job — office tool calling — and it does it at 100.0% mean accuracy on
a 60-row held-out evaluation, with a 95% bootstrap-CI lower bound of 100.0%
against a strict gate of 95.0%.
That's the bar.
What it's good for
- Calendar event creation with attendees + duration + datetime
- Email send tool calls with to/subject/attachment
- Reminder scheduling with datetime + note
- Task creation with title + due date
- Office assistant integrations on-device
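Before acting on a call, an integration can check that the arguments listed above are present. The sketch below is one way to do that, not part of the release. Only `create_calendar_event` and its `attendees`, `duration_min`, and `datetime` keys come from this card's example output; every other tool name and argument key is a hypothetical placeholder, and `missing_args` is an illustrative helper.

```python
# Required argument keys per tool. Only create_calendar_event is taken from
# this card's example output; the other names and keys are assumptions.
REQUIRED_ARGS = {
    "create_calendar_event": {"attendees", "duration_min", "datetime"},
    "send_email": {"to", "subject"},          # hypothetical; attachment treated as optional
    "create_reminder": {"datetime", "note"},  # hypothetical
    "create_task": {"title", "due_date"},     # hypothetical
}


def missing_args(call: dict) -> set:
    """Return the required argument keys absent from a {tool, args} call."""
    tool = call.get("tool")
    if tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool!r}")
    return REQUIRED_ARGS[tool] - set(call.get("args", {}))
```

An empty set means the call is complete enough to hand to the real handler; anything else is a reason to reject or re-prompt.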
Headline result
| Metric | Value |
|---|---|
| Task | office tool calling |
| Mean accuracy (n=60 holdout) | 100.0% |
| Bootstrap-CI lower bound (95% conf) | 100.0% |
| Strict gate | 95.0% |
| Status | PASS at strict CI |
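For readers who want to run the same kind of gate on their own holdout, here is a minimal sketch of a percentile-bootstrap lower bound. The evaluation code for this release is not published, so the resample count, the seed, the two-sided reading of "95%", and the name `bootstrap_ci_lower` are all assumptions.

```python
import random


def bootstrap_ci_lower(scores, n_resamples=2000, conf=0.95, seed=0):
    """Percentile-bootstrap lower bound on the mean of per-row scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample rows with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    # Lower end of a two-sided interval: the (1 - conf) / 2 quantile (assumed).
    idx = int((1 - conf) / 2 * n_resamples)
    return means[idx]


STRICT_GATE = 0.95
holdout = [1.0] * 60  # 60 of 60 rows correct, as reported in the table
lower = bootstrap_ci_lower(holdout)
print("PASS" if lower >= STRICT_GATE else "FAIL")
```

When every row is correct, every resample is all-correct too, so the lower bound equals the mean by construction. The gate starts to discriminate once a holdout contains at least one miss.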
Quickstart
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained(
"HuggingFaceTB/SmolLM2-1.7B-Instruct",
torch_dtype="bfloat16",
device_map="auto",
)
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = PeftModel.from_pretrained(base, "tjarvis91/Q-Toolcall-1B-LoRA")
model.eval()
chat = [{"role": "user", "content": "Schedule a 1hr meeting with Eve next Thursday at 10am. Issue tool call as JSON {tool, args}."}]
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=120, do_sample=False, pad_token_id=tok.pad_token_id)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Expected output:
{"tool": "create_calendar_event", "args": {"attendees": ["Eve"], "duration_min": 60, "datetime": "next Thursday 10am"}}
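Downstream code usually needs the call as a dict, not a string. A minimal sketch of that step, using only the standard library; `parse_tool_call` is a hypothetical helper, and the shape check assumes nothing beyond the `{tool, args}` format requested in the prompt.

```python
import json


def parse_tool_call(text: str) -> dict:
    """Extract the first JSON object from generated text and check its shape."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in model output")
    # raw_decode stops at the end of the first complete JSON value,
    # so any trailing text after the object is ignored.
    call, _ = json.JSONDecoder().raw_decode(text, start)
    if not isinstance(call, dict) or not isinstance(call.get("tool"), str):
        raise ValueError("missing string field 'tool'")
    if not isinstance(call.get("args"), dict):
        raise ValueError("missing object field 'args'")
    return call


raw = '{"tool": "create_calendar_event", "args": {"attendees": ["Eve"], "duration_min": 60, "datetime": "next Thursday 10am"}}'
call = parse_tool_call(raw)
print(call["tool"], call["args"]["duration_min"])
```

The `datetime` value in the example is a relative phrase, so the caller still has to resolve it to a concrete timestamp before creating the event.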
Compact intelligence is not small intelligence
This model has 18 million trainable parameters (LoRA rank 16 on a 1.7B base).
It runs in bf16 on CPU in a few hundred milliseconds per call. It scored
100.0% accuracy on its 60-row holdout, a bar that large general models can
miss because they're optimizing for breadth, not depth.
Intelligence per watt > parameter count.
Intelligence per watt
| Property | Value |
|---|---|
| Base model | SmolLM2-1.7B-Instruct |
| Adapter size | ~36 MB |
| Trainable params | 18,087,936 |
| Inference | bf16 on CPU; 4-bit QLoRA-friendly |
| VRAM target | 4 GB (Q4) / 8 GB (bf16) |
| Runs offline | yes |
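The trainable-parameter count and adapter size in the table can be sanity-checked with a few lines of arithmetic. This is a sketch under assumptions the card does not state: that the base uses hidden size 2048, MLP size 8192, and 24 layers with full-width key/value projections, and that LoRA is applied to all seven linear projections in each layer.

```python
# Assumed SmolLM2-1.7B dimensions and LoRA targets (not stated in this card).
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 24
RANK = 16  # from this card: LoRA rank 16


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """A LoRA pair adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)          # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(HIDDEN, INTERMEDIATE)  # gate_proj, up_proj
    + lora_params(INTERMEDIATE, HIDDEN)      # down_proj
)
total = per_layer * LAYERS
size_mb = total * 2 / 1e6  # 2 bytes per parameter at 16-bit precision

print(total, round(size_mb, 1))
```

Under those assumptions the total comes to 18,087,936 parameters, which matches the table, and 16-bit storage lands at roughly 36 MB.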
Local AI, no cloud
This adapter ships as part of a local-first AI thesis. No telemetry. No data
leaves the machine. The base model is open. The adapter is signed and
watermarked. The runtime is yours.
The story
Qovaryx is a research line on local-first AI for the constraint-aware operator.
The original Qovaryx Options Decoder closed 15-of-15 internal benchmark cells at
strict bootstrap-CI lower bound, then shipped as a public CPU runtime at
Qovaryx/qovaryx-options-decoder-full-community.
This adapter applies the same compact-intelligence discipline to office work:
single-task LoRA, strict-CI-gated, on-device. The training recipe stays in-house
— the same posture we used for the Options Decoder. What's published is the
artifact and the headline metric.
Limitations
- One job, one specialist. Out-of-domain prompts will get out-of-domain answers.
- This is a LoRA adapter, not a standalone model — you need
HuggingFaceTB/SmolLM2-1.7B-Instruct as the base.
- Holdout is n=60, and with every row correct a bootstrap interval collapses to a single point. Treat 100.0% as a strong signal, not a production cert. Validate on your own data.
- Not financial, medical, legal, or employment advice. Human review for high-stakes use.
Watermark
Each released adapter carries a unique fingerprint in adapter_config.json
(_qovaryx_watermark.fingerprint) for attribution and tamper-detection. This
adapter's fingerprint: c7fb9e95f69acf88c9bf5b3af6adcb93aaad12ce45e7fec1d6b3dd751bf1699e.
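A downloaded copy can be checked against the fingerprint above. The sketch below assumes `_qovaryx_watermark.fingerprint` denotes a nested key, that is, a `_qovaryx_watermark` object holding a `fingerprint` string; `fingerprint_matches` is a hypothetical helper, not a shipped tool.

```python
import hmac
import json
from pathlib import Path

EXPECTED = "c7fb9e95f69acf88c9bf5b3af6adcb93aaad12ce45e7fec1d6b3dd751bf1699e"


def fingerprint_matches(config_path, expected: str = EXPECTED) -> bool:
    """Compare the fingerprint stored in adapter_config.json to a known value."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumed layout: {"_qovaryx_watermark": {"fingerprint": "<hex>"}}
    found = config.get("_qovaryx_watermark", {}).get("fingerprint", "")
    return hmac.compare_digest(str(found), expected)
```

This only confirms that the config field matches the published value. It does not hash the weight files, so it says nothing on its own about whether they were modified.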
Citation
If you use this in research or product work, cite:
@misc{qovaryx_q_toolcall_2026,
author = {Jarvis, Thomas},
title = {Q-Toolcall-1B-LoRA: Qovaryx Compact Intelligence specialist for office tool calling},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/tjarvis91/Q-Toolcall-1B-LoRA},
}
License
Apache-2.0 for the adapter weights. The base model
HuggingFaceTB/SmolLM2-1.7B-Instruct is released separately by HuggingFaceTB,
also under Apache-2.0.
The training corpus, the recipe, and the cluster-shell routing logic are
not part of this release.