tjarvis91

Q-Toolcall-1B-LoRA

Why this exists

Big general models are good at everything and great at nothing. They burn hundreds of watts to do work that fits in a 36 MB adapter.

This is one specialist from the Qovaryx compact-intelligence release. It does one job — office tool calling — and it does it at 100.0% mean accuracy on a 60-row held-out evaluation, with a 95% bootstrap-CI lower bound of 100.0% against a strict gate of 95.0%.

That's the bar.

What it's good for

Calendar event creation with attendees + duration + datetime
Email send tool calls with to/subject/attachment
Reminder scheduling with datetime + note
Task creation with title + due date
Office assistant integrations on-device

Headline result

| Metric | Value |
| --- | --- |
| Task | office tool calling |
| Mean accuracy (n=60 holdout) | 100.0% |
| Bootstrap-CI lower bound (95% conf) | 100.0% |
| Strict gate | 95.0% |
| Status | PASS at strict CI |
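The evaluation script is not part of this release, so the following is only a minimal sketch of how a percentile-bootstrap lower bound of this kind can be computed, assuming each holdout row is scored 1 (correct) or 0 (incorrect). The function names, the resample count, the seed, and the one-sided 5% level are illustrative assumptions, not the author's settings. An exact binomial bound is included for comparison.

```python
import random

def bootstrap_lower_bound(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound on mean accuracy (illustrative)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample rows with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    # One-sided bound: the alpha quantile of the resampled means.
    # Assumption: use alpha=0.025 for the low end of a two-sided 95% interval.
    return means[int(alpha * n_resamples)]

def exact_lower_bound_all_correct(n, alpha=0.05):
    """Clopper-Pearson one-sided lower bound when all n rows are correct."""
    return alpha ** (1.0 / n)

scores = [1] * 60  # 60 holdout rows, all correct, as in the table above
mean_acc = sum(scores) / len(scores)
boot_lb = bootstrap_lower_bound(scores)
exact_lb = exact_lower_bound_all_correct(len(scores))
print(f"mean={mean_acc:.3f} bootstrap_lb={boot_lb:.3f} exact_lb={exact_lb:.3f}")
```

When every row is correct, every resample is also all-correct, so the bootstrap bound equals the mean. The exact binomial bound for 60 of 60 is about 95.1%, which is a useful second reference point when reading the headline number.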

Quickstart

```python
# Requires transformers, peft, and accelerate (needed for device_map="auto").
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_ID = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

# Load the base model in bf16; the adapter cannot run without it.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(BASE_ID)

# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "tjarvis91/Q-Toolcall-1B-LoRA")
model.eval()

chat = [{"role": "user", "content": "Schedule a 1hr meeting with Eve next Thursday at 10am. Issue tool call as JSON {tool, args}."}]
# Render the chat template as text, then tokenize it.
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False) keeps the tool call deterministic.
out = model.generate(**inputs, max_new_tokens=120, do_sample=False, pad_token_id=tok.pad_token_id)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expected output:

```json
{"tool": "create_calendar_event", "args": {"attendees": ["Eve"], "duration_min": 60, "datetime": "next Thursday 10am"}}
```
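Calling code will usually want to parse and check the output before dispatching it. Below is a minimal sketch of that step. The card does not publish a tool schema, so `REQUIRED_ARGS` is an assumption inferred from the single example output above, and `parse_tool_call` is a hypothetical helper name.

```python
import json

# Required argument names per tool. Only create_calendar_event appears in
# the card; its argument set is inferred from the example output and is an
# assumption, not a published schema.
REQUIRED_ARGS = {
    "create_calendar_event": {"attendees", "duration_min", "datetime"},
}

def parse_tool_call(text):
    """Parse model output into (tool, args); raise ValueError if malformed."""
    try:
        call = json.loads(text.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or "tool" not in call or "args" not in call:
        raise ValueError("expected a JSON object with 'tool' and 'args'")
    tool, args = call["tool"], call["args"]
    if not isinstance(tool, str) or tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool!r}")
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")
    missing = REQUIRED_ARGS[tool] - set(args)
    if missing:
        raise ValueError(f"missing args for {tool}: {sorted(missing)}")
    return tool, args

raw = '{"tool": "create_calendar_event", "args": {"attendees": ["Eve"], "duration_min": 60, "datetime": "next Thursday 10am"}}'
tool, args = parse_tool_call(raw)
print(tool, args["duration_min"])
```

Note that the `datetime` value in the example is a relative phrase, so the caller still has to resolve it to a concrete timestamp.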

Compact intelligence is not small intelligence

This model has about 18 million trainable parameters (LoRA rank 16 on a 1.7B base). It runs in bf16 on CPU in a few hundred milliseconds per call. It reaches 100.0% accuracy on its holdout, a bar that most large general models miss because they optimize for breadth, not depth.

Intelligence per watt > parameter count.

Intelligence per watt

| Property | Value |
| --- | --- |
| Base model | SmolLM2-1.7B-Instruct |
| Adapter size | ~36 MB |
| Trainable params | 18,087,936 |
| Inference | bf16 on CPU; 4-bit QLoRA-friendly |
| VRAM target | 4 GB (Q4) / 8 GB (bf16) |
| Runs offline | yes |
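The parameter count and adapter size in the table can be sanity-checked with a little arithmetic. The sketch below assumes dimensions the card does not state: a hidden size of 2048, an MLP width of 8192, 24 layers, and full-width key/value projections for the base model. It also assumes LoRA is applied to all seven linear projections in each layer. Under those assumptions the numbers line up with the table, which is consistent with that configuration but does not confirm it.

```python
# Assumed SmolLM2-1.7B dimensions (not stated in this card).
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 24
RANK = 16  # LoRA rank, as stated in the card

def lora_params(in_features, out_features, rank=RANK):
    """A LoRA pair adds A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)

# Assumed targets: every attention and MLP projection in each layer.
per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)            # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(HIDDEN, INTERMEDIATE)    # gate_proj, up_proj
    + lora_params(INTERMEDIATE, HIDDEN)        # down_proj
)
total = per_layer * LAYERS
size_mb = total * 2 / 1e6  # 2 bytes per parameter in 16-bit storage
print(total, round(size_mb, 1))
```

This gives 18,087,936 parameters and roughly 36.2 MB at 16-bit precision.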

Local AI, no cloud

This adapter ships as part of a local-first AI thesis. No telemetry. No data leaves the machine. The base model is open. The adapter is signed and watermarked. The runtime is yours.

The story

Qovaryx is a research line on local-first AI for the constraint-aware operator. The original Qovaryx Options Decoder closed 15-of-15 internal benchmark cells at strict bootstrap-CI lower bound, then shipped as a public CPU runtime at Qovaryx/qovaryx-options-decoder-full-community.

This adapter applies the same compact-intelligence discipline to office work: single-task LoRA, strict-CI-gated, on-device. The training recipe stays in-house — the same posture we used for the Options Decoder. What's published is the artifact and the headline metric.

Limitations

One job, one specialist. Out-of-domain prompts will get out-of-domain answers.
This is a LoRA adapter, not a standalone model — you need HuggingFaceTB/SmolLM2-1.7B-Instruct as the base.
Holdout is n=60: a strong CI result, but not a production certification. Validate on your own data.
Not financial, medical, legal, or employment advice. Human review for high-stakes use.
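For validating on your own data, a small scoring harness is often enough to get started. The sketch below is one possible approach, not the author's evaluation: it assumes a row counts as correct only when the output parses to exactly the expected JSON object. The helper names and the sample rows are hypothetical.

```python
import json

def row_correct(output_text, expected_call):
    """Return 1 if the output parses to exactly the expected call, else 0."""
    try:
        return int(json.loads(output_text) == expected_call)
    except json.JSONDecodeError:
        return 0

def accuracy(rows):
    """rows: iterable of (model_output_text, expected_call_dict) pairs."""
    scores = [row_correct(out, exp) for out, exp in rows]
    return sum(scores) / len(scores), scores

# Hypothetical rows; replace with outputs collected from your own prompts.
expected = {
    "tool": "create_calendar_event",
    "args": {"attendees": ["Eve"], "duration_min": 60, "datetime": "next Thursday 10am"},
}
rows = [
    (json.dumps(expected), expected),  # well-formed and matching
    ("not json", expected),            # malformed output counts as wrong
]
acc, scores = accuracy(rows)
print(acc, scores)
```

Exact matching is strict: an output with an equivalent but differently phrased `datetime` would be scored as wrong, so a looser comparison may suit some use cases better.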

Watermark

Each released adapter carries a unique fingerprint in adapter_config.json (_qovaryx_watermark.fingerprint) for attribution and tamper-detection. This adapter's fingerprint: c7fb9e95f69acf88c9bf5b3af6adcb93aaad12ce45e7fec1d6b3dd751bf1699e.
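A minimal sketch of checking the recorded fingerprint after download follows. It assumes the dotted path above means a top-level `_qovaryx_watermark` object with a `fingerprint` field; `read_fingerprint` is a hypothetical helper name. How the fingerprint is derived is not documented here, so this only compares the recorded value and does not verify the weights themselves.

```python
import json
import tempfile
from pathlib import Path

EXPECTED = "c7fb9e95f69acf88c9bf5b3af6adcb93aaad12ce45e7fec1d6b3dd751bf1699e"

def read_fingerprint(config_path):
    """Return the recorded fingerprint, or None if the field is absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    watermark = config.get("_qovaryx_watermark")
    if not isinstance(watermark, dict):
        return None
    return watermark.get("fingerprint")

# Demo on a stand-in file; point this at your downloaded adapter_config.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "adapter_config.json"
    demo.write_text(
        json.dumps({"_qovaryx_watermark": {"fingerprint": EXPECTED}}),
        encoding="utf-8",
    )
    matches = read_fingerprint(demo) == EXPECTED
print("fingerprint matches:", matches)
```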

Community + support

Discord: https://discord.gg/PtuHZDv5ju — builders, install help, model questions
Ko-fi: https://ko-fi.com/tjarvis91 — every coffee literally buys GPU time for the next training cycle
Research devlog: https://github.com/thron-j/qovaryx-ai-research
Companion runtime (options decoder): https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community

Citation

If you use this in research or product work, cite:

```bibtex
@misc{qovaryx_q_toolcall_2026,
  author = {Jarvis, Thomas},
  title  = {Q-Toolcall-1B-LoRA: Qovaryx Compact Intelligence specialist for office tool calling},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/tjarvis91/Q-Toolcall-1B-LoRA},
}
```

License

Apache-2.0 for the adapter weights. The base model, HuggingFaceTB/SmolLM2-1.7B-Instruct, is also licensed under Apache-2.0 by HuggingFaceTB.

The training corpus, the recipe, and the cluster-shell routing logic are not part of this release.
