License: apache-2.0
Why this exists
Big general models are good at everything and great at nothing. They burn hundreds of watts to do work that fits in a 36 MB adapter.
This is one specialist from the Qovaryx compact-intelligence release. It does one job — meeting summarization + action extraction — and it does it at 100.0% mean accuracy on a 60-row held-out evaluation, with a 95% bootstrap-CI lower bound of 100.0% against a strict gate of 95.0%.
That's the bar.
What it's good for
- Meeting note → JSON action items (owner + due)
- Attendee extraction from invites
- Decision summarization with topic + decisions list
- Followup detection from past-due commitments
- On-device meeting intelligence
Headline result
| Metric | Value |
|---|---|
| Task | meeting summarization + action extraction |
| Mean accuracy (n=60 holdout) | 100.0% |
| Bootstrap-CI lower bound (95% conf) | 100.0% |
| Strict gate | 95.0% |
| Status | PASS at strict CI |
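The evaluation harness is not part of this release, so the following is only a minimal sketch of how a bootstrap-CI gate of this kind can be computed. It assumes per-row scores in [0, 1], a one-sided percentile bootstrap, and hypothetical function names (`bootstrap_lower_bound`, `passes_gate`); the actual procedure behind the table may differ.

```python
import random

def bootstrap_lower_bound(scores, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap lower bound on the mean of per-row scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        # Resample n rows with replacement and record the resampled mean.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # One-sided bound: the (1 - confidence) quantile of the resampled means.
    return means[int((1.0 - confidence) * n_resamples)]

def passes_gate(scores, gate=0.95, **kwargs):
    """True when the bootstrap lower bound clears the gate."""
    return bootstrap_lower_bound(scores, **kwargs) >= gate

all_correct = [1.0] * 60
print(bootstrap_lower_bound(all_correct))  # 1.0
print(passes_gate(all_correct))            # True
```

When every row is correct, each resample has the same mean, so the bootstrap lower bound equals the mean. An exact binomial bound is a useful complementary check in that case: for 60 of 60 correct, the one-sided 95% Clopper-Pearson lower bound is 0.05^(1/60), roughly 0.951.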
Quickstart
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = PeftModel.from_pretrained(base, "tjarvis91/Q-Meeting-1B-LoRA")
model.eval()

chat = [
    {
        "role": "user",
        "content": "Extract action items from: 'Carol will share SOW Monday. "
                   "Dave to confirm legal review by Tue.' JSON {actions:[{owner,due}]}.",
    }
]
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the structured output deterministic.
out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False,
    pad_token_id=tok.pad_token_id,
)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Expected output:
```json
{"actions": [{"owner": "Carol", "due": "Monday"}, {"owner": "Dave", "due": "Tuesday"}]}
```
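If the reply feeds another program, it helps to parse and check it before trusting it. The sketch below uses only the standard library and assumes the output shape shown above (an `actions` list of objects with `owner` and `due`). The helper name `parse_actions` is hypothetical and not part of this release.

```python
import json

def parse_actions(generated):
    """Return the list of {owner, due} dicts from a model reply, or raise ValueError."""
    start = generated.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        payload, _ = json.JSONDecoder().raw_decode(generated[start:])
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON in model output: {exc}") from exc
    actions = payload.get("actions")
    if not isinstance(actions, list):
        raise ValueError("expected an 'actions' list")
    for item in actions:
        if not isinstance(item, dict) or not {"owner", "due"} <= item.keys():
            raise ValueError(f"action item missing owner/due: {item!r}")
    return actions

reply = '{"actions": [{"owner": "Carol", "due": "Monday"}, {"owner": "Dave", "due": "Tuesday"}]}'
for action in parse_actions(reply):
    print(action["owner"], "->", action["due"])
```

Raising on malformed output instead of returning an empty list keeps silent failures out of downstream calendars and task trackers.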
Compact intelligence is not small intelligence
This model has 18 million trainable parameters (LoRA rank 16 on a 1.7B base). It runs in bf16 on CPU in a few hundred milliseconds per call. It hits a 100.0% accuracy bar on its 60-row holdout, a bar that most large general models miss because they're optimizing for breadth, not depth.
Intelligence per watt > parameter count.
Intelligence per watt
| Property | Value |
|---|---|
| Base model | SmolLM2-1.7B-Instruct |
| Adapter size | ~36 MB |
| Trainable params | 18,087,936 |
| Inference | bf16 on CPU; 4-bit QLoRA-friendly |
| VRAM target | 4 GB (Q4) / 8 GB (bf16) |
| Runs offline | yes |
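The parameter count and adapter size in the table can be sanity-checked with a little arithmetic. The sketch below assumes the base model's dimensions (hidden size 2048, MLP size 8192, 24 layers, full-width key/value projections) and assumes the adapter targets all seven linear projections in each layer. Neither assumption is stated in this card, though under them the total matches the published figure.

```python
def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer.
    return rank * (in_features + out_features)

# Assumed dimensions of the base model (not stated in this card).
HIDDEN, INTERMEDIATE, LAYERS, RANK = 2048, 8192, 24, 16

attention = 4 * lora_params(HIDDEN, HIDDEN, RANK)  # q, k, v, o projections
mlp = (
    2 * lora_params(HIDDEN, INTERMEDIATE, RANK)    # gate and up projections
    + lora_params(INTERMEDIATE, HIDDEN, RANK)      # down projection
)
total = LAYERS * (attention + mlp)
size_mb = total * 2 / 1e6                          # 2 bytes per bf16 weight

print(total)                 # 18087936
print(f"{size_mb:.1f} MB")   # 36.2 MB
```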
Local AI, no cloud
This adapter ships as part of a local-first AI thesis. No telemetry. No data leaves the machine. The base model is open. The adapter is signed and watermarked. The runtime is yours.
The story
Qovaryx is a research line on local-first AI for the constraint-aware operator.
The original Qovaryx Options Decoder closed 15-of-15 internal benchmark cells at
strict bootstrap-CI lower bound, then shipped as a public CPU runtime at
Qovaryx/qovaryx-options-decoder-full-community.
This adapter applies the same compact-intelligence discipline to office work: single-task LoRA, strict-CI-gated, on-device. The training recipe stays in-house — the same posture we used for the Options Decoder. What's published is the artifact and the headline metric.
Limitations
- One job, one specialist. Out-of-domain prompts will get out-of-domain answers.
- This is a LoRA adapter, not a standalone model; you need HuggingFaceTB/SmolLM2-1.7B-Instruct as the base.
- Holdout is n=60: a strong CI but not a production cert. Validate on your own data.
- Not financial, medical, legal, or employment advice. Human review for high-stakes use.
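For validating on your own data, one simple starting point is exact-match scoring of predicted action items against hand-labeled ones. The metric behind the headline number is not published, so treat the scoring rule below (case-insensitive set equality on owner and due) and the name `row_accuracy` as assumptions for illustration.

```python
def normalize(actions):
    """Order-insensitive, case-insensitive view of a row's action items."""
    return {(a["owner"].strip().lower(), a["due"].strip().lower()) for a in actions}

def row_accuracy(predicted, gold):
    """Fraction of rows whose predicted action set exactly matches the gold set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of rows")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)

predicted = [
    [{"owner": "Carol", "due": "Monday"}],
    [{"owner": "Dave", "due": "Tue"}],
]
gold = [
    [{"owner": "carol", "due": "monday"}],
    [{"owner": "Dave", "due": "Tuesday"}],
]
print(row_accuracy(predicted, gold))  # 0.5
```

Exact match is deliberately strict: "Tue" versus "Tuesday" counts as a miss here, so decide up front whether your labels should normalize dates.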
Watermark
Each released adapter carries a unique fingerprint in adapter_config.json
(_qovaryx_watermark.fingerprint) for attribution and tamper-detection. This
adapter's fingerprint: 84dff72e6c36cb8a3a8b103166a7c3e279084f24f44c6f6b8e74f70f17280198.
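A downloaded copy can be checked against the fingerprint above with a few lines of standard-library Python. This sketch assumes `_qovaryx_watermark` is a top-level JSON object in adapter_config.json with a `fingerprint` string inside it, as the dotted path suggests. The helper names are hypothetical, and how the fingerprint is derived is not described in this card.

```python
import json
from pathlib import Path

EXPECTED_FINGERPRINT = "84dff72e6c36cb8a3a8b103166a7c3e279084f24f44c6f6b8e74f70f17280198"

def read_fingerprint(config_path):
    """Return the watermark fingerprint from adapter_config.json, or None if absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumed layout: {"_qovaryx_watermark": {"fingerprint": "<hex string>"}, ...}
    watermark = config.get("_qovaryx_watermark") or {}
    return watermark.get("fingerprint")

def fingerprint_matches(config_path, expected=EXPECTED_FINGERPRINT):
    """True when the file carries exactly the expected fingerprint."""
    return read_fingerprint(config_path) == expected
```

Typical use is `fingerprint_matches("path/to/adapter_config.json")` on the locally cached adapter directory.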
Community + support
- Discord: https://discord.gg/PtuHZDv5ju — builders, install help, model questions
- Ko-fi: https://ko-fi.com/tjarvis91 — every coffee literally buys GPU time for the next training cycle
- Research devlog: https://github.com/thron-j/qovaryx-ai-research
- Companion runtime (options decoder): https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community
Citation
If you use this in research or product work, cite:
```bibtex
@misc{qovaryx_q_meeting_2026,
  author    = {Jarvis, Thomas},
  title     = {Q-Meeting-1B-LoRA: Qovaryx Compact Intelligence specialist for meeting summarization + action extraction},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tjarvis91/Q-Meeting-1B-LoRA},
}
```
License
Apache-2.0 for the adapter weights. The base model
HuggingFaceTB/SmolLM2-1.7B-Instruct is Apache-2.0 from HuggingFaceTB.
The training corpus, the recipe, and the cluster-shell routing logic are not part of this release.