License: apache-2.0

⚠️ Dates are emitted VERBATIM: the host resolves them

In `[event_extract]` mode the model returns each event's date/time exactly as written in the source ("den 5. november kl 14", "next Thursday", "5/5 at 18", "Sct. Hans kl 9"); it does NOT convert them to absolute dates. Your app resolves each phrase to an absolute datetime using the `Today is YYYY-MM-DD` line (kept in the input as the anchor) plus a locale-keyed resolver. Tokens to handle (Danish `kl` means "o'clock", `på` means "on", and `Sct. Hans` is Midsummer):

- DK: `den D. <month>`, `den D/M`, `på <weekday>`, `Sct. Hans`
- EN: `<Month> D`, `D/M`, `next <weekday>`

Recurring lines combine the line's date with each group's time (e.g. `7/6 at 11:15`).
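A minimal sketch of such a resolver, assuming the host already knows each text's locale (`"da"` or `"en"`). The lookup tables, the function names (`resolve_when` and its helpers), the rule that a day/month already past this year rolls over to next year, the reading of `next <weekday>` / `på <weekday>` as "the next occurrence after today", and the mapping of `Sct. Hans` to 24 June are all illustrative assumptions, not part of the model. Expanding recurring groups into several times is not covered.

```python
import re
from datetime import date, datetime, time, timedelta
from typing import Optional, Union

# Illustrative lookup tables for the two locales ("da" = Danish, "en" = English).
DK_MONTHS = {name: i for i, name in enumerate(
    ["januar", "februar", "marts", "april", "maj", "juni", "juli",
     "august", "september", "oktober", "november", "december"], start=1)}
EN_MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
DK_WEEKDAYS = {name: i for i, name in enumerate(
    ["mandag", "tirsdag", "onsdag", "torsdag", "fredag", "lørdag", "søndag"])}
EN_WEEKDAYS = {name: i for i, name in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"])}

DM_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")  # D/M, day first in both locales
EN_MONTH_RE = re.compile(r"\b(" + "|".join(EN_MONTHS) + r")\s+(\d{1,2})\b")
TIME_RES = {
    "da": re.compile(r"\bkl\.?\s*(\d{1,2})(?:[:.](\d{2}))?\b"),  # "kl 14", "kl. 9.30"
    "en": re.compile(r"\bat\s+(\d{1,2})(?::(\d{2}))?\b"),         # "at 18", "at 11:15"
}


def _upcoming(today: date, month: int, day: int) -> date:
    # Assumption (future-only): a day/month already past this year means next year.
    candidate = date(today.year, month, day)
    return candidate if candidate >= today else date(today.year + 1, month, day)


def _next_weekday(today: date, weekday: int) -> date:
    # Assumption: next occurrence strictly after today (+7 days if it is that weekday today).
    return today + timedelta(days=(weekday - today.weekday()) % 7 or 7)


def _resolve_date(text: str, today: date, locale: str) -> Optional[date]:
    if m := DM_RE.search(text):  # "5/5", "den 5/5"
        return _upcoming(today, int(m.group(2)), int(m.group(1)))
    if locale == "da":
        if (m := re.search(r"\bden\s+(\d{1,2})\.\s*([a-zæøå]+)", text)) and m.group(2) in DK_MONTHS:
            return _upcoming(today, DK_MONTHS[m.group(2)], int(m.group(1)))
        if (m := re.search(r"\bpå\s+([a-zæøå]+)", text)) and m.group(1) in DK_WEEKDAYS:
            return _next_weekday(today, DK_WEEKDAYS[m.group(1)])
        if re.search(r"\bsct\.?\s+hans\b", text):
            return _upcoming(today, 6, 24)  # assumption: Sankt Hans = 24 June
    elif locale == "en":
        if m := EN_MONTH_RE.search(text):
            return _upcoming(today, EN_MONTHS[m.group(1)], int(m.group(2)))
        if (m := re.search(r"\bnext\s+([a-z]+)", text)) and m.group(1) in EN_WEEKDAYS:
            return _next_weekday(today, EN_WEEKDAYS[m.group(1)])
    return None


def resolve_when(phrase: str, today: date, locale: str) -> Union[datetime, date, None]:
    """Resolve a verbatim "when" phrase against the "Today is" anchor.

    Returns a datetime when a clock time is present, a plain date when only a
    date is recognised, and None when the phrase is not recognised at all.
    """
    text = phrase.lower()
    day = _resolve_date(text, today, locale)
    if day is None:
        return None
    m = TIME_RES[locale].search(text)
    if m is None:
        return day
    return datetime.combine(day, time(int(m.group(1)), int(m.group(2) or 0)))
```

With the Café Nord example from this README, `resolve_when("5/5 at 18", date(2026, 4, 1), "en")` gives `datetime(2026, 5, 5, 18, 0)`. Phrases it does not recognise (for example "tomorrow") come back as `None`, so the host can fall back to another strategy or keep the verbatim text.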
System prompt (use VERBATIM — the model is conditioned on this exact text)
```markdown
You turn the user's text into JSON. The message begins with a mode tag.[capture] — a short note typed by the user. Split it into items and classify each as "task", "event", or "note". Keep any time reference verbatim, fuzzy is fine (tomorrow, fredag kl 14, frokost, på torsdage). Refer to people by role/name as written. Output:{"items":[{"type":"task|event|note","title":"...","when"?:"...","where"?:"...","priority"?:"urgent","recurring"?:true}]}[event_extract] — a longer text, often an OCR'd chat or screenshot, that starts with "Today is YYYY-MM-DD". Extract ONLY upcoming calendar EVENTS. Keep each event's date and time EXACTLY as written in the source (verbatim) — do NOT resolve to an absolute date; the "Today is" line is context only. Ignore chit-chat, to-dos, and reference facts. Output:{"items":[{"type":"event","title":"...","when":"<date/time exactly as written>","when_end"?:"<as written>","where"?:"..."}]}Always output ONLY the JSON object — no prose, no markdown. Preserve the input's language in titles. If nothing fits, output {"items":[]}.
```
User message = tag + input
| mode | user message | output |
|---|---|---|
| `[capture]` | `[capture] <short note>` | `{"items":[{type:task\|event\|note, title, when?(verbatim), where?, priority?:"urgent", recurring?:true}]}` |
| `[event_extract]` | `[event_extract] Today is YYYY-MM-DD\n<OCR'd text>` | `{"items":[{type:"event", title, when:"<as written>", when_end?, where?}]}`; future events only, dates verbatim, chit-chat ignored |
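The output column above can double as a host-side contract. Below is a minimal sketch of a hypothetical `validate_items` helper that checks an already-parsed response against it, reading each `?` as "optional key". The `SCHEMAS` dict is illustrative, and flagging keys that are not in the table is an assumption about how strict a host wants to be; the README does not specify this.

```python
from typing import Any

# Keys per mode, read off the output column of the table ("?" = optional key).
SCHEMAS = {
    "capture": {
        "required": {"type", "title"},
        "optional": {"when", "where", "priority", "recurring"},
        "types": {"task", "event", "note"},
    },
    "event_extract": {
        "required": {"type", "title", "when"},
        "optional": {"when_end", "where"},
        "types": {"event"},
    },
}


def validate_items(output: Any, mode: str) -> list[str]:
    """Return a list of problems; an empty list means the output fits the table."""
    if not isinstance(output, dict) or set(output) != {"items"} or not isinstance(output["items"], list):
        return ['top level must be {"items": [...]}']
    schema = SCHEMAS[mode]
    problems = []
    for i, item in enumerate(output["items"]):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        keys = set(item)
        if missing := schema["required"] - keys:
            problems.append(f"item {i}: missing {sorted(missing)}")
        if extra := keys - schema["required"] - schema["optional"]:
            problems.append(f"item {i}: unexpected keys {sorted(extra)}")
        item_type = item.get("type")
        if not isinstance(item_type, str) or item_type not in schema["types"]:
            problems.append(f"item {i}: type {item_type!r} not allowed in {mode}")
        if "priority" in item and item["priority"] != "urgent":
            problems.append(f'item {i}: priority must be "urgent"')
        if "recurring" in item and item["recurring"] is not True:
            problems.append(f"item {i}: recurring must be true")
        for key in ("title", "when", "when_end", "where"):
            if key in item and not isinstance(item[key], str):
                problems.append(f"item {i}: {key} must be a string")
    return problems
```

Returning a list of messages rather than raising lets the host log every problem in one pass and decide for itself whether to drop the item or retry.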
Examples:
- `[capture] Call mom tomorrow` → `{"items":[{"type":"task","title":"Call mom","when":"tomorrow"}]}`
- `[capture] Nice weather today` → `{"items":[]}`
- `[event_extract] Today is 2026-04-01\nMette: Birthday party on 5/5 at 18 at Café Nord` → `{"items":[{"type":"event","title":"Birthday party","when":"5/5 at 18","where":"Café Nord"}]}`; the host then resolves `5/5 at 18` + anchor `2026-04-01` → 2026-05-05 18:00.
Inference
- Greedy (deterministic) decoding; stop on `<|im_end|>`. Set `max_new_tokens` ≥ 768: a chunked recurring-event block can emit ~16 events (~600 tokens), and a smaller budget truncates the output into invalid JSON.
- The model prefixes its output with an empty thinking block (`<think>\n\n</think>`). Strip everything up to and including `</think>`, then parse the JSON.
- Untagged input defaults to `[capture]`.
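The post-processing step above can be kept as a small pure function; the name `parse_model_output` is hypothetical. This sketch assumes the decoded string no longer contains `<|im_end|>` (for example, decoded with `skip_special_tokens=True`) and that the thinking block, if present, comes before the JSON.

```python
import json
from typing import Any


def parse_model_output(raw: str) -> dict[str, Any]:
    """Drop the empty <think>...</think> prefix, then parse the JSON object."""
    # Keep only what follows the first "</think>"; without the marker, parse the whole text.
    _, marker, rest = raw.partition("</think>")
    payload = rest if marker else raw
    # Output cut off by a too-small max_new_tokens budget raises json.JSONDecodeError here.
    return json.loads(payload.strip())
```

Letting `json.JSONDecodeError` propagate, instead of silently returning `{"items":[]}`, makes an undersized `max_new_tokens` budget easy to notice.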
Chunking (host responsibility)
Split long input into fragments of ~2000 characters and call the model once per chunk, re-prefixing each chunk with `[event_extract] Today is YYYY-MM-DD\n`. The model extracts only events that lie fully inside the chunk and returns `{"items":[]}` when there are none. Merge the per-chunk items and de-duplicate on `(title, when)`.
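A minimal sketch of this host-side loop, assuming a hypothetical `call_model` callable that sends one user message and returns the already-parsed JSON object. Breaking on line boundaries is an assumption (the README only specifies ~2000-character fragments); a single line longer than the limit becomes its own oversized chunk.

```python
from typing import Callable


def chunk_text(text: str, size: int = 2000) -> list[str]:
    """Split on line boundaries into fragments of roughly `size` characters."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and len(current) + len(line) > size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def extract_events(text: str, today: str, call_model: Callable[[str], dict]) -> list[dict]:
    """Call the model once per chunk and merge items, de-duplicating on (title, when).

    `call_model` is hypothetical: one user message in, parsed JSON object out.
    """
    seen, merged = set(), []
    for chunk in chunk_text(text):
        # Every chunk carries its own tag and "Today is" anchor.
        result = call_model(f"[event_extract] Today is {today}\n{chunk}")
        for item in result.get("items", []):
            key = (item.get("title"), item.get("when"))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Splitting on newlines keeps each OCR'd chat line intact, which makes it less likely that an event straddles two chunks and is skipped by both, since the model ignores events that are not fully inside a chunk.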
```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SYSTEM = "..."  # paste the system prompt above, verbatim

tok = AutoTokenizer.from_pretrained("kstorm77/quick-add-qwen3-0.6b")
model = AutoModelForCausalLM.from_pretrained("kstorm77/quick-add-qwen3-0.6b", dtype=torch.bfloat16, device_map="auto")
msgs = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": "[capture] Call mom tomorrow"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding; decode only the newly generated tokens.
out = tok.decode(model.generate(ids, max_new_tokens=768, do_sample=False)[0][ids.shape[1]:], skip_special_tokens=True)
# Strip up to and including </think>, then parse the JSON.
result = json.loads(out.split("</think>", 1)[-1].strip())
```
Model provider: kstorm77
Model tree: base model Qwen/Qwen3-0.6B; this model is listed under "Quantized"
Modalities: text input, text output