kstorm77/quick-add-qwen3-0.6b API & Inference Endpoint

⚠️ Dates are emitted VERBATIM — the host resolves them

[event_extract] returns each event's date and time exactly as written in the source ("den 5. november kl 14", "next Thursday", "5/5 at 18", "Sct. Hans kl 9"); it does NOT convert them to absolute dates. Your app resolves each phrase to an absolute datetime with a locale-keyed resolver, using the Today is YYYY-MM-DD line (which stays in the input) as the anchor. Phrase forms the resolver should handle: Danish den D. <month>, den D/M, på <weekday>, Sct. Hans; English <Month> D, D/M, next <weekday>. For recurring lines, the line's date is combined with each group's time (e.g. 7/6 at 11:15).
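As a rough illustration of the host-side step, the sketch below resolves only the numeric D/M form (optionally with "den", and with "at" or "kl" plus a time). The function name `resolve_day_month`, the midnight default when no time is given, and the rule of rolling a past date into the following year are assumptions made for this example, not behaviour documented by the model card. Weekday phrases, month names, and "Sct. Hans" would need their own locale-specific rules.

```python
import re
from datetime import date, datetime

# Hypothetical pattern: "5/5", "5/5 at 18", "7/6 at 11:15", "den 5/5 kl 18".
_DAY_MONTH = re.compile(
    r"(?:den\s+)?(\d{1,2})/(\d{1,2})"
    r"(?:\s+(?:at|kl\.?)\s*(\d{1,2})(?:[:.](\d{2}))?)?",
    re.IGNORECASE,
)


def resolve_day_month(when, today):
    """Resolve a verbatim day-first D/M phrase against the anchor date.

    Returns a datetime, or None when the phrase is not in the D/M form
    (the caller would then try other resolvers).
    """
    match = _DAY_MONTH.fullmatch(when.strip())
    if match is None:
        return None
    day, month = int(match.group(1)), int(match.group(2))
    hour = int(match.group(3)) if match.group(3) else 0  # assumed default
    minute = int(match.group(4)) if match.group(4) else 0
    try:
        resolved = datetime(today.year, month, day, hour, minute)
        # Assumption: events are upcoming, so a date already past
        # is taken to mean the same day next year.
        if resolved.date() < today:
            resolved = resolved.replace(year=today.year + 1)
    except ValueError:
        return None  # impossible date or time such as 31/2 or hour 25
    return resolved


anchor = date(2026, 4, 1)
print(resolve_day_month("5/5 at 18", anchor))  # 2026-05-05 18:00:00
```

Returning None for unrecognised phrases lets the host chain several small resolvers, one per phrase form and locale, instead of one large parser.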

System prompt (use VERBATIM — the model is conditioned on this exact text)

markdown
You turn the user's text into JSON. The message begins with a mode tag.

[capture] — a short note typed by the user. Split it into items and classify each as "task", "event", or "note". Keep any time reference verbatim, fuzzy is fine (tomorrow, fredag kl 14, frokost, på torsdage). Refer to people by role/name as written. Output:
{"items":[{"type":"task|event|note","title":"...","when"?:"...","where"?:"...","priority"?:"urgent","recurring"?:true}]}

[event_extract] — a longer text, often an OCR'd chat or screenshot, that starts with "Today is YYYY-MM-DD". Extract ONLY upcoming calendar EVENTS. Keep each event's date and time EXACTLY as written in the source (verbatim) — do NOT resolve to an absolute date; the "Today is" line is context only. Ignore chit-chat, to-dos, and reference facts. Output:
{"items":[{"type":"event","title":"...","when":"<date/time exactly as written>","when_end"?:"<as written>","where"?:"..."}]}

Always output ONLY the JSON object — no prose, no markdown. Preserve the input's language in titles. If nothing fits, output {"items":[]}.

User message = tag + input

| Task | User message | Output |
| --- | --- | --- |
| `[capture]` | `[capture] <short note>` | `{"items":[{type:task\|event\|note, title, when?(verbatim), where?, priority?:"urgent", recurring?:true}]}` |
| `[event_extract]` | `[event_extract] Today is YYYY-MM-DD\n<OCR'd text>` | `{"items":[{type:"event", title, when:"<as written>", when_end?, where?}]}` (future events only, dates verbatim, chit-chat ignored) |

Examples:

[capture] Call mom tomorrow → {"items":[{"type":"task","title":"Call mom","when":"tomorrow"}]}
[capture] Nice weather today → {"items":[]}
[event_extract] Today is 2026-04-01\nMette: Birthday party on 5/5 at 18 at Café Nord → {"items":[{"type":"event","title":"Birthday party","when":"5/5 at 18","where":"Café Nord"}]} → host resolves 5/5 at 18 + anchor 2026-04-01 → 2026-05-05 18:00.
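The two message shapes above can be assembled with a small helper. This is a minimal sketch; `build_user_message` and its argument names are hypothetical and chosen only for illustration.

```python
from datetime import date


def build_user_message(mode, text, today=None):
    """Prefix the input with its mode tag, as the system prompt expects."""
    if mode == "capture":
        return f"[capture] {text}"
    if mode == "event_extract":
        if today is None:
            raise ValueError("event_extract needs the anchor date")
        # The anchor line comes first, then a newline, then the source text.
        return f"[event_extract] Today is {today.isoformat()}\n{text}"
    raise ValueError(f"unknown mode: {mode!r}")


print(build_user_message("capture", "Call mom tomorrow"))
print(build_user_message("event_extract", "Mette: Birthday party on 5/5 at 18", date(2026, 4, 1)))
```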

Inference

Greedy (deterministic) decoding; stop on <|im_end|>.
max_new_tokens ≥ 768 (a chunked recurring-event block can emit ~16 events / ~600 tokens; a smaller budget truncates into invalid JSON).
The model prefixes its output with an empty thinking block (<think>\n\n</think>). Strip everything up to and including </think>, then parse the remainder as JSON. Untagged input defaults to [capture].
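The post-processing step can be sketched as follows. `parse_model_output` is a hypothetical name, and returning an empty list on invalid JSON (for example after truncation) is a choice made for this example; a host may prefer to raise or retry with a larger token budget.

```python
import json


def parse_model_output(raw):
    """Drop the thinking block, then return the list under "items"."""
    # rpartition keeps the whole string as the tail when </think> is absent.
    _, _, tail = raw.rpartition("</think>")
    try:
        data = json.loads(tail.strip())
    except json.JSONDecodeError:
        return []  # assumption: treat unparseable output as "no items"
    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]


raw = '<think>\n\n</think>\n\n{"items":[{"type":"task","title":"Call mom","when":"tomorrow"}]}'
print(parse_model_output(raw))
```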

Chunking (host responsibility)

Split long input into fragments of about 2000 characters and call the model once per fragment, prefixing each one with [event_extract] Today is YYYY-MM-DD\n. The model extracts only events that lie fully inside the fragment and returns {"items":[]} otherwise. Merge the per-fragment items and de-duplicate on (title, when).
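One possible shape for this host-side loop is sketched below. Splitting on line boundaries is an assumption (the text above only gives the fragment size), made so that a chat line is less likely to be cut in half; `call_model` is a hypothetical callable that takes the full user message and returns the parsed list of item dicts.

```python
def chunk_text(text, max_chars=2000):
    """Group whole lines into fragments of at most about max_chars.

    A single line longer than max_chars is kept as its own oversized fragment.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline that joins the lines
    if current:
        chunks.append("\n".join(current))
    return chunks


def merge_items(per_chunk_items):
    """Concatenate per-chunk results, keeping the first of each (title, when)."""
    seen, merged = set(), []
    for items in per_chunk_items:
        for item in items:
            key = (item.get("title"), item.get("when"))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def extract_events(text, today_iso, call_model):
    """Run call_model once per fragment, each re-prefixed with tag and anchor."""
    prefix = f"[event_extract] Today is {today_iso}\n"
    return merge_items(call_model(prefix + chunk) for chunk in chunk_text(text))
```

Because the key is the verbatim (title, when) pair, two fragments that quote the same event with different wording are not merged; a host could normalise or resolve the dates before de-duplicating if that matters.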

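python
# Minimal end-to-end call with Hugging Face Transformers.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
SYSTEM = "..."  # placeholder: paste the system prompt above, verbatim
tok = AutoTokenizer.from_pretrained("kstorm77/quick-add-qwen3-0.6b")
model = AutoModelForCausalLM.from_pretrained("kstorm77/quick-add-qwen3-0.6b", dtype=torch.bfloat16, device_map="auto")  # older transformers releases name this argument torch_dtype
msgs = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": "[capture] Call mom tomorrow"}]
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
gen = model.generate(**inputs, max_new_tokens=768, do_sample=False)  # greedy decoding
out = tok.decode(gen[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)  # decode only the newly generated tokens
# Strip everything up to and including </think>, then parse the JSON.
items = json.loads(out.split("</think>")[-1].strip())["items"]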
