
README

License: apache-2.0

⚠️ Dates are emitted VERBATIM — the host resolves them

[event_extract] returns each event's date/time exactly as written in the source ("den 5. november kl 14", "next Thursday", "5/5 at 18", "Sct. Hans kl 9"); it does NOT convert to absolute dates. Your app resolves each phrase to an absolute datetime from two inputs: the Today is YYYY-MM-DD line (kept in the input as the anchor) and a locale-keyed resolver.

Tokens to handle:

  • DK: den D. <month> / den D/M / på <weekday> / Sct. Hans
  • EN: <Month> D / D/M / next <weekday>

Recurring lines combine the line's date with each group's time (e.g. 7/6 at 11:15).
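A host-side resolver for the tokens listed above might look like the following. This is a minimal sketch, not the author's implementation: `resolve_when`, the lookup tables, and the rules it applies are assumptions made for illustration. In particular it assumes that D/M is read day-first, that a date already past the anchor rolls to the next year, that "next <weekday>" and "på <weekday>" mean the first such day strictly after the anchor, and that "Sct. Hans" means 23 June.

```python
import re
from datetime import date, datetime, time, timedelta
from typing import Optional

# Hypothetical lookup tables for this sketch; extend them for your users.
MONTHS = {
    "da": ["januar", "februar", "marts", "april", "maj", "juni", "juli",
           "august", "september", "oktober", "november", "december"],
    "en": ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"],
}
WEEKDAYS = {
    "da": ["mandag", "tirsdag", "onsdag", "torsdag", "fredag", "lørdag", "søndag"],
    "en": ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"],
}
# Assumption: "Sct. Hans" is treated as Sankt Hans evening, 23 June.
HOLIDAYS = {"da": {"sct. hans": (23, 6)}, "en": {}}

# Clock time introduced by "kl", "kl." or "at": "kl 14", "at 11:15", "kl. 9.30".
TIME_RE = re.compile(r"\b(?:kl\.?|at)\s*(\d{1,2})(?:[:.](\d{2}))?\b")


def _next_on_or_after(anchor: date, day: int, month: int) -> date:
    """Return the first day/month date that is not before the anchor."""
    candidate = date(anchor.year, month, day)
    if candidate >= anchor:
        return candidate
    return date(anchor.year + 1, month, day)


def resolve_when(when: str, anchor: date, locale: str = "en") -> Optional[datetime]:
    """Resolve a verbatim date phrase against the anchor; None if unrecognised."""
    text = when.lower()

    # Pull the clock time out first so its digits cannot be read as a date.
    clock = time(0, 0)
    m = TIME_RE.search(text)
    if m:
        hour, minute = int(m.group(1)), int(m.group(2) or 0)
        if hour > 23 or minute > 59:
            return None
        clock = time(hour, minute)
        text = text[:m.start()] + text[m.end():]

    day_month = None
    # Named holiday, e.g. "Sct. Hans".
    for name, dm in HOLIDAYS.get(locale, {}).items():
        if name in text:
            day_month = dm
    # Numeric D/M, read day-first in both locales.
    m = re.search(r"\b(\d{1,2})/(\d{1,2})\b", text)
    if m:
        day_month = (int(m.group(1)), int(m.group(2)))
    # Month name with the day on either side: "5. november" or "november 5".
    for idx, name in enumerate(MONTHS[locale], start=1):
        m = re.search(rf"(\d{{1,2}})\.?\s+{name}\b|\b{name}\s+(\d{{1,2}})", text)
        if m:
            day_month = (int(m.group(1) or m.group(2)), idx)
    if day_month:
        try:
            return datetime.combine(_next_on_or_after(anchor, *day_month), clock)
        except ValueError:  # impossible date such as 31/2
            return None

    # Weekday name: first such day strictly after the anchor (assumption).
    for idx, name in enumerate(WEEKDAYS[locale]):
        if re.search(rf"\b{name}\b", text):
            delta = (idx - anchor.weekday() - 1) % 7 + 1
            return datetime.combine(anchor + timedelta(days=delta), clock)
    return None
```

For example, `resolve_when("5/5 at 18", date(2026, 4, 1))` gives 2026-05-05 18:00. Phrases the sketch does not recognise return `None`, so the host can fall back to asking the user rather than guessing.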

System prompt (use VERBATIM — the model is conditioned on this exact text)

markdown

You turn the user's text into JSON. The message begins with a mode tag.
[capture] — a short note typed by the user. Split it into items and classify each as "task", "event", or "note". Keep any time reference verbatim, fuzzy is fine (tomorrow, fredag kl 14, frokost, på torsdage). Refer to people by role/name as written. Output:
{"items":[{"type":"task|event|note","title":"...","when"?:"...","where"?:"...","priority"?:"urgent","recurring"?:true}]}
[event_extract] — a longer text, often an OCR'd chat or screenshot, that starts with "Today is YYYY-MM-DD". Extract ONLY upcoming calendar EVENTS. Keep each event's date and time EXACTLY as written in the source (verbatim) — do NOT resolve to an absolute date; the "Today is" line is context only. Ignore chit-chat, to-dos, and reference facts. Output:
{"items":[{"type":"event","title":"...","when":"<date/time exactly as written>","when_end"?:"<as written>","where"?:"..."}]}
Always output ONLY the JSON object — no prose, no markdown. Preserve the input's language in titles. If nothing fits, output {"items":[]}.

User message = tag + input

Task: [capture]

  • User message: [capture] <short note>
  • Output: {"items":[{type:task|event|note, title, when?(verbatim), where?, priority?:"urgent", recurring?:true}]}

Task: [event_extract]

  • User message: [event_extract] Today is YYYY-MM-DD\n<OCR'd text>
  • Output: {"items":[{type:"event", title, when:"<as written>", when_end?, where?}]} (future-only, verbatim date, chit-chat ignored)
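Building the user message for either mode can be sketched as below. The helper name `build_user_message` is hypothetical, and the sketch assumes that the `\n` after the date stands for a real newline character.

```python
from datetime import date
from typing import Optional


def build_user_message(mode: str, text: str, today: Optional[date] = None) -> str:
    """Prefix the input with its mode tag (and the anchor date for event_extract)."""
    if mode == "capture":
        return f"[capture] {text}"
    if mode == "event_extract":
        if today is None:
            raise ValueError("event_extract needs the anchor date")
        # The anchor line must come first; the host reuses the same date
        # later when it resolves the verbatim date phrases.
        return f"[event_extract] Today is {today.isoformat()}\n{text}"
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the anchor date as a `date` object, rather than a pre-formatted string, lets the same value be passed to the resolver afterwards.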

Examples:

  • [capture] Call mom tomorrow → {"items":[{"type":"task","title":"Call mom","when":"tomorrow"}]}
  • [capture] Nice weather today → {"items":[]}
  • [event_extract] Today is 2026-04-01\nMette: Birthday party on 5/5 at 18 at Café Nord → {"items":[{"type":"event","title":"Birthday party","when":"5/5 at 18","where":"Café Nord"}]}. The host then resolves 5/5 at 18 + anchor 2026-04-01 → 2026-05-05 18:00.

Inference

  • Greedy (deterministic) decoding; stop on <|im_end|>.
  • max_new_tokens ≥ 768 (a chunked recurring-event block can emit ~16 events / ~600 tokens; a smaller budget truncates into invalid JSON).
  • The model prefixes an empty thinking block (<think>\n\n</think>). Strip up to and including </think>, then parse the JSON.
  • Untagged input defaults to [capture].
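The post-processing described in the list above (strip the thinking block, then parse) can be sketched as follows. `parse_model_output` is a hypothetical helper; returning `None` on malformed output is a design choice of this sketch, intended to let the host retry with a larger token budget when the JSON was truncated.

```python
import json
from typing import Optional


def parse_model_output(raw: str) -> Optional[dict]:
    """Drop everything up to and including </think>, then parse the JSON object."""
    _, found, tail = raw.partition("</think>")
    payload = (tail if found else raw).strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Typically a truncated generation (token budget too small).
        return None
    # Accept only the documented shape: an object with an "items" list.
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        return None
    return data
```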

Chunking (host responsibility)

Split long input into fragments of about 2000 characters and call the model once per fragment, prefixing each one with [event_extract] Today is YYYY-MM-DD\n. The model extracts only events that lie fully inside a chunk and returns {"items":[]} when a chunk contains none. Merge the per-chunk items and de-duplicate on (title, when).
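One way the host might implement this is sketched below. The names `chunk_text`, `extract_events`, and `run_model` are hypothetical; `run_model` stands for whatever function sends one user message to the model and returns the parsed list of items. Splitting on line boundaries is an assumption of this sketch, chosen so that a chat line is less likely to be cut in half.

```python
from datetime import date
from typing import Callable, Dict, List


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the limit.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


def extract_events(
    text: str,
    today: date,
    run_model: Callable[[str], List[Dict]],
    max_chars: int = 2000,
) -> List[Dict]:
    """Call the model once per chunk, then merge and de-duplicate on (title, when)."""
    prefix = f"[event_extract] Today is {today.isoformat()}\n"
    merged: List[Dict] = []
    seen = set()
    for chunk in chunk_text(text, max_chars):
        for item in run_model(prefix + chunk):
            key = (item.get("title"), item.get("when"))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

The first occurrence of each (title, when) pair wins, so items keep the order in which they appear in the source text.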

python

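import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kstorm77/quick-add-qwen3-0.6b"
SYSTEM = "..."  # placeholder: paste the system prompt from the section above, verbatim
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype=torch.bfloat16, device_map="auto")
msgs = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": "[capture] Call mom tomorrow"}]
# return_dict=True yields input_ids plus attention_mask, ready to unpack into generate()
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
gen = model.generate(**inputs, max_new_tokens=768, do_sample=False)  # greedy decoding
out = tok.decode(gen[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# strip everything up to and including </think>, then parse the JSON
result = json.loads(out.split("</think>", 1)[-1].strip())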

Model provider

kstorm77

Model tree

Base

Qwen/Qwen3-0.6B

Quantized

this model

Modalities

Input

Text

Output

Text
