README
License: apache-2.0
Prompt format (important)
The model was trained on a unified JSON format: a system prompt that states the task and the output schema, a numbered user sequence, and a single JSON answer:
- next-step / completion → {"reasoning": "...", "steps": ["STEP", ...]}
- anomaly → {"reasoning": "...", "valid": true|false, "rule": "RULE_..."|null}
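A reply can be checked against these two schemas before it is used downstream. The sketch below is a minimal, hypothetical validator (parse_reply and its task labels are not part of the project repo); the field names come from the formats above, but the exact acceptance rules are assumptions.

```python
import json


def parse_reply(text: str, task: str) -> dict:
    """Hypothetical validator for the two output schemas in this card.

    Field names follow the card; the acceptance rules are assumptions.
    Raises ValueError (json.JSONDecodeError is a subclass) on bad replies.
    """
    obj = json.loads(text)
    if not isinstance(obj, dict) or not isinstance(obj.get("reasoning"), str):
        raise ValueError("expected an object with a string 'reasoning' field")

    if task in ("next-step", "completion"):
        steps = obj.get("steps")
        # "steps" must be a list of exact fab step names (strings).
        if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
            raise ValueError("'steps' must be a list of step-name strings")
    elif task == "anomaly":
        if not isinstance(obj.get("valid"), bool):
            raise ValueError("'valid' must be true or false")
        if "rule" not in obj:
            raise ValueError("'rule' key is required (may be null)")
        rule = obj["rule"]
        # "rule" is either JSON null or a string such as "RULE_...".
        if rule is not None and not (isinstance(rule, str) and rule.startswith("RULE_")):
            raise ValueError("'rule' must be null or a 'RULE_...' string")
    else:
        raise ValueError(f"unknown task: {task!r}")
    return obj


print(parse_reply('{"reasoning": "", "steps": ["DEVELOP PHOTORESIST"]}', "next-step")["steps"])
# -> ['DEVELOP PHOTORESIST']
```

Raising on malformed replies keeps scoring code simple: a caller can catch ValueError and count the item as a miss.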
Build the exact messages with zo_train.prompts.build_messages(task, item) from the project repo, then apply the tokenizer's chat template. Minimal next-step example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("XCombinator/sft-fab-instruct-all")
model = AutoModelForCausalLM.from_pretrained("XCombinator/sft-fab-instruct-all", torch_dtype="auto")

system = (
    "You are a semiconductor wafer fabrication process-sequence assistant.\n"
    "TASK — Next-step prediction. Reply with one JSON object: "
    '{"reasoning": "...", "steps": ["BEST", "ALT2", ...]} (exact fab step names).'
)
user = (
    "Product family: MOSFET\n"
    "Partial sequence (numbered in execution order):\n"
    "1. RECEIVE WAFER LOT\n2. CLEAN WAFER\n3. GROW FIELD OXIDE\n4. COAT RESIST\n5. EXPOSE PATTERN\n\n"
    "Respond with the JSON object described in OUTPUT FORMAT."
)
msgs = [{"role": "system", "content": system}, {"role": "user", "content": user}]

# Render the chat template to text, then tokenize it onto the model's device.
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
ids = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; slice off the prompt tokens so only the reply is printed.
out = model.generate(**ids, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
# -> {"reasoning": "", "steps": ["DEVELOP PHOTORESIST"]}
```
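The user message in this example has a fixed layout: the product family, a numbered partial sequence, and a closing instruction. The hypothetical helper below (next_step_user_message is not a repo function) reproduces that layout for an arbitrary step list. It only mirrors this one example; zo_train.prompts.build_messages remains the authoritative way to build prompts that match training.

```python
def next_step_user_message(product_family: str, steps: list[str]) -> str:
    """Hypothetical helper mirroring the user message in the example above.

    Not part of the project repo; use zo_train.prompts.build_messages for
    prompts that exactly match the training format.
    """
    # Number the steps from 1 in execution order, one per line.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Product family: {product_family}\n"
        "Partial sequence (numbered in execution order):\n"
        f"{numbered}\n\n"
        "Respond with the JSON object described in OUTPUT FORMAT."
    )


user = next_step_user_message(
    "MOSFET",
    ["RECEIVE WAFER LOT", "CLEAN WAFER", "GROW FIELD OXIDE", "COAT RESIST", "EXPOSE PATTERN"],
)
print(user)
```

Called with the five steps shown, it yields the same string as the user variable in the example.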
Use the repo's zo-track / judge-eval harness for scored evaluation, passing --model XCombinator/sft-fab-instruct-all --predictor hf.
Evaluation (MOSFET labeled eval, n≈200)
| task | this model | n-gram baseline | frozen base |
|---|---|---|---|
| next-step (top-1) | 0.475 | 0.69 | ~0 |
| sequence completion (block-acc) | 0.555 | 0.637 | ~0 |
| anomaly (F1) | 0.567 | 0.89 | 0 |
The data-scaled sibling checkpoints push completion block-accuracy to 0.745, beating the n-gram baseline (0.637).
See the project repo and submissions/XCombinator/REPORT.md for the full study.
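For a rough local check of the next-step and anomaly columns, the two metrics can be sketched as below. These are assumed definitions, not the harness's code: top-1 counts an item as correct when the first entry of "steps" equals the gold next step, and anomaly F1 treats an invalid sequence (valid == false) as the positive class. Block-accuracy is not sketched because this card does not define it.

```python
def top1_accuracy(predicted: list[list[str]], gold: list[str]) -> float:
    """Assumed definition: share of items whose first-ranked step equals the gold step."""
    hits = sum(1 for steps, g in zip(predicted, gold) if steps and steps[0] == g)
    return hits / len(gold) if gold else 0.0


def anomaly_f1(predicted_valid: list[bool], gold_valid: list[bool]) -> float:
    """Assumed definition: binary F1 with 'invalid' (valid == False) as the positive class."""
    pairs = list(zip(predicted_valid, gold_valid))
    tp = sum(1 for p, g in pairs if not p and not g)  # flagged and truly invalid
    fp = sum(1 for p, g in pairs if not p and g)      # flagged but actually valid
    fn = sum(1 for p, g in pairs if p and not g)      # missed an invalid sequence
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(top1_accuracy([["A", "B"], ["C"], []], ["A", "B", "C"]))
print(anomaly_f1([False, False, True, True], [False, True, False, True]))
```

Scores from this sketch are only comparable to the table if the harness uses the same definitions, which this card does not confirm.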
Notes
- Full fine-tune (not a LoRA adapter); loads directly with from_pretrained.
- Trained on Leonardo (CINECA) A100 GPUs; training data comes from a deterministic data factory over the organizer grammar.
Model provider: XCombinator
Model tree
Base: Qwen/Qwen2.5-1.5B-Instruct
Fine-tuned: this model
Modalities
Input: Text
Output: Text
Supported Functionality
Model APIs
Dedicated Endpoints
Container