
README

License: apache-2.0

Prompt format (important)

The model was trained on one unified prompt format: a system prompt that states the task and the output schema, a user message containing the numbered step sequence, and a single JSON object as the answer. The answer schema depends on the task:

  • next-step / completion → {"reasoning": "...", "steps": ["STEP", ...]}
  • anomaly → {"reasoning": "...", "valid": true|false, "rule": "RULE_..."|null}
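Because every answer is a single JSON object with a fixed schema, replies can be checked mechanically before they are scored. The following is a minimal sketch of such a check, not code from the project repo: the function name `parse_answer` and the task labels `"steps"` and `"anomaly"` are hypothetical, and the key checks are inferred only from the two schemas listed above.

```python
import json


def parse_answer(text, task):
    """Parse a model reply and check it against the schema for `task`.

    `task` is "steps" (next-step / completion) or "anomaly"; both labels are
    illustrative and are not the repo's task names.
    Returns the parsed dict, or None if the reply does not match the schema.
    """
    # Keep the span from the first "{" to the last "}" so that stray text
    # around the JSON object is tolerated.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("reasoning"), str):
        return None

    if task == "steps":
        steps = obj.get("steps")
        ok = isinstance(steps, list) and all(isinstance(s, str) for s in steps)
    elif task == "anomaly":
        # "valid" must be a boolean; "rule" must be present and be a string or null.
        ok = (
            isinstance(obj.get("valid"), bool)
            and "rule" in obj
            and (obj["rule"] is None or isinstance(obj["rule"], str))
        )
    else:
        raise ValueError(f"unknown task: {task}")
    return obj if ok else None
```

A reply that fails to parse returns None, so a caller can count it as a miss instead of raising in the middle of an evaluation run.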

Build the exact messages with zo_train.prompts.build_messages(task, item) from the project repo, then apply the tokenizer's chat template. Minimal next-step example:

python

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer.
tok = AutoTokenizer.from_pretrained("XCombinator/sft-fab-instruct-all")
model = AutoModelForCausalLM.from_pretrained("XCombinator/sft-fab-instruct-all", torch_dtype="auto")

# System prompt: states the task and the output schema.
system = (
    "You are a semiconductor wafer fabrication process-sequence assistant.\n"
    "TASK — Next-step prediction. Reply with one JSON object: "
    '{"reasoning": "...", "steps": ["BEST", "ALT2", ...]} (exact fab step names).'
)
# User message: product family plus the numbered partial sequence.
user = (
    "Product family: MOSFET\n"
    "Partial sequence (numbered in execution order):\n"
    "1. RECEIVE WAFER LOT\n2. CLEAN WAFER\n3. GROW FIELD OXIDE\n4. COAT RESIST\n5. EXPOSE PATTERN\n\n"
    "Respond with the JSON object described in OUTPUT FORMAT."
)
msgs = [{"role": "system", "content": system}, {"role": "user", "content": user}]

# Render the chat template as text, tokenize it, and decode greedily.
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=128, do_sample=False)

# Print only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
# -> {"reasoning": "", "steps": ["DEVELOP PHOTORESIST"]}

Use the repo's zo-track / judge-eval harness for scored evaluation; pass --model XCombinator/sft-fab-instruct-all --predictor hf.
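For quick experiments without the project repo, the user message from the example above can be assembled from a plain list of step names. This is only a sketch: `build_user_message` is a hypothetical helper whose layout is copied from that single example, so it may not match what `zo_train.prompts.build_messages` produces for other tasks or product families. Prefer the repo function whenever exact training-format prompts matter.

```python
def build_user_message(product_family, steps):
    """Format a partial sequence like the user message in the example above.

    Hypothetical helper: the layout is copied from the README example and is
    not taken from the project repo.
    """
    # Number the steps from 1, in execution order, one per line.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Product family: {product_family}\n"
        "Partial sequence (numbered in execution order):\n"
        f"{numbered}\n\n"
        "Respond with the JSON object described in OUTPUT FORMAT."
    )


user = build_user_message(
    "MOSFET",
    ["RECEIVE WAFER LOT", "CLEAN WAFER", "GROW FIELD OXIDE", "COAT RESIST", "EXPOSE PATTERN"],
)
```

The resulting `user` string is identical to the hand-written one in the example, so it can be dropped into the same `msgs` list.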

Evaluation (MOSFET labeled eval, n≈200)

task                              this model   n-gram baseline   frozen base
next-step (top-1)                 0.475        0.69              ~0
sequence completion (block-acc)   0.555        0.637             ~0
anomaly (F1)                      0.567        0.89              0

The data-scaled sibling checkpoints push completion block-accuracy to 0.745, which beats the n-gram baseline's 0.637. See the project repo and submissions/XCombinator/REPORT.md for the full study.
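For readers who want to sanity-check numbers like these on their own predictions, here is a rough sketch of top-1 accuracy and binary F1. It is not the repo's scoring code: the function names are hypothetical, and treating the anomalous class (`valid == False`) as the positive class for F1 is an assumption, since the convention used by the harness is not stated here. Block-accuracy is left out because its definition is not given in this README.

```python
def top1_accuracy(predicted_steps, gold_steps):
    """Fraction of items whose first predicted step equals the gold next step.

    `predicted_steps` is a list of ranked step lists (the "steps" field);
    `gold_steps` is a list of gold step names. An empty prediction is a miss.
    """
    hits = sum(
        1 for pred, gold in zip(predicted_steps, gold_steps) if pred and pred[0] == gold
    )
    return hits / len(gold_steps) if gold_steps else 0.0


def binary_f1(predicted, gold, positive=False):
    """F1 for one class of a boolean label.

    `positive` is the label value counted as the positive class. The default
    (False, i.e. an anomalous sequence) is an assumption, not the repo's rule.
    """
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Scores computed this way are only comparable to the table if the harness uses the same conventions, so the repo's zo-track / judge-eval harness remains the reference for reported results.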

Notes

  • Full fine-tune (not a LoRA adapter) — loads directly with from_pretrained.
  • Trained on Leonardo (CINECA) A100; deterministic data factory over the organizer grammar.

Model provider

XCombinator

Model tree

Base

Qwen/Qwen2.5-1.5B-Instruct

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Supported Functionality

Model APIs

Dedicated Endpoints

Container
