XCombinator

sft-fab-instruct-all


Prompt format (important)

The model was trained on a unified JSON format: a system prompt that states the task and the output schema, a user message containing the numbered step sequence, and a single JSON object as the answer. The answer takes one of two shapes:

next-step / completion → {"reasoning": "...", "steps": ["STEP", ...]}
anomaly → {"reasoning": "...", "valid": true|false, "rule": "RULE_..."|null}
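Because every reply is meant to be a single JSON object in one of the two shapes above, downstream code can check the shape before trusting the content. The following is a minimal sketch using only the standard library; the function name `parse_answer` and the task labels `"next-step"`, `"completion"`, and `"anomaly"` are assumptions made for this illustration, not part of the project repo.

```python
import json


def parse_answer(text, task):
    """Parse one model reply and check it against the documented shape.

    `task` is "next-step", "completion", or "anomaly" (labels assumed here).
    Raises ValueError when the reply does not match.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    # Both shapes are objects that carry a string "reasoning" field.
    if not isinstance(obj, dict) or not isinstance(obj.get("reasoning"), str):
        raise ValueError("expected a JSON object with a string 'reasoning'")

    if task in ("next-step", "completion"):
        steps = obj.get("steps")
        if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
            raise ValueError("'steps' must be a list of strings")
    elif task == "anomaly":
        if not isinstance(obj.get("valid"), bool):
            raise ValueError("'valid' must be true or false")
        if "rule" not in obj:
            raise ValueError("'rule' must be present (string or null)")
        if obj["rule"] is not None and not isinstance(obj["rule"], str):
            raise ValueError("'rule' must be a string or null")
    else:
        raise ValueError(f"unknown task: {task!r}")
    return obj
```

A reply that fails this check (for example, generation cut off by `max_new_tokens`) is probably best counted as a miss rather than repaired by hand.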

Build the exact messages with zo_train.prompts.build_messages(task, item) from the project repo, then apply the tokenizer's chat template. Minimal next-step example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Full fine-tune, so the checkpoint loads directly (no adapter merge needed).
tok = AutoTokenizer.from_pretrained("XCombinator/sft-fab-instruct-all")
model = AutoModelForCausalLM.from_pretrained("XCombinator/sft-fab-instruct-all", torch_dtype="auto")

# System prompt: states the task and the output schema.
system = (
    "You are a semiconductor wafer fabrication process-sequence assistant.\n"
    "TASK — Next-step prediction. Reply with one JSON object: "
    '{"reasoning": "...", "steps": ["BEST", "ALT2", ...]} (exact fab step names).'
)
# User message: product family plus the numbered partial sequence.
user = (
    "Product family: MOSFET\n"
    "Partial sequence (numbered in execution order):\n"
    "1. RECEIVE WAFER LOT\n2. CLEAN WAFER\n3. GROW FIELD OXIDE\n4. COAT RESIST\n5. EXPOSE PATTERN\n\n"
    "Respond with the JSON object described in OUTPUT FORMAT."
)
msgs = [{"role": "system", "content": system}, {"role": "user", "content": user}]
# Render the chat template to text, then tokenize it.
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
ids = tok(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; print only the newly generated tokens.
out = model.generate(**ids, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
# -> {"reasoning": "", "steps": ["DEVELOP PHOTORESIST"]}
```
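The user message in the example follows a fixed layout: a product-family line, a numbered list of steps, and a closing instruction. As a rough illustration of that layout only, the hypothetical helper below rebuilds it from a Python list. The name `format_user_message` is invented for this sketch; for real use, `zo_train.prompts.build_messages(task, item)` from the project repo remains the source of the exact prompts, and the other tasks may use different wording.

```python
def format_user_message(family, steps):
    """Render a partial sequence in the layout of the example user message.

    Hypothetical helper: it mirrors only the next-step example shown above.
    """
    # Number the steps in execution order, starting at 1.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Product family: {family}\n"
        "Partial sequence (numbered in execution order):\n"
        f"{numbered}\n\n"
        "Respond with the JSON object described in OUTPUT FORMAT."
    )


user = format_user_message(
    "MOSFET",
    ["RECEIVE WAFER LOT", "CLEAN WAFER", "GROW FIELD OXIDE", "COAT RESIST", "EXPOSE PATTERN"],
)
```

Since the model was trained on one prompt format, keeping step names and line breaks identical to the training layout is likely to matter more than it would for a general-purpose chat model.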

Use the repo's zo-track / judge-eval harness for scored evaluation; pass --model XCombinator/sft-fab-instruct-all --predictor hf.

Evaluation (MOSFET labeled eval, n≈200)

| task | this model | n-gram baseline | frozen base |
|---|---|---|---|
| next-step (top-1) | 0.475 | 0.69 | ~0 |
| sequence completion (block-acc) | 0.555 | 0.637 | ~0 |
| anomaly (F1) | 0.567 | 0.89 | 0 |

The data-scaled sibling checkpoints raise completion block-accuracy to 0.745, above the n-gram baseline's 0.637. See the project repo and submissions/XCombinator/REPORT.md for the full study.
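For readers who want to sanity-check numbers of this kind on their own predictions, the sketch below computes top-1 accuracy and a binary F1 score in plain Python. It is not the repo's harness, whose definitions may differ. Two assumptions are made here: the first entry of `steps` is treated as the top-1 prediction, and the F1 positive class is `valid == False` (an anomalous sequence). Block-accuracy is left out because its definition is not given on this page.

```python
def top1_accuracy(predicted_steps, gold_steps):
    """Fraction of examples whose first predicted step equals the gold step."""
    hits = sum(
        1 for pred, gold in zip(predicted_steps, gold_steps)
        if pred and pred[0] == gold  # an empty prediction list counts as a miss
    )
    return hits / len(gold_steps)


def binary_f1(predicted, gold, positive=False):
    """F1 for the `positive` class (assumed default: valid == False)."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, purely illustrative.
acc = top1_accuracy([["A", "B"], ["C"], []], ["A", "D", "E"])
f1 = binary_f1([False, True, False, True], [False, False, True, True])
```

Scores comparable to the table should come from the zo-track / judge-eval harness itself.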

Notes

Full fine-tune (not a LoRA adapter) — loads directly with from_pretrained.
Trained on A100 hardware on Leonardo (CINECA); the training data comes from a deterministic data factory over the organizer grammar.

Model provider: XCombinator

Model tree: Qwen/Qwen2.5-1.5B-Instruct (base) → this model (fine-tuned)

Modalities: text input, text output
