flavianv/qwen4b-apparel23-bundle-sft

Model description

This checkpoint was trained with supervised fine-tuning (full weights, not LoRA) on 20,000 kept Apparel23 outfit bundles. Each training example maps a user query to a compact bundle of real product titles (title-only, no ASINs in the target).

Typical use: shopping assistants, outfit planners, or retrieval pipelines that need structured bundle output before product lookup.

Limitations:

Predictions are often category-plausible but rarely match the gold catalog items exactly (1 of 10 exact bundle matches on a 10-sample eval).
Performance drops when explicit item hints are removed from the query.
Training data is limited to English apparel queries from the Apparel23 / Qwen-32B labeling pipeline.

Intended use

System prompt (training default)

```markdown
You are an outfit bundle assistant for apparel shopping. Given a natural-language outfit request, return the matching bundle as compact product evidence for each selected item. Include the outfit role and product title for every item in the outfit.
```

Output format

```text
### Item 1: dress
Mikarose Chloe Modest Chiffon Maxi Dress or Modest Bridesmaid Dress

### Item 2: footwear
Clarks Women's Danelly Sky Loafer
```

Supported roles: top, bottom, dress, outer_layer, footwear, accessory.
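
Downstream product lookup usually needs the bundle as structured data rather than raw text. The following is a minimal parsing sketch, assuming the output follows the format shown above exactly (a `### Item N: role` header followed by one title line). `parse_bundle` and `SUPPORTED_ROLES` are hypothetical names introduced here for illustration; they are not part of this repo.

```python
import re

# Roles listed in this card; anything else is treated as a format violation.
SUPPORTED_ROLES = {"top", "bottom", "dress", "outer_layer", "footwear", "accessory"}

# Matches a header line such as "### Item 2: footwear".
_HEADER = re.compile(r"^###\s*Item\s+(\d+):\s*(\S+)\s*$")


def parse_bundle(text):
    """Parse model output into a list of (role, title) pairs.

    Hypothetical helper: raises ValueError on an unknown role or on a
    header that is not followed by a title line.
    """
    items = []
    pending_role = None  # role from the latest header, still waiting for its title
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate items and carry no data
        match = _HEADER.match(line)
        if match:
            if pending_role is not None:
                raise ValueError(f"missing title for role {pending_role!r}")
            role = match.group(2)
            if role not in SUPPORTED_ROLES:
                raise ValueError(f"unsupported role: {role!r}")
            pending_role = role
        elif pending_role is not None:
            items.append((pending_role, line))
            pending_role = None
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if pending_role is not None:
        raise ValueError(f"missing title for role {pending_role!r}")
    return items
```

Applied to decoded generation text, this returns pairs such as `("footwear", "Clarks Women's Danelly Sky Loafer")`. Because it raises `ValueError` on any deviation, it can also double as a simple format-compliance check.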

Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flavianv/qwen4b-apparel23-bundle-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Same system prompt as the training default above.
system = (
    "You are an outfit bundle assistant for apparel shopping. "
    "Given a natural-language outfit request, return the matching bundle as "
    "compact product evidence for each selected item. Include the outfit role "
    "and product title for every item in the outfit."
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "Casual summer outfit for women: denim shorts and ballet flats"},
]
# Render the chat template as text, ending with the assistant generation prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Greedy decoding, matching the evaluation settings.
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=384, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Training data

| Split | Rows |
| --- | --- |
| Train | 20,000 |
| Test (held out) | 13,705 |

Source: 33,705 kept outfits from the Qwen-32B Apparel23 labeling pipeline (train/test split, seed=42).
Dataset: flavianv/apparel23-qwen32b-kept-outfits-with-products
SFT file: apparel23_bundle_sft.train.jsonl
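
The card states the split sizes and the seed but not the exact splitting procedure. As an illustration only, a seeded shuffle-and-cut that reproduces the 20,000 / 13,705 sizes could look like the sketch below. `split_ids` is a hypothetical helper, and the use of Python's `random` module is an assumption; the actual pipeline may differ.

```python
import random


def split_ids(ids, n_train, seed=42):
    """Shuffle ids with a fixed seed and take the first n_train as the train split."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state untouched
    return shuffled[:n_train], shuffled[n_train:]


# 33,705 kept outfits in total, as reported above.
train_ids, test_ids = split_ids(range(33_705), n_train=20_000)
print(len(train_ids), len(test_ids))  # 20000 13705
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible regardless of other random calls in the same process.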

Training procedure

| Setting | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-4B-Instruct-2507` |
| Method | Full SFT |
| Epochs | 1 (2,500 steps) |
| Learning rate | 2e-5 |
| Max length | 768 |
| Batch size | 1 × grad accum 8 (effective 8) |
| Loss | Assistant-only |
| Seed | 42 |
| Hardware | NVIDIA B200 MIG `4g.90gb` |
| Run ID | `qwen4b_apparel23_bundle_sft_20260616_142947` |

Training metadata is included in bundle_sft_metadata.json in this repo.
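
An assistant-only loss is commonly implemented by setting the labels of all prompt tokens (system and user turns) to an ignore index so that only assistant tokens contribute to the cross-entropy. The sketch below shows that idea on plain token-id lists. `mask_prompt_labels` is a hypothetical helper, and right-truncation to the 768-token max length is an assumption; the training code for this checkpoint is not shown in this card.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(prompt_ids, assistant_ids, max_length=768):
    """Build (input_ids, labels) where only assistant tokens carry a label.

    Assumption: sequences longer than max_length are truncated from the right.
    """
    input_ids = (list(prompt_ids) + list(assistant_ids))[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + list(assistant_ids))[:max_length]
    return input_ids, labels


# Toy token ids, for illustration only.
input_ids, labels = mask_prompt_labels([1, 2, 3], [4, 5])
print(input_ids)  # [1, 2, 3, 4, 5]
print(labels)     # [-100, -100, -100, 4, 5]
```

With the settings in the table, one epoch over 20,000 examples at an effective batch size of 8 gives 20,000 / 8 = 2,500 optimizer steps, which matches the reported step count.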

Evaluation

Greedy decoding (do_sample=False, max_new_tokens=384) unless noted.

Task metrics (perplexity)

| Split | Perplexity | Mean token entropy* |
| --- | --- | --- |
| Train (20k) | 3.12 | 1.19 |
| Test (13.7k) | 3.46 | 1.25 |

*Entropy computed on a 256-row subsample per split (assistant tokens).
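
For readers unfamiliar with these two metrics, here is a minimal sketch of their standard definitions: perplexity as the exponential of the mean negative log-likelihood, and mean token entropy as the average Shannon entropy of the predicted next-token distributions. The card does not state the exact formula or the entropy unit, so natural logarithms (nats) are an assumption, and both function names are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the scored tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


def mean_token_entropy(distributions):
    """Average Shannon entropy (in nats) of per-token probability distributions."""
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0.0)  # skip zero-probability entries
        for dist in distributions
    ]
    return sum(entropies) / len(entropies)
```

Under this definition, a perplexity of 3.46 corresponds to a mean per-token negative log-likelihood of ln(3.46) ≈ 1.24 nats on the assistant tokens.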

Generalization probes (post-SFT)

| Probe | Score |
| --- | --- |
| Easy math | 90% (9/10) |
| Collapse suite | 87.5% (7/8) |
| Combined | 88.75% (unweighted mean of the two probes) |

Zero-shot baseline (same 10 samples, seed=42)

Compared against Qwen/Qwen3-4B-Instruct-2507 with the same system prompt:

| Metric | Zero-shot | This model |
| --- | --- | --- |
| Bundle format compliance | 0/10 | 10/10 |
| Item count matches gold | 0/10 | 10/10 |
| Exact bundle match | 0/10 | 1/10 |
| Mean title recall | 0.0 | 0.10 |

The zero-shot baseline produces generic, prose-style titles; the fine-tuned model follows the structured bundle schema and the catalog-title style.
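
The card does not define how the bundle metrics are computed. One plausible reading, sketched below, treats a bundle as an ordered list of `(role, title)` pairs, counts an exact match only when every pair agrees, and measures title recall as the fraction of gold titles reproduced verbatim. All function names are hypothetical and the definitions are assumptions, not the evaluation code behind the table.

```python
def exact_bundle_match(pred, gold):
    """True when the predicted (role, title) pairs equal the gold pairs, in order."""
    return list(pred) == list(gold)


def title_recall(pred, gold):
    """Fraction of gold titles that appear verbatim among the predicted titles."""
    if not gold:
        return 0.0
    pred_titles = {title for _, title in pred}
    return sum(title in pred_titles for _, title in gold) / len(gold)


def summarize(pairs):
    """Aggregate metrics over a list of (pred, gold) bundle pairs."""
    n = len(pairs)
    return {
        "item_count_match": sum(len(p) == len(g) for p, g in pairs),
        "exact_bundle_match": sum(exact_bundle_match(p, g) for p, g in pairs),
        "mean_title_recall": sum(title_recall(p, g) for p, g in pairs) / n,
    }
```

Under these definitions, a result of 1/10 exact matches with a mean title recall of 0.10 is what one would see if the single exact match recovered all of its titles and the other nine predictions recovered none.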

Example

Query: Casual summer outfit for women: denim shorts and ballet flats

Output (exact match on eval sample):

```text
### Item 1: bottom
Levi's Women's 501 Original Shorts (Also Available in Plus)

### Item 2: footwear
Amazon Essentials Women's Belice Ballet Flat
```

Citation / lineage

Base model: Qwen/Qwen3-4B-Instruct-2507
Training data: flavianv/apparel23-qwen32b-kept-outfits-with-products
Internal report: docs/qwen4b_apparel23_bundle_sft_report.md in the RecoRL repo

License

This model inherits the license of its base model, Qwen/Qwen3-4B-Instruct-2507. See the Qwen model card for terms.
