License: apache-2.0
Model description
This checkpoint was trained with supervised fine-tuning (full weights, not LoRA) on 20,000 kept Apparel23 outfit bundles. Each training example maps a user query to a compact bundle of real product titles (title-only, no ASINs in the target).
Typical use: shopping assistants, outfit planners, or retrieval pipelines that need structured bundle output before product lookup.
Limitations:
- Predictions are often category-plausible but not exact vs gold catalog items (10% exact bundle match on a 10-sample eval).
- Performance drops when explicit item hints are removed from the query.
- Trained on English apparel queries from the Apparel23 / Qwen-32B labeling pipeline.
Intended use
System prompt (training default)
```markdown
You are an outfit bundle assistant for apparel shopping. Given a natural-language outfit request, return the matching bundle as compact product evidence for each selected item. Include the outfit role and product title for every item in the outfit.
```
Output format
```text
### Item 1: dress
Mikarose Chloe Modest Chiffon Maxi Dress or Modest Bridesmaid Dress
### Item 2: footwear
Clarks Women's Danelly Sky Loafer
```
Supported roles: top, bottom, dress, outer_layer, footwear, accessory.
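A downstream product lookup needs the bundle as structured data rather than raw text. The sketch below is one way to parse the format shown above into (role, title) pairs. The `parse_bundle` helper and its error handling are illustrative assumptions, not part of this repo, and the parser assumes each title runs from its item header to the next header.

```python
import re

# Roles listed in this model card.
SUPPORTED_ROLES = {"top", "bottom", "dress", "outer_layer", "footwear", "accessory"}

# Matches a header such as "### Item 1: dress".
_ITEM_HEADER = re.compile(r"###\s*Item\s+(\d+):\s*([a-z_]+)")


def parse_bundle(text):
    """Return a list of (role, title) pairs; raise ValueError on malformed output.

    Hypothetical helper: the title is taken to be everything between one
    item header and the next (or the end of the text).
    """
    headers = list(_ITEM_HEADER.finditer(text))
    if not headers:
        raise ValueError("no '### Item N: role' headers found")
    items = []
    for i, header in enumerate(headers):
        role = header.group(2)
        if role not in SUPPORTED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        title = text[header.end():end].strip()
        if not title:
            raise ValueError(f"item {header.group(1)} has no title")
        items.append((role, title))
    return items
```

Raising on unknown roles or empty titles keeps malformed generations from silently reaching the catalog lookup step.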
Quick start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flavianv/qwen4b-apparel23-bundle-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Same system prompt as used during training.
system = (
    "You are an outfit bundle assistant for apparel shopping. "
    "Given a natural-language outfit request, return the matching bundle as "
    "compact product evidence for each selected item. Include the outfit role "
    "and product title for every item in the outfit."
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "Casual summer outfit for women: denim shorts and ballet flats"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Greedy decoding, matching the evaluation settings.
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=384, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Training data
| Split | Rows |
|---|---|
| Train | 20,000 |
| Test (held out) | 13,705 |
- Source: 33,705 kept outfits from the Qwen-32B Apparel23 labeling pipeline (train/test split, seed=42).
- Dataset: flavianv/apparel23-qwen32b-kept-outfits-with-products
- SFT file: apparel23_bundle_sft.train.jsonl
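The card gives the split sizes and the seed but not the split procedure. As a rough illustration only, a seeded shuffle-and-cut reproduces the row counts above. The `split_indices` helper is hypothetical, and the actual pipeline may have split the data differently.

```python
import random


def split_indices(n_rows, n_train, seed=42):
    """Shuffle row indices with a fixed seed and cut them into train/test lists.

    Assumed procedure for illustration; not confirmed by the model card.
    """
    if not 0 <= n_train <= n_rows:
        raise ValueError("n_train must be between 0 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_rows=33_705, n_train=20_000, seed=42)
print(len(train_idx), len(test_idx))  # 20000 13705
```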
Training procedure
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | Full SFT |
| Epochs | 1 (2,500 steps) |
| Learning rate | 2e-5 |
| Max length | 768 |
| Batch size | 1 × grad accum 8 (effective 8) |
| Loss | Assistant-only |
| Seed | 42 |
| Hardware | NVIDIA B200 MIG 4g.90gb |
| Run ID | qwen4b_apparel23_bundle_sft_20260616_142947 |
Training metadata is included in bundle_sft_metadata.json in this repo.
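Two rows of the table can be checked or illustrated with a few lines of code: the step count follows from the dataset size and effective batch size, and "assistant-only" loss is commonly implemented by masking non-assistant label positions. The sketch below assumes a single device and uses -100 as the ignored label, which is the default `ignore_index` of PyTorch's cross-entropy loss. Both helper functions are hypothetical and not taken from the training code.

```python
import math

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def optimizer_steps(n_examples, per_device_batch, grad_accum, epochs=1, n_devices=1):
    """Number of optimizer updates for a run, rounding a partial last batch up."""
    effective_batch = per_device_batch * grad_accum * n_devices
    return epochs * math.ceil(n_examples / effective_batch)


def mask_non_assistant(token_ids, assistant_mask):
    """Copy token ids into labels, replacing non-assistant positions with IGNORE_INDEX."""
    if len(token_ids) != len(assistant_mask):
        raise ValueError("token_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(token_ids, assistant_mask)]


# 20,000 examples at an effective batch size of 8 for one epoch.
print(optimizer_steps(20_000, per_device_batch=1, grad_accum=8))  # 2500

# Toy sequence: two prompt tokens followed by two assistant tokens.
print(mask_non_assistant([11, 12, 13, 14], [False, False, True, True]))  # [-100, -100, 13, 14]
```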
Evaluation
Greedy decoding (do_sample=False, max_new_tokens=384) unless noted.
Task metrics (perplexity)
| Split | Perplexity | Mean token entropy* |
|---|---|---|
| Train (20k) | 3.12 | 1.19 |
| Test (13.7k) | 3.46 | 1.25 |
*Entropy computed on a 256-row subsample per split (assistant tokens).
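For readers unfamiliar with these metrics, the sketch below shows the standard definitions: perplexity as the exponential of the mean negative log-likelihood, and mean token entropy as the average Shannon entropy of the next-token distributions. The card does not state the log base or the exact aggregation, so natural logs and a simple per-token average are assumptions here, and the function names are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) over scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def mean_token_entropy(distributions):
    """Average Shannon entropy, in nats, of per-token next-token distributions."""
    if not distributions:
        raise ValueError("need at least one distribution")
    total = 0.0
    for probs in distributions:
        # Zero-probability entries contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(distributions)
```

Under this definition, a test perplexity of 3.46 corresponds to a mean loss of about 1.24 nats per assistant token.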
Generalization probes (post-SFT)
| Probe | Score |
|---|---|
| Easy math | 90% (9/10) |
| Collapse suite | 87.5% (7/8) |
| Combined (unweighted mean of the two probes) | 88.75% |
Zero-shot baseline (same 10 samples, seed=42)
Compared against Qwen/Qwen3-4B-Instruct-2507 with the same system prompt:
| Metric | Zero-shot | This model |
|---|---|---|
| Bundle format compliance | 0/10 | 10/10 |
| Item count matches gold | 0/10 | 10/10 |
| Exact bundle match | 0/10 | 1/10 |
| Mean title recall | 0.0 | 0.10 |
Zero-shot produces generic prose titles; this model learns the structured bundle schema and catalog-title style.
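The card does not define these metrics precisely. One plausible reading, sketched below, treats a bundle as an ordered list of (role, title) pairs, counts an exact match only when every pair agrees, and measures title recall as the fraction of gold titles that appear verbatim in the prediction. The function names and these definitions are assumptions for illustration, not the evaluation code behind the table.

```python
def item_count_matches(pred, gold):
    """True when the predicted bundle has as many items as the gold bundle."""
    return len(pred) == len(gold)


def exact_bundle_match(pred, gold):
    """True when every (role, title) pair matches, in the same order (assumed)."""
    return list(pred) == list(gold)


def title_recall(pred, gold):
    """Fraction of gold titles that appear verbatim among the predicted titles."""
    if not gold:
        raise ValueError("gold bundle must not be empty")
    pred_titles = {title for _, title in pred}
    hits = sum(1 for _, title in gold if title in pred_titles)
    return hits / len(gold)


gold = [
    ("bottom", "Levi's Women's 501 Original Shorts (Also Available in Plus)"),
    ("footwear", "Amazon Essentials Women's Belice Ballet Flat"),
]
# Made-up prediction with one correct title, for illustration only.
pred = [
    ("bottom", "Levi's Women's 501 Original Shorts (Also Available in Plus)"),
    ("footwear", "Some Other Ballet Flat"),
]
print(item_count_matches(pred, gold), exact_bundle_match(pred, gold), title_recall(pred, gold))
# True False 0.5
```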
Example
Query: Casual summer outfit for women: denim shorts and ballet flats
Output (exact match on eval sample):
```text
### Item 1: bottom
Levi's Women's 501 Original Shorts (Also Available in Plus)
### Item 2: footwear
Amazon Essentials Women's Belice Ballet Flat
```
Citation / lineage
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Training data: flavianv/apparel23-qwen32b-kept-outfits-with-products
- Internal report: docs/qwen4b_apparel23_bundle_sft_report.md in the RecoRL repo
License
This model inherits the license of the base model, Qwen/Qwen3-4B-Instruct-2507 (listed as apache-2.0 in this card's metadata). See the Qwen model card for terms.