README

License: apache-2.0

Training

Base model:

  • Qwen/Qwen3-1.7B

Supervised fine-tuning stage:

  • Data: filtered JSON-action outfit rollouts.
  • Selection rule used by the local pipeline: score four rollouts per outfit query, select the top rollout per query when its score is greater than 60, then export selected raw traces for SFT.
  • Max length: 16,384.
  • Epochs: 3.
  • Learning rate: 2e-5.
  • Per-device train batch size: 1.
  • Gradient accumulation steps: 16.
  • Assistant-only loss: enabled.
  • Full fine-tune, not LoRA.
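
The selection rule above can be sketched as follows. This is a minimal illustration, not the local pipeline's code: the function names, the `score` and `trace` keys, and the example data are all hypothetical; only the "top rollout per query, kept when its score is greater than 60" logic comes from the description above.

```python
from typing import Optional

SCORE_THRESHOLD = 60.0  # from the rule above: keep only scores greater than 60


def select_best_rollout(rollouts: list[dict]) -> Optional[dict]:
    """Return the top-scoring rollout if it clears the threshold, else None."""
    if not rollouts:
        return None
    best = max(rollouts, key=lambda r: r["score"])
    return best if best["score"] > SCORE_THRESHOLD else None


def build_sft_set(rollouts_by_query: dict[str, list[dict]]) -> list[dict]:
    """Collect at most one selected raw trace per query."""
    selected = []
    for query, rollouts in rollouts_by_query.items():
        best = select_best_rollout(rollouts)
        if best is not None:
            selected.append({"query": query, "trace": best["trace"]})
    return selected


# Hypothetical data: two queries with four scored rollouts each.
example = {
    "beach wedding guest": [
        {"score": 42.0, "trace": "a1"},
        {"score": 71.5, "trace": "a2"},
        {"score": 65.0, "trace": "a3"},
        {"score": 12.0, "trace": "a4"},
    ],
    "rainy commute": [
        {"score": 60.0, "trace": "b1"},  # exactly 60 does not pass "greater than 60"
        {"score": 33.0, "trace": "b2"},
        {"score": 58.9, "trace": "b3"},
        {"score": 20.0, "trace": "b4"},
    ],
}

sft_rows = build_sft_set(example)
print(sft_rows)
```

Under this rule a query contributes nothing to the SFT set when none of its rollouts clears the threshold, so the exported set can contain fewer rows than there are queries.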

DPO stage:

  • Starting checkpoint: the outfit SFT model.
  • Data: 100 outfit preference-query training rows and 50 validation rows in the local DeepOutfit pipeline.
  • Max length: 8,192.
  • Epochs: 1.
  • Learning rate: 5e-7.
  • DPO beta: 0.1.
  • Per-device train batch size: 1.
  • Gradient accumulation steps: 8.
  • Full fine-tune, not LoRA.
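
To show what the DPO beta of 0.1 controls, here is a small sketch of the standard sigmoid DPO loss for a single preference pair. The model card does not say which loss variant the training run used, so the sigmoid form is an assumption, and the function name and log-probability inputs are illustrative only.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # value listed for the DPO stage above
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * (chosen - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_init = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# When the policy favors the chosen response more than the reference does, the loss drops.
loss_improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(loss_at_init, loss_improved)
```

A smaller beta makes the loss less sensitive to the gap between the policy and reference log-ratios, which keeps the fine-tuned model closer to its starting checkpoint.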

Uploaded source directory:

outputs/models/qwen3-1.7b-json-action-outfit-sft-dpo-100q-cont1_20260527_230029

Evaluation

Evaluation was run with the local batch_eval_outfit_models.py harness on 50 OUTFIT500 queries, with one low-temperature rollout per query. Outfit quality was scored by a GPT-4.1 judge. The comparison covers zero-shot Qwen3-1.7B, this SFT+DPO checkpoint, and a later GRPO/RL checkpoint.

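| Model | Rows | Overall | Generalization | Entropy | Efficiency | Correctness | Quality | >=70 Quality | Missing Report | Broken Report | Tokens Median | Calls Median | Rollouts/min |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-1.7B zero-shot | 50 | 57.03 | 100 | 0.0416 | 69.38 | 84.8 | 29.93 | 0% | 2% | 22% | 3,632 | 1 | 10.72 |
| DeepOutfit SFT+DPO | 50 | 55.36 | 90 | 0.0650 | 39.71 | 82.0 | 41.58 | 8% | 0% | 30% | 11,240 | 5 | 8.21 |
| DeepOutfit GRPO/RL checkpoint | 50 | 56.69 | 90 | 0.0641 | 40.79 | 88.4 | 39.10 | 6% | 2% | 16% | 11,280 | 5 | 7.05 |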

Judge breakdown for this SFT+DPO checkpoint:

| Metric | Value |
| --- | --- |
| Judged rows | 50 |
| Judge score > 70 | 4 / 50 |
| Mean judge score | 41.58 |
| Max judge score | 94.27 |
| Min judge score | 21.60 |
| Best average judge submetric | validity gate, 94.0 / 100 |
| Worst average judge submetric | explanation average, 40.8 / 100 |
| Highest failure flag | impractical to wear, 88% |

Generalization probes:

| Probe | Result |
| --- | --- |
| Easy math | 8 / 10 |
| JSON formatting | 2 / 2 |
| Factual QA | 2 / 2 |
| Exact string following | 2 / 2 |
| Simple code-output QA | 2 / 2 |

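Interpretation: compared with zero-shot Qwen3-1.7B, this checkpoint raises the mean GPT-4.1 judge quality score from 29.93 to 41.58, but its median rollout uses 5 search/tool calls instead of 1 and 11,240 tokens instead of 3,632. Its overall score is slightly lower (55.36 versus 57.03), alongside a drop in the efficiency score from 69.38 to 39.71. The dominant remaining failure mode is outfit practicality: "impractical to wear" is the most frequent failure flag, at 88%.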
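
The trade-off can be made concrete with a few lines of arithmetic over the comparison table. The numbers below are copied from the zero-shot and SFT+DPO rows; the dictionary keys are illustrative names, not identifiers from the evaluation harness.

```python
# Values copied from the comparison table (zero-shot vs. this SFT+DPO checkpoint).
zero_shot = {
    "quality": 29.93,
    "overall": 57.03,
    "efficiency": 69.38,
    "median_tokens": 3632,
    "median_calls": 1,
}
sft_dpo = {
    "quality": 41.58,
    "overall": 55.36,
    "efficiency": 39.71,
    "median_tokens": 11240,
    "median_calls": 5,
}

# Difference per metric (SFT+DPO minus zero-shot), rounded to two decimals.
deltas = {name: round(sft_dpo[name] - zero_shot[name], 2) for name in zero_shot}

# How many times more tokens the median SFT+DPO rollout uses.
token_ratio = sft_dpo["median_tokens"] / zero_shot["median_tokens"]

for name, delta in deltas.items():
    print(f"{name}: {delta:+}")
print(f"token ratio: {token_ratio:.2f}x")
```

Read this way, the quality gain of roughly 11.65 points comes with about three times the median token count and four additional median tool calls per rollout.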

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "flavianv/deepoutfit-qwen17b-sft-dpo"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Find a men's backyard BBQ host outfit that is casual, practical, and intentional.",
    }
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature and top_p to take effect.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the intended JSON-action setting, use the same tool schema and validation loop as the training/evaluation harness. Standalone generations may reference products or tool actions that are only meaningful when connected to the product search tool.
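
As a rough picture of what such a validation loop might look like, the sketch below parses a model reply as JSON, checks it against an assumed action shape, and retries a bounded number of times. The real tool schema is defined by the training/evaluation harness and is not documented here, so the `action` and `arguments` keys, the action names, and the function names are all hypothetical.

```python
import json
from typing import Callable, Optional

ALLOWED_ACTIONS = {"search", "report"}  # hypothetical action names


def parse_action(raw: str) -> Optional[dict]:
    """Return the parsed action if it is valid JSON with the assumed shape, else None."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict):
        return None
    name = action.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        return None
    if not isinstance(action.get("arguments"), dict):
        return None
    return action


def next_valid_action(generate: Callable[[], str], max_attempts: int = 3) -> Optional[dict]:
    """Call generate() until it yields a valid action or attempts run out."""
    for _ in range(max_attempts):
        action = parse_action(generate())
        if action is not None:
            return action
    return None


# Stand-in for model output: the first reply is malformed, the second is valid.
replies = iter([
    "not json",
    '{"action": "search", "arguments": {"query": "linen shirt"}}',
])
result = next_valid_action(lambda: next(replies))
print(result)
```

In a real setup, `generate` would wrap the `model.generate` call from the usage snippet, and a valid `search` action would be dispatched to the product search tool before the next turn.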

Limitations

  • Experimental research checkpoint, not production validated.
  • Optimized for outfit/product-report behavior, not broad assistant quality.
  • Can produce incomplete, impractical, or unsupported product combinations.
  • Product IDs and search behavior depend on the external catalog/tool harness.
  • Easy-math probing (8 / 10 correct) shows some drift versus the zero-shot base model.

License

This checkpoint is released under Apache 2.0, matching the base Qwen/Qwen3-1.7B license metadata.

Model provider: flavianv

Base model: Qwen/Qwen3-1.7B

Modalities: text input, text output
