
License: other

Model Details

  • Base model: Qwen/Qwen3-1.7B
  • Architecture: Qwen3ForCausalLM, 28 layers, hidden size 2048, 16 attention heads
  • Training method: supervised fine-tuning on high-quality teacher traces
  • Teacher / brain: GPT-4.1 in the DeepOutfit JSON-action harness
  • Training data source: first 150 queries from the OUTFIT500 outfit-query set
  • Filtering: traces with GPT-4.1 judge score >= 80
  • Training examples: 133 filtered traces
  • Tuple style: final-only assistant targets from JSON-action traces
  • Primary task: generate tool/action JSON for outfit planning, product search, and final outfit report construction
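The data preparation above (first 150 queries, judge score >= 80, final-only targets) can be sketched as follows. This is not the DeepOutfit pipeline. The trace layout (query_id, judge_score, messages), the zero-based query IDs, and the reading of "final-only" as "the last assistant turn is the only training target" are all assumptions. That last reading would give one example per trace, which agrees with 133 filtered traces producing 133 training examples.

```python
"""Sketch: turn judged teacher traces into final-only SFT examples.

Hypothetical trace layout: {"query_id": int, "judge_score": float,
"messages": [{"role": ..., "content": ...}, ...]}.
"""
from typing import Any

SCORE_THRESHOLD = 80     # keep traces with GPT-4.1 judge score >= 80
NUM_TRAIN_QUERIES = 150  # first 150 OUTFIT500 queries (assumed zero-based ids)


def build_final_only_examples(traces: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep high-scoring traces and emit one (prompt, final target) pair each."""
    examples = []
    for trace in traces:
        if trace["query_id"] >= NUM_TRAIN_QUERIES:
            continue  # outside the training slice of OUTFIT500
        if trace["judge_score"] < SCORE_THRESHOLD:
            continue  # rejected by the judge-score filter
        messages = trace["messages"]
        # Index of the last assistant turn; only this turn becomes the target.
        last = max(
            (i for i, m in enumerate(messages) if m["role"] == "assistant"),
            default=None,
        )
        if last is None:
            continue  # no assistant output to learn from
        examples.append({"prompt": messages[:last], "target": messages[last]["content"]})
    return examples
```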

Harness Protocol

A typical interaction is not a single natural-language prompt. The harness sends structured events, for example:

```json
{"event":"outfit_task_start","query":"men's Ibiza club outfit with linen, no sandals, stylish but practical"}
```

Then the harness asks for planning:

```json
{"event":"planning_instruction","nb_searches":5,"schema":{"look":"...","searches":["..."]}}
```

The model should first emit a plan:

```json
{"action":"todo_writer","look":"sleek Mediterranean nightlife: lightweight shirt, tailored trousers, clean sneaker or loafer, restrained accessory","searches":["men black linen button shirt club summer","men tailored black trousers nightlife","men white leather sneaker summer night","men lightweight blazer casual evening","men silver watch minimal"]}
```

The harness then returns tool_result / candidate_delta events carrying catalog search results. The model continues issuing search_products actions and eventually emits a finalize_report action listing five distinct products that match the requested gender, together with a concise explanation.
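This interaction loop can be sketched as a small driver. Everything below is hypothetical. The model and catalog are stand-in callables. Several names are illustrative rather than the documented DeepOutfit schema: the query field on search_products, the empty acknowledgement sent after todo_writer, and the finalize_report fields (products, product_id, role, gender, explanation). The final check mirrors the stated requirement of five non-duplicated, gender-respecting products.

```python
import json
from typing import Any, Callable

# Hypothetical interfaces: the model maps the event/action history to a JSON
# action string; the catalog maps a search query to a list of product dicts.
Model = Callable[[list[dict[str, Any]]], str]
Catalog = Callable[[str], list[dict[str, Any]]]


def check_final_report(report: dict[str, Any], gender: str) -> list[str]:
    """Return the problems found in a finalize_report action (empty = OK)."""
    problems = []
    products = report.get("products", [])
    if len(products) != 5:
        problems.append(f"expected 5 products, got {len(products)}")
    ids = [p.get("product_id") for p in products]
    if len(set(ids)) != len(ids):
        problems.append("duplicate product ids")
    roles = [p.get("role") for p in products]
    if len(set(roles)) != len(roles):
        problems.append("duplicate outfit roles")
    if any(p.get("gender") not in (gender, "unisex") for p in products):
        problems.append("product gender does not match query")
    if not report.get("explanation"):
        problems.append("missing explanation")
    return problems


def run_episode(model: Model, catalog: Catalog, query: str, gender: str,
                max_steps: int = 12) -> dict[str, Any]:
    """Drive one episode until the model emits finalize_report."""
    history: list[dict[str, Any]] = [
        {"event": "outfit_task_start", "query": query},
        {"event": "planning_instruction", "nb_searches": 5,
         "schema": {"look": "...", "searches": ["..."]}},
    ]
    for _ in range(max_steps):
        action = json.loads(model(history))
        history.append(action)
        kind = action.get("action")
        if kind == "finalize_report":
            action["problems"] = check_final_report(action, gender)
            return action
        if kind == "search_products":
            results = catalog(action["query"])  # hypothetical field name
            history.append({"event": "tool_result", "results": results})
        else:
            # e.g. todo_writer: hypothetical empty acknowledgement
            history.append({"event": "tool_result", "results": []})
    raise RuntimeError("no finalize_report within max_steps")
```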

Evaluation Snapshot

The checkpoint was evaluated on a 50-query holdout drawn from the last OUTFIT500 queries, which do not overlap the first 150 queries used for training. It was compared against zero-shot Qwen3 1.7B and a previous SFT+DPO checkpoint. All metrics are internal and were computed with the DeepOutfit batch-eval harness and the GPT-4.1 judge.

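| Model | Overall | Quality | >=70 Quality | Correctness | Missing Report | Broken Report | Rollouts/min |
|---|---|---|---|---|---|---|---|
| Qwen3 1.7B zero-shot | 65.84 | 59.36 | 32% | 96.4 | 0 | 6 | 4.787 |
| Previous DeepOutfit SFT+DPO | 55.77 | 43.73 | 10% | 86.8 | 0 | 22 | 4.229 |
| This model | 73.51 | 72.51 | 56% | 100.0 | 0 | 0 | 4.996 |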

Additional probes from the same eval run:

| Probe | Score |
|---|---|
| Easy math generalization | 10 / 10 |
| Collapse probe suite | 100 / 100 |

In the same comparison, this model improved judge quality by 13.15 absolute points over zero-shot Qwen3 1.7B (72.51 vs. 59.36), a relative gain of 22.15% on the internal judge-quality metric.
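The absolute and relative gains come straight from the judge-quality column: 72.51 − 59.36 = 13.15 points, and 13.15 / 59.36 ≈ 22.15%. The short script below repeats that arithmetic for every row against the zero-shot baseline, using the quality values from the evaluation table.

```python
# Judge-quality values from the evaluation table.
quality = {
    "Qwen3 1.7B zero-shot": 59.36,
    "Previous DeepOutfit SFT+DPO": 43.73,
    "This model": 72.51,
}


def gain(new: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = new - baseline
    return absolute, 100.0 * absolute / baseline


baseline = quality["Qwen3 1.7B zero-shot"]
for name, q in quality.items():
    absolute, relative = gain(q, baseline)
    print(f"{name}: {absolute:+.2f} points, {relative:+.2f}%")
```

The same formula gives a negative gain for the previous SFT+DPO checkpoint, because its quality (43.73) falls below the zero-shot baseline.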

Strengths

  • Better JSON-action reliability than the previous SFT+DPO checkpoint in the current harness.
  • Stronger final outfit quality on the 50-query OUTFIT500 holdout.
  • Preserves basic generalization in small math and collapse probes.
  • Learns the new planning-first behavior: choose a coherent look, decompose it into searches, then use catalog candidates in the final outfit.

Known Limitations

  • The model is tied to the DeepOutfit harness and catalog semantics. It should be run with the same structured events and validation logic used during training.
  • Product IDs, search results, and candidate metadata are catalog-specific.
  • The reported metrics are internal GPT-4.1 judge metrics, not a public benchmark.
  • The model can still select duplicate roles or weak product matches when the catalog search results are poor.
  • License and redistribution terms should be checked against the base model and any private training-data constraints before production use.

Recommended Decoding

For deterministic or production-like evaluation, use low temperature. For RL exploration, moderate sampling such as temperature 0.6 with multiple rollouts was used in downstream experiments.
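To make the two regimes concrete, here is a generic, self-contained sketch of token-level temperature sampling. It is an illustration only and does not reproduce the serving stack's implementation. A temperature at or below zero falls back to greedy argmax. A temperature of 0.6 sharpens the softmax but still lets independent rollouts diverge.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Greedy argmax when temperature <= 0, otherwise temperature sampling."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Several independent rollouts at T = 0.6, one seeded RNG per rollout.
logits = [2.0, 1.0, 0.2]
rollouts = [sample_token(logits, 0.6, random.Random(seed)) for seed in range(4)]
```

Lower temperatures concentrate probability on the top token, so outputs become more reproducible for evaluation. RL exploration benefits from the diversity that several T = 0.6 rollouts provide.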

Source Checkpoint

This upload was created from the B200 checkpoint directory:

```text
/home/criteo/reco-rl-json-action/outputs/models/qwen3-1.7b-json-action-outfit-gpt41-150-ge80-sft_20260530_152442_finalonly_20260530_152848
```

Citation

If you use this model in the DeepOutfit experiments, cite it as a GPT-4.1 distilled Qwen3 1.7B JSON-action SFT checkpoint trained on high-quality OUTFIT500 traces.

Model provider: flavianv

Model tree: Qwen/Qwen3-1.7B (base) → this model (fine-tuned)

Modalities: text input, text output
