DavidBShan

linkd-search-qwen3.6-35b-a3b-grpo100-lora

README

License: apache-2.0

linkd-search GRPO LoRA — grpo_000100 (lineage best)

Rank-64 LoRA adapter (alpha 32, all-linear) over Qwen/Qwen3.6-35B-A3B, trained with the Freesolo SFT -> GRPO pipeline on the linkd-search task: translating natural-language people-search requests into MongoDB find filters for the Berkeley.profilematch collection.
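To make the task concrete, here is a minimal sketch of what a request-to-filter pair could look like and how such a filter selects documents. The field names (`title`, `city`), the sample request, and the `matches` helper are hypothetical illustrations; they are not taken from the real Berkeley.profilematch schema or from the training data, and the helper covers only a tiny subset of MongoDB find semantics.

```python
import re

# Hypothetical request and filter; field names are illustrative only and
# are NOT taken from the real Berkeley.profilematch schema.
request = "software engineers in San Francisco"
example_filter = {
    "title": {"$regex": "software engineer", "$options": "i"},
    "city": {"$in": ["San Francisco", "SF"]},
}

def matches(doc, flt):
    """Evaluate a small subset of find-filter semantics on one dict.

    Supports plain equality, $regex (with the "i" option), and $in.
    All top-level fields must match (implicit AND).
    """
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$regex" in cond:
                flags = re.IGNORECASE if "i" in cond.get("$options", "") else 0
                if not isinstance(value, str):
                    return False
                if not re.search(cond["$regex"], value, flags):
                    return False
            if "$in" in cond and value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True

# Two made-up profile documents to run the filter against.
docs = [
    {"name": "A", "title": "Senior Software Engineer", "city": "SF"},
    {"name": "B", "title": "Product Manager", "city": "San Francisco"},
]
hits = [d["name"] for d in docs if matches(d, example_filter)]
print(hits)
```

With a real driver such as PyMongo, the same dictionary would be passed to `Collection.find` instead of the in-memory helper.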

Tinker checkpoint: tinker://4173b059-c42e-53ef-8782-293086486655:train:0/sampler_weights/grpo_000100
Lineage-best final_eval score: 0.6672 ("Executable Search Quality", 347 rows, real query execution + LLM judge)
Renderer at train/eval time: qwen3_5 (thinking enabled; only the final non-thinking text is scored)
Generation settings used for evaluation: temperature 0.0, max_tokens 2048
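Since only the final non-thinking text is scored, a caller would typically drop the reasoning block before parsing the filter. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags (the usual Qwen3-style convention) and that the final text is a single JSON object; neither detail is stated on this card, and `extract_filter` is a hypothetical helper name.

```python
import json
import re

# Generation settings reported on this card for evaluation.
GENERATION_SETTINGS = {"temperature": 0.0, "max_tokens": 2048}

# Assumption: reasoning is delimited by <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def extract_filter(raw_completion: str) -> dict:
    """Remove the thinking block and parse the remaining text as JSON."""
    final_text = THINK_BLOCK.sub("", raw_completion).strip()
    parsed = json.loads(final_text)  # raises ValueError on malformed JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object usable as a find filter")
    return parsed

# Made-up completion for illustration only.
raw = '<think>The user wants engineers.</think>\n{"title": "Engineer"}'
print(extract_filter(raw))
```

Raising on malformed output, rather than returning an empty filter, avoids silently running a match-everything query.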

Model Details

Model Provider: DavidBShan
Base: Qwen/Qwen3.6-35B-A3B
Adapter: this model
Input Modalities: Text, Image, Video
Output Modalities: Text
