alireza7/GrepSeek-Qwen3.5-9B-SFT


License: apache-2.0

⚠️ A tool-using agent, not a standalone chatbot

The model emits `<tool_call>` shell commands that must be executed against the Wikipedia corpus and fed back as `<tool_response>` turns. To use it you need: (1) the corpus PeterJinGo/wiki-18-corpus, (2) a tool-calling vLLM server, and (3) the GrepSeek inference harness (grep tool + agent loop), all in the code repo.
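To make the tool round trip concrete, here is a minimal, hypothetical sketch of the harness side of one turn. It runs the command the model placed in `<tool_call>` inside the corpus directory and wraps the output as a `<tool_response>`. The function name `run_tool_call`, the 30-second timeout, the 4000-character cap, and the assumption that the command arrives as a plain string are all illustrative; the real GrepSeek harness in the repo defines its own parsing, limits, and sandboxing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one tool turn (not the GrepSeek harness):
# run the model-issued command inside the corpus directory and wrap the
# (possibly truncated) output in <tool_response> tags.
# Assumed values: 30 s timeout, 4000-character default output cap.

run_tool_call() {
  local cmd="$1"                # command text extracted from <tool_call>
  local corpus_root="$2"        # e.g. "$GREPSEEK_CORPUS_ROOT"
  local max_chars="${3:-4000}"  # cap so one response cannot flood the context
  local output

  # Capture stdout and stderr (including a failed cd); a non-zero exit,
  # such as grep finding no match, is not treated as an error.
  output="$( { cd "$corpus_root" && timeout 30 bash -c "$cmd"; } 2>&1 )" || true

  # Truncate long outputs; bash substring expansion counts characters.
  if [ "${#output}" -gt "$max_chars" ]; then
    output="${output:0:max_chars}"$'\n[truncated]'
  fi

  printf '<tool_response>\n%s\n</tool_response>\n' "$output"
}
```

For example, `run_tool_call 'grep -r -i -m 5 "Eiffel Tower" .' "$GREPSEEK_CORPUS_ROOT"` would return up to five matching lines per file, wrapped in response tags. Executing model-written commands through `bash -c` is only reasonable inside an isolated environment.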

Usage

```bash
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md  ·  corpus: cold_start_sft/download_corpus.py

# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-SFT bash rl/serve_rl.sh         # -> http://localhost:10730/v1

# 2. run the agent (paper inference: temperature 0.6, <=6 turns, 16k context)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
  bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
    --model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out
```
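Before launching the agent it can help to confirm that the two prerequisites from the steps above are ready. The helpers below are a hypothetical sketch, not part of the repo: `check_corpus_root` only verifies that the directory exists and is non-empty (the layout GrepSeek actually expects is defined by the repo), and `wait_for_server` polls the OpenAI-compatible `/models` endpoint that a vLLM server exposes, assuming no API key is required.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the GrepSeek repo.

# Succeeds if the corpus root exists and contains at least one entry.
# It does not validate the file layout GrepSeek expects.
check_corpus_root() {
  local root="$1"
  [ -d "$root" ] && [ -n "$(ls -A "$root" 2>/dev/null)" ]
}

# Polls <base_url>/models (the OpenAI-compatible model list served by vLLM)
# until it answers, giving up after "retries" attempts spaced "delay" s apart.
wait_for_server() {
  local base_url="$1"
  local retries="${2:-60}"
  local delay="${3:-5}"
  local i=0
  while [ "$i" -lt "$retries" ]; do
    # -s: no progress output, -f: fail on HTTP error status codes
    if curl -sf "${base_url}/models" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "server at ${base_url} did not respond after ${retries} attempts" >&2
  return 1
}
```

Chaining `check_corpus_root "$GREPSEEK_CORPUS_ROOT" && wait_for_server http://localhost:10730/v1` in front of the `run_inference.sh` command makes inference start only once both checks pass.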

Evaluation (token-F1 / EM, micro-average over 7 QA benchmarks)

This SFT-only policy already beats the untuned base model by 0.094 micro-averaged F1 (0.3314 → 0.4249), and RL with GRPO adds another 0.144 (to 0.5691), with large gains on multi-hop reasoning:

| variant | micro-avg F1 | micro-avg EM |
|---|---|---|
| base (no SFT, no RL) | 0.3314 | 0.2836 |
| this model (SFT only) | 0.4249 | 0.3569 |
| + GRPO → `GrepSeek-Qwen3.5-9B-GRPO` | 0.5691 | 0.4948 |

(7 benchmarks: NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle. Training used only NQ and HotpotQA, so the other five are out-of-distribution.)
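For reference, the sketch below shows how token-F1, EM, and a micro-average are commonly computed for open-domain QA. It is an illustration under stated assumptions, not the GrepSeek scorer: it uses SQuAD-style normalization (lowercase, delete punctuation, drop the articles a/an/the), takes the best score over multiple gold answers, and reads a hypothetical TSV of `benchmark<TAB>prediction<TAB>gold1|gold2`. Micro-averaging means every question across all seven benchmarks counts once, so larger benchmarks carry more weight than in a mean of per-benchmark scores.

```bash
#!/usr/bin/env bash
# Sketch of SQuAD-style token-F1 / exact match (EM) with micro-averaging.
# Assumed input (hypothetical, not the repo format): TSV lines
#   benchmark<TAB>prediction<TAB>gold answers joined by "|"
# The official GrepSeek scorer may normalize or aggregate differently.

score_qa() {
  awk -F '\t' '
    # Normalize an answer: lowercase, delete punctuation, drop articles,
    # and rejoin the remaining tokens with single spaces.
    function norm(s,    raw, n, i, out) {
      s = tolower(s)
      gsub(/[[:punct:]]/, "", s)
      n = split(s, raw, " ")
      out = ""
      for (i = 1; i <= n; i++)
        if (raw[i] !~ /^(a|an|the)$/)
          out = (out == "" ? raw[i] : out " " raw[i])
      return out
    }
    # Token-level F1 between one prediction and one gold answer,
    # counting shared tokens with multiplicity.
    function f1(pred, gold,    p, g, np, ng, cnt, i, common, prec, rec) {
      np = split(norm(pred), p, " ")
      ng = split(norm(gold), g, " ")
      if (np == 0 || ng == 0) return (np == ng) ? 1 : 0
      for (i = 1; i <= ng; i++) cnt[g[i]]++
      common = 0
      for (i = 1; i <= np; i++)
        if (cnt[p[i]] > 0) { common++; cnt[p[i]]-- }
      if (common == 0) return 0
      prec = common / np
      rec = common / ng
      return 2 * prec * rec / (prec + rec)
    }
    # Exact match: normalized strings must be identical (returns 1 or 0).
    function em(pred, gold) {
      return norm(pred) == norm(gold)
    }
    {
      # Column 1 (benchmark) is not needed for a pooled micro-average.
      n = split($3, golds, "|")
      best_f1 = 0; best_em = 0
      for (j = 1; j <= n; j++) {
        v = f1($2, golds[j]); if (v > best_f1) best_f1 = v
        v = em($2, golds[j]); if (v > best_em) best_em = v
      }
      sum_f1 += best_f1; sum_em += best_em; total++
    }
    END {
      # Micro-average: every question counts once across all benchmarks.
      if (total > 0)
        printf "micro_f1=%.4f micro_em=%.4f n=%d\n", sum_f1 / total, sum_em / total, total
    }
  '
}
```

For instance, `printf 'nq\tthe Eiffel Tower\tEiffel Tower\n' | score_qa` prints `micro_f1=1.0000 micro_em=1.0000 n=1`, because the article is dropped before comparison.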

License

This model inherits the license of its base model, Qwen/Qwen3.5-9B; confirm and update the license field above if needed.

Citation

```bibtex
@misc{salemi2026grepseektrainingsearchagents,
      title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
      author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
      year={2026},
      eprint={2605.29307},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.29307},
}
```


Model Details

Model Provider: alireza7

Model Tree: base Qwen/Qwen3.5-9B, fine-tuned into this model

Input Modalities: Text, Image, Video

Output Modalities