The model emits <tool_call> shell commands that must be executed against the
Wikipedia corpus and fed back as <tool_response> turns. To use it you need:
(1) the corpus PeterJinGo/wiki-18-corpus,
(2) a tool-calling vLLM server, and (3) the GrepSeek inference harness (grep tool
Usage
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md · corpus: cold_start_sft/download_corpus.py
# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-SFT bash rl/serve_rl.sh # -> http://localhost:10730/v1
# 2. run the agent (paper inference: temperature 0.6, <=6 turns, 16k context)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
--model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out
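The tool loop that the harness runs for each turn can be sketched roughly as follows. This is a minimal illustration, not the GrepSeek harness itself: the helper names `extract_command` and `run_tool_call`, the timeout, and the output cap are all hypothetical, and it assumes the text between the `<tool_call>` tags is a plain shell command (the real harness may use a structured payload instead).

```python
import re
import subprocess
from typing import Optional

# Assumption: the command is the raw text between the tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_command(model_output: str) -> Optional[str]:
    """Return the first tool-call payload, or None if the model gave a final answer."""
    match = TOOL_CALL_RE.search(model_output)
    return match.group(1).strip() if match else None


def run_tool_call(command: str, corpus_root: str,
                  timeout_s: float = 30.0, max_chars: int = 4000) -> str:
    """Run the command inside the corpus directory and wrap the result as a turn."""
    try:
        proc = subprocess.run(
            command, shell=True, cwd=corpus_root,
            capture_output=True, text=True, timeout=timeout_s,
        )
        # Fall back to stderr so the model can see why a command failed.
        output = (proc.stdout or proc.stderr).strip()
    except subprocess.TimeoutExpired:
        output = "command timed out"
    # Truncate so one noisy grep cannot exhaust the context window.
    return f"<tool_response>\n{output[:max_chars]}\n</tool_response>"
```

Each turn would call `extract_command` on the model output, stop if it returns `None`, and otherwise append the `run_tool_call` result to the conversation. Because the commands come from the model, anything like this should run in a sandbox with read-only access to the corpus.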
Evaluation (token-F1 / EM, micro-average over 7 QA benchmarks)
This SFT-only policy already beats the untuned base model by 0.0935 micro-avg F1
(0.4249 vs. 0.3314), but RL adds large gains on multi-hop reasoning:
| variant | micro-avg F1 | micro-avg EM |
|---|---|---|
| base (no SFT, no RL) | 0.3314 | 0.2836 |
| this model (SFT only) | 0.4249 | 0.3569 |
| + GRPO → GrepSeek-Qwen3.5-9B-GRPO | 0.5691 | 0.4948 |
(7 benchmarks: NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle.
Training used only NQ and HotpotQA; the other five are out-of-distribution.)
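For readers unfamiliar with the metrics, the sketch below shows one common way to compute token-F1, EM, and a micro-average. It assumes SQuAD-style answer normalization (lowercasing, dropping punctuation and English articles); the exact normalization used for the numbers above is not stated here, so treat the function names and details as illustrative only.

```python
import re
import string
from collections import Counter
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        # Both empty counts as a match; one empty counts as a miss.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def micro_average(per_benchmark: Dict[str, List[float]]) -> float:
    """Pool every example across benchmarks, so larger benchmarks weigh more."""
    pooled = [s for scores in per_benchmark.values() for s in scores]
    return sum(pooled) / len(pooled)
```

Micro-averaging pools all examples before taking the mean, so a large benchmark contributes more than a small one. A macro-average would instead average the seven per-benchmark means with equal weight.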
License
This model inherits the license of the base model, Qwen/Qwen3.5-9B. Confirm and
update the license field above if needed.
Citation
@misc{salemi2026grepseektrainingsearchagents,
title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
year={2026},
eprint={2605.29307},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.29307},
}