License: apache-2.0
Why direct corpus interaction?
Index-based retrieval (dense or sparse) suffers from semantic smoothing
(blurring fine-grained entity/lexical distinctions), limited controllability
(the agent can't enforce exact filters or iteratively refine results), and
redundant re-retrieval in multi-hop settings. By executing exact-string shell
pipelines (e.g. rg -F), GrepSeek preserves lexical precision, isolates rare
symbolic patterns and exact entity names, and composes multi-stage retrieval
programs for compositional reasoning — while needing no embedding index (only
the ~14 GB raw corpus; no offline indexing).
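To make the idea of composing exact-string stages concrete, here is a minimal Python sketch that mimics chaining two fixed-string filters (roughly `rg -F A | rg -F B`) over an in-memory list. The three-passage corpus and the helper name `fixed_string_filter` are illustrative assumptions, not part of GrepSeek; the real agent runs shell commands against the raw corpus files.

```python
from typing import Iterable, List


def fixed_string_filter(lines: Iterable[str], needle: str) -> List[str]:
    """Keep lines that contain `needle` as a literal, case-sensitive substring."""
    return [line for line in lines if needle in line]


# Hypothetical toy corpus, one passage per line (an assumption for illustration).
corpus = [
    "Marie Curie was born in Warsaw and won the Nobel Prize in Physics in 1903.",
    "Pierre Curie shared the 1903 Nobel Prize in Physics with Marie Curie.",
    "Warsaw is the capital of Poland.",
]

# Stage 1 isolates the exact entity name; stage 2 narrows by a second exact term.
stage1 = fixed_string_filter(corpus, "Marie Curie")
stage2 = fixed_string_filter(stage1, "Warsaw")

for passage in stage2:
    print(passage)
```

Because each stage is a literal match, the second filter can only shrink the first stage's result set, which is what lets an agent refine a query step by step without re-retrieving unrelated passages.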
Training
- Initialized from: alireza7/GrepSeek-Qwen3.5-9B-SFT (cold-start SFT on alireza7/GrepSeek-ColdStart-SFT-10k; base Qwen/Qwen3.5-9B).
- RL: GRPO, group size n=5, reward = token-F1 × binary format gate (only structurally valid <think>/<tool_call>/<tool_response>/<answer> trajectories get non-zero reward), 200 steps, LR 5e-6, batch 256, KL disabled, Ulysses SP=2, on 4×A100-80GB. Trained only on NQ + HotpotQA.
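The reward described above (token-F1 multiplied by a binary format gate) can be sketched roughly as follows. This is not the training code: the SQuAD-style answer normalization, the specific validity checks in `format_gate`, and all function names are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter
from typing import List

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
TAGS = ("think", "tool_call", "tool_response", "answer")


def normalize(text: str) -> List[str]:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def format_gate(trajectory: str) -> int:
    """Assumed check: balanced tags and exactly one <answer> closing the trajectory."""
    for tag in TAGS:
        if trajectory.count(f"<{tag}>") != trajectory.count(f"</{tag}>"):
            return 0
    has_one_answer = len(ANSWER_RE.findall(trajectory)) == 1
    return int(has_one_answer and trajectory.rstrip().endswith("</answer>"))


def reward(trajectory: str, gold: str) -> float:
    """Token-F1 of the extracted answer, zeroed out for malformed trajectories."""
    if not format_gate(trajectory):
        return 0.0
    return token_f1(ANSWER_RE.search(trajectory).group(1), gold)


good = "<think>Look it up.</think><answer>the Eiffel Tower</answer>"
broken = "<think>Look it up.<answer>Eiffel Tower</answer>"
print(reward(good, "Eiffel Tower"), reward(broken, "Eiffel Tower"))
```

The multiplicative gate means a correct answer inside a malformed trajectory earns nothing, so the policy cannot trade structure for accuracy.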
⚠️ A tool-using agent, not a standalone chatbot
The model emits <tool_call> shell commands that must be executed against the
corpus and returned as <tool_response> turns. You need the corpus
(PeterJinGo/wiki-18-corpus),
a tool-calling vLLM server, and the GrepSeek inference harness — all in the
code repo.
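The turn structure described above can be pictured with the following sketch of a tool loop. It is not the GrepSeek harness: `run_agent`, the scripted stand-in for the model, and the canned tool output are all hypothetical, and the `<tool_call>` payload is treated as a raw command string purely for illustration (the actual payload format is defined by the harness in the code repo).

```python
import re
from typing import Callable, List, Optional

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(
    question: str,
    generate: Callable[[List[str]], str],  # stand-in for a call to the served model
    execute: Callable[[str], str],         # stand-in for running the command on the corpus
    max_turns: int = 8,
) -> Optional[str]:
    """Alternate model turns and tool responses until an <answer> appears."""
    transcript = [question]
    for _ in range(max_turns):
        turn = generate(transcript)
        transcript.append(turn)
        answer = ANSWER_RE.search(turn)
        if answer:
            return answer.group(1).strip()
        call = TOOL_CALL_RE.search(turn)
        if call is None:
            break  # neither a tool call nor an answer: stop rather than loop
        output = execute(call.group(1).strip())
        transcript.append(f"<tool_response>{output}</tool_response>")
    return None


# Scripted model and tool so the sketch runs without a server or corpus.
scripted_turns = iter([
    "<think>Find the passage.</think><tool_call>rg -F 'Warsaw' corpus</tool_call>",
    "<think>The passage answers it.</think><answer>Poland</answer>",
])


def fake_generate(transcript: List[str]) -> str:
    return next(scripted_turns)


def fake_execute(command: str) -> str:
    return "Warsaw is the capital of Poland."


result = run_agent("Which country has Warsaw as its capital?", fake_generate, fake_execute)
print(result)
```

Without the execute step the model's tool calls are never answered, which is why the checkpoint is not useful as a plain chat model.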
Usage
```bash
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md · corpus: cold_start_sft/download_corpus.py

# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-GRPO bash rl/serve_rl.sh   # -> http://localhost:10730/v1

# 2a. generation on your own questions
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
  --model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out

# 2b. reproduce the benchmark eval (token-F1 / EM on the Search-R1 suite)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
  --model grepseek --temperature 0.6 --datasets all --out_dir eval
```
The inference harness also ships the semantics-preserving sharded-parallel
execution engine (+ persistent search daemon) that accelerates corpus search by
up to 7.6× while remaining byte-exact with sequential grep.
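One way to see why sharded search can stay identical to a sequential scan is the sketch below: split the corpus into contiguous shards, search each shard independently, and concatenate the per-shard matches in shard order. This is an assumption about the general technique, not the repo's engine; the function names and toy corpus are hypothetical, and Python threads here demonstrate the ordering argument rather than the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def search_shard(lines: Sequence[str], needle: str) -> List[str]:
    """Sequential literal search within one shard."""
    return [line for line in lines if needle in line]


def make_shards(lines: Sequence[str], num_shards: int) -> List[Sequence[str]]:
    """Split into contiguous chunks so concatenation restores file order."""
    size = max(1, -(-len(lines) // num_shards))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def sharded_search(lines: Sequence[str], needle: str, num_shards: int = 4) -> List[str]:
    shards = make_shards(lines, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # Executor.map yields results in input order, not completion order.
        per_shard = list(pool.map(lambda shard: search_shard(shard, needle), shards))
    return [match for matches in per_shard for match in matches]


# Hypothetical toy corpus for the equivalence check.
corpus = [f"passage {i} mentions {'Curie' if i % 3 == 0 else 'Bohr'}" for i in range(10)]
sequential = search_shard(corpus, "Curie")
parallel = sharded_search(corpus, "Curie", num_shards=3)
print(parallel == sequential)
```

The key design point is that shards are contiguous and merged in order, so the output matches what a single pass over the whole file would produce.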
Results (token-level F1)
Trained only on NQ + HotpotQA (marked *); the other five are out-of-distribution. GrepSeek gets the best micro-average and wins 4/7 benchmarks.
| Model | NQ* | TriviaQA | PopQA | HotpotQA* | 2Wiki | MuSiQue | Bamboogle | micro-avg |
|---|---|---|---|---|---|---|---|---|
| Search-R1 (Qwen3-Emb-4B, best baseline) | 0.5067 | 0.7693 | 0.5101 | 0.5591 | 0.4299 | 0.2878 | 0.6989 | 0.5441 |
| GrepSeek (this model) | 0.5223 | 0.7673 | 0.4861 | 0.6231 | 0.5178 | 0.3006 | 0.6212 | 0.5691 |
Micro-average EM = 0.4948 (also best overall; full EM table in the paper). Gains are largest on multi-hop tasks (HotpotQA, 2Wiki, MuSiQue) that reward exact entity disambiguation and iterative evidence aggregation.
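As a quick sanity check on the table, the per-benchmark differences and the win count can be recomputed from the reported F1 values. The numbers below are copied from the table; the micro-average itself is not recomputed, since it depends on per-dataset example counts that are not listed here.

```python
benchmarks = ["NQ", "TriviaQA", "PopQA", "HotpotQA", "2Wiki", "MuSiQue", "Bamboogle"]
search_r1 = [0.5067, 0.7693, 0.5101, 0.5591, 0.4299, 0.2878, 0.6989]
grepseek = [0.5223, 0.7673, 0.4861, 0.6231, 0.5178, 0.3006, 0.6212]

# Positive delta means GrepSeek scores higher than the Search-R1 baseline.
deltas = {
    name: round(ours - base, 4)
    for name, ours, base in zip(benchmarks, grepseek, search_r1)
}
wins = [name for name, delta in deltas.items() if delta > 0]
largest_gain = max(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name:10s} {delta:+.4f}")
print(f"wins: {len(wins)}/{len(benchmarks)}; largest gain: {largest_gain}")
```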
Limitations
Because retrieval is purely lexical, GrepSeek is weaker on surface-form
variation / long-tail queries — e.g. PopQA (diacritics, name variants) — and
grep has no semantic relevance ranking, so an authoritative passage can be
buried behind earlier file-order matches. Dense retrieval remains advantageous on
heavily semantic or paraphrase-driven queries.
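The surface-form issue is easy to reproduce: an exact-string match for an unaccented spelling misses a passage that uses the accented one. The sketch below shows the mismatch and one possible mitigation (stripping combining marks after Unicode decomposition). The mitigation is an illustration only and is not something GrepSeek is stated to do; normalizing text this way would also give up byte-exact matching against the raw corpus.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical passage; \u00e9 is the precomposed character for "e" with an acute accent.
passage = "Beyonc\u00e9 Knowles was born in Houston."
query = "Beyonce"

exact_hit = query in passage                        # literal match on the raw text
normalized_hit = query in strip_diacritics(passage)  # match after accent stripping
print(exact_hit, normalized_hit)
```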
License
Inherits the license of the base model Qwen/Qwen3.5-9B — confirm and update the
license field above if needed.
Citation
```bibtex
@misc{salemi2026grepseektrainingsearchagents,
  title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
  author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
  year={2026},
  eprint={2605.29307},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.29307},
}
```