README
License: apache-2.0

⚠️ A tool-using agent, not a standalone chatbot
The model emits `<tool_call>` shell commands that must be executed against the
Wikipedia corpus, with the results fed back as `<tool_response>` turns. To use it you need:
(1) the corpus `PeterJinGo/wiki-18-corpus`, (2) a tool-calling vLLM server, and
(3) the GrepSeek inference harness (grep tool + agent loop), all covered in the code repo.
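The protocol above (the model emits a `<tool_call>`, the harness runs it against the corpus, and the output returns as a `<tool_response>` turn) can be illustrated for a single step in bash. This is a hypothetical sketch, not the GrepSeek harness: the function names `extract_tool_call` and `run_tool_step`, the assumption that the tag wraps a raw shell command, the grep-only guard, the 30 s timeout and the 4000-byte output cap are all illustrative choices, and GNU `timeout` is assumed to be installed. For real runs, use `inference/run_inference.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of ONE tool step of a GrepSeek-style agent loop.
# Assumptions (not taken from the GrepSeek repo): the model turn contains a single
# raw shell command between <tool_call> and </tool_call>, GNU `timeout` exists,
# and tool output is capped at 4000 bytes.

CORPUS_ROOT="${GREPSEEK_CORPUS_ROOT:-/path/to/wiki_18_corpus}"

# Print the text between the first <tool_call> and the following </tool_call>,
# with surrounding whitespace trimmed. Returns 1 if either tag is missing.
extract_tool_call() {
  local turn="$1" rest cmd
  rest="${turn#*<tool_call>}"                     # drop everything up to the opening tag
  [[ "$rest" == "$turn" ]] && return 1            # no opening tag
  [[ "$rest" == *"</tool_call>"* ]] || return 1   # no closing tag
  cmd="${rest%%</tool_call>*}"                    # drop the closing tag and what follows
  cmd="${cmd#"${cmd%%[![:space:]]*}"}"            # trim leading whitespace
  cmd="${cmd%"${cmd##*[![:space:]]}"}"            # trim trailing whitespace
  printf '%s' "$cmd"
}

# Run the extracted command inside the corpus directory and wrap its output
# in <tool_response> tags, ready to append to the conversation as the next turn.
run_tool_step() {
  local cmd output
  cmd="$(extract_tool_call "$1")" || { echo "no tool call found" >&2; return 1; }
  # Crude guard, NOT a security boundary: only commands starting with "grep " pass.
  # Untrusted model output should be executed inside a sandbox.
  case "$cmd" in
    grep\ *) ;;
    *) echo "refusing non-grep command: $cmd" >&2; return 2 ;;
  esac
  output="$(cd "$CORPUS_ROOT" && timeout 30 bash -c "$cmd" 2>&1 | head -c 4000)"
  printf '<tool_response>\n%s\n</tool_response>\n' "$output"
}
```

For example, calling `run_tool_step "$turn"` on a turn ending in `<tool_call>grep -rh "Eiffel" . | head -n 5</tool_call>` prints the matching lines wrapped in `<tool_response>` tags. A full harness also has to keep the multi-turn conversation state and enforce the turn and context budgets (the paper setting uses at most 6 turns and a 16k context).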
Usage
```bash
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md · corpus: cold_start_sft/download_corpus.py

# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-SFT bash rl/serve_rl.sh   # -> http://localhost:10730/v1

# 2. run the agent (paper inference: temperature 0.6, <=6 turns, 16k context)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
  --model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out
```
Evaluation (token-F1 / EM, micro-average over 7 QA benchmarks)
This SFT-only policy already substantially beats the untuned base model (+0.0935 F1, +0.0733 EM), but RL adds large further gains on multi-hop reasoning (GRPO: +0.1442 F1, +0.1379 EM over this model):
| variant | micro-avg F1 | micro-avg EM |
|---|---|---|
| base (no SFT, no RL) | 0.3314 | 0.2836 |
| this model (SFT only) | 0.4249 | 0.3569 |
| + GRPO → GrepSeek-Qwen3.5-9B-GRPO | 0.5691 | 0.4948 |
(7 benchmarks: NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle. Training used only NQ and HotpotQA; the other five are out-of-distribution.)
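For reference, the standard SQuAD-style definitions of these metrics are given below. This is an assumption about the evaluation code rather than something stated in this card: answer normalization $n(\cdot)$ (e.g. lower-casing and stripping punctuation and articles), taking the maximum over multiple gold answers, and reading "micro-average" as a mean over all questions pooled across the 7 benchmarks (rather than a mean of per-benchmark scores) are all assumed. Here $T(\cdot)$ is the token multiset of a normalized answer, $\cap$ is multiset intersection (F1 is 0 when it is empty), $N_b$ is the number of test questions in benchmark $b$, and $G_{b,i}$ is the set of gold answers for question $i$.

$$
P = \frac{|T(\hat{y}) \cap T(y)|}{|T(\hat{y})|}, \qquad
R = \frac{|T(\hat{y}) \cap T(y)|}{|T(y)|}, \qquad
\mathrm{F1}(\hat{y}, y) = \frac{2PR}{P + R}
$$

$$
\mathrm{EM}(\hat{y}, y) = \mathbb{1}\big[\, n(\hat{y}) = n(y) \,\big]
$$

$$
\bar{s} = \frac{1}{\sum_{b=1}^{7} N_b} \sum_{b=1}^{7} \sum_{i=1}^{N_b} \max_{y \in G_{b,i}} s(\hat{y}_{b,i}, y), \qquad s \in \{\mathrm{F1}, \mathrm{EM}\}
$$

Under this reading, micro-averaging weights each benchmark by its number of questions, so the larger test sets contribute more to the reported numbers than a per-benchmark (macro) average would give them.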
License
Inherits the license of the base model Qwen/Qwen3.5-9B — confirm and update the
license field above if needed.
Citation
```bibtex
@misc{salemi2026grepseektrainingsearchagents,
  title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
  author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
  year={2026},
  eprint={2605.29307},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.29307},
}
```
Model provider: alireza7
Model tree: Qwen/Qwen3.5-9B (base) → this model (fine-tuned)
Modalities: input Video, Text, Image; output Text