License: apache-2.0

Why direct corpus interaction?

Index-based retrieval, whether dense or sparse, has three weaknesses: semantic smoothing, which blurs fine-grained entity and lexical distinctions; limited controllability, since the agent cannot enforce exact filters or iteratively refine results; and redundant re-retrieval in multi-hop settings. GrepSeek instead executes exact-string shell pipelines (e.g. rg -F). This preserves lexical precision, isolates rare symbolic patterns and exact entity names, and lets the agent compose multi-stage retrieval programs for compositional reasoning. It needs no embedding index and no offline indexing, only the ~14 GB raw corpus.
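
To make the idea of a multi-stage exact-string pipeline concrete, here is a minimal sketch over a toy corpus. The passages and the temporary file are hypothetical, and grep -F stands in for rg -F so the snippet runs without ripgrep installed; it is an illustration of the style of command, not a trajectory produced by the model.

```bash
set -eu

# Toy corpus (hypothetical passages, one per line). The real corpus is the ~14 GB wiki dump.
corpus="$(mktemp)"
cat > "$corpus" <<'EOF'
Marie Curie was born in Warsaw and won the Nobel Prize in Physics in 1903.
Pierre Curie shared the 1903 Nobel Prize in Physics with Marie Curie.
Marie Curie won the Nobel Prize in Chemistry in 1911.
Warsaw is the capital of Poland.
EOF

# Stage 1: exact-match the entity name (-F treats the pattern as a literal string, not a regex).
# Stage 2: narrow the candidates with a second literal filter.
# Stage 3: cap the number of passages returned to the agent.
hits="$(grep -F 'Marie Curie' "$corpus" | grep -F 'Chemistry' | head -n 5)"
printf '%s\n' "$hits"

rm -f "$corpus"
```

Each stage is an exact filter, so the agent controls precisely which passages survive, and a follow-up hop can reuse a term found in the output as the next literal pattern.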

Training

  • Initialized from: alireza7/GrepSeek-Qwen3.5-9B-SFT (cold-start SFT on alireza7/GrepSeek-ColdStart-SFT-10k; base Qwen/Qwen3.5-9B).
  • RL: GRPO, group size n=5, reward = token-F1 × binary format gate (only structurally valid <think>/<tool_call>/<tool_response>/<answer> trajectories get non-zero reward), 200 steps, LR 5e-6, batch 256, KL disabled, Ulysses SP=2, on 4×A100-80GB. Trained only on NQ + HotpotQA.
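
The reward in the RL bullet can be written compactly. This is a sketch under the assumption that token-F1 is the usual bag-of-tokens F1 between the predicted answer and the gold answer; the symbols (r, τ, â, a*) are introduced here for illustration, and the exact answer normalization and validity check are whatever the code repo implements.

```latex
r(\tau) = \mathbb{1}\!\left[\tau \text{ is structurally valid}\right] \cdot \mathrm{F1}(\hat{a}, a^{*}),
\qquad
\mathrm{F1}(\hat{a}, a^{*}) = \frac{2\,\mathrm{Prec}\cdot\mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}},
\qquad
\mathrm{Prec} = \frac{|\hat{a} \cap a^{*}|}{|\hat{a}|},
\quad
\mathrm{Rec} = \frac{|\hat{a} \cap a^{*}|}{|a^{*}|}
```

Here τ is a full trajectory, â is the token multiset of its final answer, a* is the token multiset of the gold answer, and F1 is taken to be 0 when the overlap is empty. Because the gate is binary, a trajectory with a correct answer but a malformed tag structure still receives zero reward.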

⚠️ A tool-using agent, not a standalone chatbot

The model emits <tool_call> shell commands that must be executed against the corpus and returned as <tool_response> turns. You need the corpus (PeterJinGo/wiki-18-corpus), a tool-calling vLLM server, and the GrepSeek inference harness — all in the code repo.
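
The execute-and-return loop described above can be sketched in a few lines of shell. This is not the GrepSeek harness: the sample model turn, the helper names extract_tool_call and run_tool, the 2000-byte truncation, and the assumption that the <tool_call> payload is a raw single-line shell command are all hypothetical. The real payload format and sandboxing are defined in the code repo.

```bash
set -eu

# Hypothetical model turn; the real payload format is defined by the GrepSeek harness.
model_turn='<think>Look up the entity first.</think><tool_call>printf "%s\n" "Warsaw is the capital of Poland." | grep -F "Warsaw"</tool_call>'

# Extract the text inside a single-line <tool_call>...</tool_call> pair.
extract_tool_call() {
  printf '%s\n' "$1" | sed -n 's/.*<tool_call>\(.*\)<\/tool_call>.*/\1/p'
}

# Run the command in a child shell and wrap its (truncated) output as a <tool_response> turn.
# A real deployment should sandbox this step, since the command comes from model output.
run_tool() {
  out="$(sh -c "$1" 2>&1 | head -c 2000)" || true
  printf '<tool_response>%s</tool_response>\n' "$out"
}

cmd="$(extract_tool_call "$model_turn")"
response="$(run_tool "$cmd")"
printf '%s\n' "$response"
```

The response string would then be appended to the conversation and sent back to the model, which either issues another <tool_call> or emits its <answer>.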

Usage

```bash
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md · corpus: cold_start_sft/download_corpus.py

# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-GRPO bash rl/serve_rl.sh  # -> http://localhost:10730/v1

# 2a. generation on your own questions
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
  --model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out

# 2b. reproduce the benchmark eval (token-F1 / EM on the Search-R1 suite)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
  --model grepseek --temperature 0.6 --datasets all --out_dir eval
```

The inference harness also ships a semantics-preserving, sharded-parallel execution engine (plus a persistent search daemon) that accelerates corpus search by up to 7.6× while remaining byte-exact with sequential grep.
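
The property "sharded and parallel, yet byte-exact with sequential grep" can be illustrated with a small self-contained sketch. This is not the engine shipped in the harness: the toy corpus, the shard size of 250 lines, and the use of split plus background jobs are assumptions chosen only to show why splitting on line boundaries and concatenating shard outputs in corpus order reproduces the sequential output.

```bash
set -eu

work="$(mktemp -d)"
mkdir "$work/hits"

# Toy corpus: 1000 numbered lines; every 7th line contains the literal "needle".
awk 'BEGIN { for (i = 1; i <= 1000; i++) { if (i % 7 == 0) print "doc " i " needle"; else print "doc " i " filler" } }' > "$work/corpus.txt"

# Sequential reference output.
grep -F 'needle' "$work/corpus.txt" > "$work/seq.out"

# Shard on line boundaries. split names shards aa, ab, ... so lexical order equals corpus order.
split -l 250 "$work/corpus.txt" "$work/shard."

# Search every shard in parallel, each writing to its own output file.
for shard in "$work"/shard.*; do
  grep -F 'needle' "$shard" > "$work/hits/$(basename "$shard")" || true &
done
wait

# Concatenate shard outputs in shard order and compare byte for byte.
cat "$work"/hits/shard.* > "$work/par.out"
if cmp -s "$work/seq.out" "$work/par.out"; then status="byte-exact"; else status="mismatch"; fi
printf '%s\n' "$status"

rm -rf "$work"
```

The equivalence holds here because grep matches line by line and no line is cut across a shard boundary; options whose output depends on global position, such as line numbers, would need offsets added per shard.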

Results (token-level F1)

Trained only on NQ + HotpotQA (marked *); the other five are out-of-distribution. GrepSeek gets the best micro-average and wins 4/7 benchmarks.

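| Method | NQ* | TriviaQA | PopQA | HotpotQA* | 2Wiki | MuSiQue | Bamboogle | micro-avg |
|---|---|---|---|---|---|---|---|---|
| Search-R1 (Qwen3-Emb-4B, best baseline) | 0.5067 | 0.7693 | 0.5101 | 0.5591 | 0.4299 | 0.2878 | 0.6989 | 0.5441 |
| GrepSeek (this model) | 0.5223 | 0.7673 | 0.4861 | 0.6231 | 0.5178 | 0.3006 | 0.6212 | 0.5691 |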

Micro-average EM = 0.4948 (also best overall; full EM table in the paper). Gains are largest on multi-hop tasks (HotpotQA, 2Wiki, MuSiQue) that reward exact entity disambiguation and iterative evidence aggregation.

Limitations

Because retrieval is purely lexical, GrepSeek is weaker on surface-form variation / long-tail queries — e.g. PopQA (diacritics, name variants) — and grep has no semantic relevance ranking, so an authoritative passage can be buried behind earlier file-order matches. Dense retrieval remains advantageous on heavily semantic or paraphrase-driven queries.

License

Inherits the license of the base model Qwen/Qwen3.5-9B — confirm and update the license field above if needed.

Citation

```bibtex
@misc{salemi2026grepseektrainingsearchagents,
  title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
  author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
  year={2026},
  eprint={2605.29307},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.29307},
}
```

Model provider: alireza7

Model tree: base alireza7/GrepSeek-Qwen3.5-9B-SFT; fine-tuned: this model

Modalities: input Video, Text, Image; output Text
