Key Results
| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | ≈ Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |
All results are achieved with only 4B parameters, making the model 8–32× smaller than comparable models.
Model Details
- Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- Parameters: 4B
- Max Context: 262,144 tokens
- Training: Two-stage difficulty-aware curriculum RL with virtual world environment
- Agent Mode: ReAct-style with search and visit tools
How It Works
LiteResearcher operates as a ReAct agent that iteratively:
- Thinks about what information is needed
- Searches the web via Google
- Visits webpages to extract evidence
- Answers when sufficient information is gathered
The model uses <think>, <tool_call>, and <answer> tags to structure its reasoning.
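The loop described above can be sketched as a small tag parser plus a driver. This is a minimal illustration, not the project's implementation. It assumes that the `<tool_call>` payload is JSON with `name` and `arguments` keys (as in Qwen-style tool calling) and that tool output is fed back as a user turn wrapped in `<tool_response>` tags. `fake_model` and `TOOLS` are hypothetical stand-ins for the LLM and the real search and visit tools.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_step(text):
    """Classify one model turn as a final answer, a tool call, or neither."""
    answer = ANSWER_RE.search(text)
    if answer:
        return {"type": "answer", "content": answer.group(1).strip()}
    call = TOOL_CALL_RE.search(text)
    if call:
        # Assumption: the payload is JSON with "name" and "arguments" keys.
        payload = json.loads(call.group(1))
        return {
            "type": "tool_call",
            "name": payload["name"],
            "arguments": payload.get("arguments", {}),
        }
    return {"type": "none"}


def run_agent(generate, tools, question, max_turns=8):
    """Alternate model turns and tool calls until an <answer> appears."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        step = parse_step(reply)
        if step["type"] == "answer":
            return step["content"]
        if step["type"] == "tool_call":
            result = tools[step["name"]](**step["arguments"])
            # Assumption: tool output goes back as a user turn in <tool_response> tags.
            messages.append(
                {"role": "user", "content": f"<tool_response>{result}</tool_response>"}
            )
    return None  # turn budget exhausted without an answer


def fake_model(messages):
    """Hypothetical stand-in for the LLM: search once, then answer."""
    if len(messages) == 1:
        return (
            "<think>I need to look this up.</think>"
            '<tool_call>{"name": "search", "arguments": {"query": "capital of France"}}</tool_call>'
        )
    return "<think>The search result is sufficient.</think><answer>Paris</answer>"


# Hypothetical tools that return canned strings instead of calling the web.
TOOLS = {
    "search": lambda query: f"Top result for {query!r}: Paris is the capital of France.",
    "visit": lambda url: f"(contents of {url})",
}

print(run_agent(fake_model, TOOLS, "What is the capital of France?"))
```

Checking for `<answer>` before `<tool_call>` means a turn that contains both is treated as final. A real driver would also need to handle malformed JSON and unknown tool names.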
Quick Start
With the Inference Framework
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY
# Start SGLang server
python -m sglang.launch_server \
    --model-path simplex-ai-inc/LiteResearcher-4B \
    --port 6001 --tp 2
# Run inference
bash scripts/run_all.sh \
    --model simplex-ai-inc/LiteResearcher-4B \
    --dataset data/example.jsonl
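Once the SGLang server from the steps above is running, it can also be queried directly for a quick sanity check. The sketch below assumes the server exposes SGLang's OpenAI-compatible chat endpoint on port 6001, the port used in the launch command. The helper names `build_request` and `ask` and the `max_tokens` value are illustrative, and the sampling values mirror the Transformers example in this README. It sends a single chat turn and does not run the search and visit tools.

```python
import json
import urllib.request

# Port taken from the launch command; the /v1 prefix is an assumption
# based on SGLang's OpenAI-compatible API.
BASE_URL = "http://localhost:6001/v1"


def build_request(question, model="simplex-ai-inc/LiteResearcher-4B"):
    """Build a chat-completions POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 1024,  # illustrative budget, not a recommended setting
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=120):
    """Send one question to the running server and return the raw reply text."""
    with urllib.request.urlopen(build_request(question), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no server, so the target URL can be inspected offline.
print(build_request("Who won the Nobel Prize in Physics in 2024?").full_url)
```

With the server up, `ask("...")` returns the model's raw text, including any `<think>` and `<tool_call>` tags, which the inference framework would normally interpret.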
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
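# do_sample=True ensures temperature and top_p are applied even if the model's default generation config disables sampling
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)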
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
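The decoded text mixes reasoning with the final output, so it can help to separate the two before displaying a result. The helper below is a hypothetical sketch. It assumes the `<think>` tags survive decoding, and that the opening tag may be missing because some thinking-model chat templates place it in the prompt rather than in the generated text.

```python
def split_reasoning(decoded):
    """Return (reasoning, final_text) from one decoded generation."""
    head, sep, tail = decoded.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole string as final text.
        return "", decoded.strip()
    # Drop the opening tag if the model emitted one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>Compare the two sources.</think>\n<answer>42</answer>"
reasoning, final_text = split_reasoning(sample)
print(reasoning)
print(final_text)
```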
Training
LiteResearcher is trained with a three-component framework:
- Co-constructed Training Data & Corpus — 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
- Stable Local Tool Environment — Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
- Difficulty-Aware Curriculum RL — Multi-stage training that progressively increases task difficulty and context length
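The third component, a curriculum that raises task difficulty and context length, can be illustrated with a small scheduling sketch. Everything concrete here is an assumption made for illustration: the difficulty score in [0, 1], the stage thresholds, the context budgets, and the rule that each stage trains only on tasks harder than the previous stage's ceiling. The source states only that training is staged and that difficulty and context length increase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    max_difficulty: float     # hypothetical difficulty ceiling in [0, 1]
    max_context_tokens: int   # hypothetical rollout context budget


@dataclass(frozen=True)
class Task:
    question: str
    difficulty: float         # hypothetical score, e.g. 1 - pass rate


# Illustrative two-stage schedule; the numbers are not taken from the paper.
STAGES = [
    Stage("stage-1", max_difficulty=0.5, max_context_tokens=32_768),
    Stage("stage-2", max_difficulty=1.0, max_context_tokens=65_536),
]

TASKS = [
    Task("single-hop lookup", 0.2),
    Task("two-source aggregation", 0.5),
    Task("multi-page cross-verification", 0.9),
]


def build_curriculum(tasks, stages):
    """Assign each task to the first stage whose ceiling covers its difficulty."""
    plan = []
    lower = float("-inf")
    for stage in stages:
        selected = [t for t in tasks if lower < t.difficulty <= stage.max_difficulty]
        plan.append((stage, selected))
        lower = stage.max_difficulty
    return plan


for stage, selected in build_curriculum(TASKS, STAGES):
    names = [t.question for t in selected]
    print(stage.name, stage.max_context_tokens, names)
```

Using half-open difficulty intervals keeps each task in exactly one stage. A schedule that replays easier tasks in later stages would be an equally plausible reading of the description.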
Benchmark Results
LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.
| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| Commercial Models | | | | | | | | | |
Best open-source results in bold. Results with * use a 64k context window with a memory mechanism.
Citation
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
License
This model is released under the Apache 2.0 License.