huggermax

LiteResearcher-4B

License: apache-2.0

Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | ≈ Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |

All results are achieved with only 4B parameters, making the model 8–32× smaller than the models it is compared against.

Model Details

Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
Parameters: 4B
Max Context: 262,144 tokens
Training: Two-stage difficulty-aware curriculum RL with virtual world environment
Agent Mode: ReAct-style with search and visit tools

How It Works

LiteResearcher operates as a ReAct agent that iteratively:

1. Thinks about what information is needed
2. Searches the web via Google
3. Visits webpages to extract evidence
4. Answers when sufficient information is gathered

The model uses <think>, <tool_call>, and <answer> tags to structure its reasoning.
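
The tag structure above can be consumed by a small parser. The following is a minimal sketch, not the project's actual parser: it assumes each turn contains at most one block per tag, and that the `<tool_call>` body is a JSON object with hypothetical `name` and `arguments` fields, which the source does not specify.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# One pattern per tag; DOTALL lets the body span multiple lines.
_TAG_RE = {
    tag: re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    for tag in ("think", "tool_call", "answer")
}


@dataclass
class ParsedTurn:
    thought: Optional[str] = None
    tool_call: Optional[dict] = None  # assumed shape: {"name": ..., "arguments": {...}}
    answer: Optional[str] = None


def _first(tag: str, text: str) -> Optional[str]:
    """Return the stripped body of the first <tag>...</tag> block, if any."""
    match = _TAG_RE[tag].search(text)
    return match.group(1).strip() if match else None


def parse_turn(text: str) -> ParsedTurn:
    """Split one model turn into thought, tool call, and final answer."""
    raw_call = _first("tool_call", text)
    call = None
    if raw_call is not None:
        try:
            call = json.loads(raw_call)
        except json.JSONDecodeError:
            call = None  # malformed call: leave recovery to the caller
    return ParsedTurn(_first("think", text), call, _first("answer", text))


turn = parse_turn(
    "<think>I need the 2024 laureates.</think>\n"
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Physics 2024"}}</tool_call>'
)
print(turn.tool_call["name"])  # search
```

An agent loop would call `parse_turn` after every generation, dispatch `tool_call` to the matching tool if one is present, and stop once `answer` is set.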

Quick Start

With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
    --model-path simplex-ai-inc/LiteResearcher-4B \
    --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
    --model simplex-ai-inc/LiteResearcher-4B \
    --dataset data/example.jsonl
```
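
Once the server is running, it can also be queried directly. SGLang exposes an OpenAI-compatible HTTP API, so a minimal sketch using only the standard library might look like the following. The host `localhost`, the placeholder system prompt, and the sampling values are assumptions for illustration; the port matches the `--port 6001` flag above. A single call like this returns one model turn and does not execute the search or visit tools, which the inference framework handles.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6001/v1"  # assumed host; port taken from the launch command
MODEL = "simplex-ai-inc/LiteResearcher-4B"


def build_request(question: str, max_tokens: int = 4096) -> urllib.request.Request:
    """Build a chat-completions request for the local server."""
    payload = {
        "model": MODEL,
        "messages": [
            # Placeholder system prompt, not the framework's real one.
            {"role": "system", "content": "You are a deep research assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_request(question), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Who won the Nobel Prize in Physics in 2024?"))` prints the model's first turn, which will typically contain a tool call rather than a final answer.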

Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Enable sampling explicitly so that temperature and top_p take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

Training

LiteResearcher is trained with a three-component framework:

1. Co-constructed Training Data & Corpus: 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. Stable Local Tool Environment: Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
3. Difficulty-Aware Curriculum RL: Multi-stage training that progressively increases task difficulty and context length
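
The curriculum component can be pictured as a staged schedule. The sketch below is purely illustrative and is not the authors' training code: the `Stage` fields, the use of pass rate as a difficulty proxy, and all thresholds and context lengths are assumptions, since the source states only that task difficulty and context length increase across stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    min_pass_rate: float  # keep tasks the current policy solves at least this often
    max_pass_rate: float  # ...and at most this often
    max_context_tokens: int


# Hypothetical two-stage schedule: the later stage keeps harder tasks
# (lower pass rates) and allows a longer context.
SCHEDULE = [
    Stage("stage_1", min_pass_rate=0.3, max_pass_rate=0.9, max_context_tokens=32_768),
    Stage("stage_2", min_pass_rate=0.0, max_pass_rate=0.5, max_context_tokens=65_536),
]


def select_tasks(pass_rates: dict[str, float], stage: Stage) -> list[str]:
    """Return task ids whose estimated pass rate falls inside the stage's band."""
    return sorted(
        task_id
        for task_id, rate in pass_rates.items()
        if stage.min_pass_rate <= rate <= stage.max_pass_rate
    )


# Toy pass rates, e.g. estimated from a few rollouts per task.
rates = {"easy": 0.95, "medium": 0.6, "hard": 0.2, "unsolved": 0.0}
for stage in SCHEDULE:
    print(stage.name, stage.max_context_tokens, select_tasks(rates, stage))
```

Filtering by a pass-rate band is one common way to keep RL rollouts informative: tasks that are always solved or never solved contribute little learning signal at a given stage.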

Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| Commercial Models | | | | | | | | | |

Best open-source results in bold. Results with * use a 64k context window with a memory mechanism.

Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

License

This model is released under the Apache 2.0 License.
