
License: apache-2.0

Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | ≈ Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |

All with only 4B parameters — 8–32× smaller than comparable models.

Model Details

  • Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
  • Parameters: 4B
  • Max Context: 262,144 tokens
  • Training: Two-stage difficulty-aware curriculum RL with virtual world environment
  • Agent Mode: ReAct-style with search and visit tools

How It Works

LiteResearcher operates as a ReAct agent that iteratively:

  1. Thinks about what information is needed
  2. Searches the web via Google
  3. Visits webpages to extract evidence
  4. Answers when sufficient information is gathered

The model uses <think>, <tool_call>, and <answer> tags to structure its reasoning.
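The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's inference code: the tool names `search` and `visit`, the JSON payload shape inside `<tool_call>`, and the `<tool_response>` wrapper for tool output are all assumptions, and `fake_generate` is a scripted stand-in for the model.

```python
import json
import re
from typing import Callable, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_react_loop(generate: Callable[[List[dict]], str],
                   tools: Dict[str, Callable[..., str]],
                   question: str,
                   max_turns: int = 8) -> str:
    """Alternate model turns and tool calls until an <answer> appears."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})

        answer = ANSWER_RE.search(reply)
        if answer:
            return answer.group(1).strip()

        call = TOOL_CALL_RE.search(reply)
        if call is None:
            break  # neither a tool call nor an answer: give up

        # Assumed payload shape: {"name": "...", "arguments": {...}}
        request = json.loads(call.group(1))
        result = tools[request["name"]](**request["arguments"])

        # Assumed convention: tool output is fed back as a user turn
        messages.append({
            "role": "user",
            "content": f"<tool_response>\n{result}\n</tool_response>",
        })
    return ""  # no answer within max_turns


def fake_generate(messages: List[dict]) -> str:
    """Scripted stand-in for the model: one search, then an answer."""
    script = [
        '<think>I need a source.</think>\n'
        '<tool_call>{"name": "search", "arguments": {"query": "example"}}</tool_call>',
        '<think>Found it.</think>\n<answer>42</answer>',
    ]
    turn = sum(1 for m in messages if m["role"] == "assistant")
    return script[turn]


# Hypothetical tool stubs; real tools would call a search API and a page fetcher
tools = {
    "search": lambda query: f"stub results for {query!r}",
    "visit": lambda url: f"stub page text for {url}",
}

result = run_react_loop(fake_generate, tools, "What is the answer?")
print(result)
```

Replacing `fake_generate` with a call to a served model and the stubs with real tools gives the same control flow; the `max_turns` cap keeps a run from looping indefinitely when no answer is produced.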

Quick Start

With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start the SGLang server (it runs in the foreground; use a second terminal for the next step)
python -m sglang.launch_server \
  --model-path simplex-ai-inc/LiteResearcher-4B \
  --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
  --model simplex-ai-inc/LiteResearcher-4B \
  --dataset data/example.jsonl
```
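Once the server is up, it can also be queried directly. The sketch below assumes the SGLang server exposes its OpenAI-compatible chat route at `/v1/chat/completions` on the port used above; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the sampling values are borrowed from the Transformers example in this README. A direct request like this does not wire up the search and visit tools, so it only shows the raw model output.

```python
import json
import urllib.request

# Port taken from the launch command; the route is an assumption
BASE_URL = "http://localhost:6001/v1/chat/completions"


def build_chat_request(question: str,
                       system_prompt: str = "You are a deep research assistant...") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "simplex-ai-inc/LiteResearcher-4B",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 4096,
    }


def send_chat_request(payload: dict, url: str = BASE_URL, timeout: float = 600.0) -> str:
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("Who won the Nobel Prize in Physics in 2024?")
print(json.dumps(payload, indent=2))
# reply = send_chat_request(payload)  # requires the SGLang server to be running
```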

Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"

# Load the tokenizer and model; dtype and device placement are chosen automatically
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"},
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample explicitly so that temperature and top_p take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

Training

LiteResearcher is trained with a three-component framework:

  1. Co-constructed Training Data & Corpus — 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
  2. Stable Local Tool Environment — Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
  3. Difficulty-Aware Curriculum RL — Multi-stage training that progressively increases task difficulty and context length
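To make the second component concrete, here is a toy local tool environment. It is only a sketch of the idea, not the authors' implementation: a bag-of-words vector stands in for BGE-M3 embeddings, an in-memory dictionary stands in for both Milvus and PostgreSQL, and the class name `LocalCorpus` and its methods are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a dense encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class LocalCorpus:
    """Offline search and visit tools backed by a fixed set of pages."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = dict(pages)
        # Pre-compute one vector per page, as a vector index would
        self.index = {url: embed(text) for url, text in self.pages.items()}

    def search(self, query: str, top_k: int = 3) -> List[str]:
        """Return the URLs of the top_k pages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.index,
                        key=lambda url: cosine(q, self.index[url]),
                        reverse=True)
        return ranked[:top_k]

    def visit(self, url: str) -> str:
        """Return the stored page text, or an empty string if unknown."""
        return self.pages.get(url, "")


corpus = LocalCorpus({
    "https://example.com/a": "the nobel prize in physics is awarded annually",
    "https://example.com/b": "a recipe for sourdough bread",
    "https://example.com/c": "physics of bread baking",
})

hits = corpus.search("nobel prize physics", top_k=2)
print(hits)
print(corpus.visit(hits[0]))
```

Because both tools read from a fixed local snapshot, repeated calls return the same results and incur no per-call API fees, which is the property the training setup relies on.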

Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

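| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| Commercial Models | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| Open-Source Models | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| LiteResearcher | 4B | **71.3** | 27.5* | 32.5* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |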

Best open-source results in bold. Results with * use a 64k context window with a memory mechanism.

Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

License

This model is released under the Apache 2.0 License.
