simplex-ai-inc/LiteResearcher-4B-SFT
License: apache-2.0

Model details
- Base model: Qwen/Qwen3-4B-Thinking-2507
- Architecture: Qwen3ForCausalLM (36 layers, hidden size 2560, 32 attention heads, GQA with 8 KV heads)
- Max position embeddings: 262,144 (RoPE θ = 5,000,000)
- Precision: bfloat16
- Total params: ~4B
- Training framework: LLaMA-Factory
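Grouped-query attention with 8 KV heads (instead of one per query head) shrinks the KV cache by a factor of 4, which matters at the long sequence lengths this model is trained on. The sketch below estimates the bfloat16 KV-cache footprint per token and at 64K tokens. It assumes a per-head dimension of 128, taken from the upstream Qwen3-4B config rather than from this card, and `kv_cache_bytes` is a hypothetical helper name.

```python
# Hypothetical helper: estimate KV-cache memory for a GQA decoder.
# HEAD_DIM = 128 is an assumption from the upstream Qwen3-4B config;
# this card only lists layer, hidden-size and head counts.
NUM_LAYERS = 36
NUM_KV_HEADS = 8      # GQA: 8 KV heads shared by 32 query heads
HEAD_DIM = 128        # assumption, not stated on this card
BYTES_PER_VALUE = 2   # bfloat16


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 = one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


if __name__ == "__main__":
    print(f"per token : {kv_cache_bytes(1) / 1024:.0f} KiB")
    print(f"64K tokens: {kv_cache_bytes(65_536) / 2**30:.1f} GiB")
```

Under these assumptions a single 64K-token sequence needs roughly 9 GiB of KV cache on top of the ~8 GB of bf16 weights; with 32 KV heads it would be four times larger.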
Training recipe
| Item | Value |
|---|---|
| Stage | SFT (cold-start before RL) |
| Base model | Qwen/Qwen3-4B-Thinking-2507 |
| Dataset | simplex-ai-inc/LiteResearcher-Data (~68.2k SFT trajectories) |
| Max sequence length | 64K (cutoff_len=65536) |
| Global batch size | 128 (per-device bs 2 × grad-accum 8 × 8 GPUs) |
| Epochs | 1 |
| Optimizer steps | 533 |
| Learning rate | 2.0e-5, cosine, 10% warmup |
| Final train loss | ≈ 0.447 (starting loss ≈ 1.19) |
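The step count follows from the other rows: 2 sequences per device × 8 accumulation steps × 8 GPUs = 128 sequences per optimizer step, and ~68.2k trajectories ÷ 128 ≈ 532.8, which rounds to the 533 steps listed. The sketch below reproduces that arithmetic and evaluates a linear-warmup plus cosine-decay schedule at the listed peak learning rate. It assumes the Hugging Face Transformers `cosine` schedule convention that LLaMA-Factory builds on (warmup steps = ceil(0.1 × total), decay to 0); `68_200` is a rounded stand-in for "~68.2k", and the function names are hypothetical.

```python
import math

# Values from the training-recipe table; 68_200 approximates "~68.2k".
NUM_TRAJECTORIES = 68_200
PER_DEVICE_BS, GRAD_ACCUM, NUM_GPUS = 2, 8, 8
PEAK_LR = 2.0e-5
WARMUP_RATIO = 0.1


def global_batch_size() -> int:
    return PER_DEVICE_BS * GRAD_ACCUM * NUM_GPUS


def optimizer_steps(num_examples: int, epochs: int = 1) -> int:
    # Approximation: ceil(N / global batch). The true dataset size and the
    # trainer's exact rounding can shift this by a step.
    return math.ceil(num_examples / global_batch_size()) * epochs


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to 0 (assumed HF-style schedule)."""
    warmup = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup:
        return PEAK_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = optimizer_steps(NUM_TRAJECTORIES)
    print("global batch:", global_batch_size(), "steps:", total)
    for s in (0, 27, 54, 293, total):
        print(f"step {s:>3}: lr = {lr_at(s, total):.2e}")
```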
The SFT trajectories teach the model the ReAct loop (think → search → visit → answer)
and the strict `<answer>...</answer>` output contract used by the RL environment.
Because the base is the Thinking-2507 variant, the model retains long
chain-of-thought behavior inside `<think>...</think>` blocks, which the
downstream RL curriculum builds on.
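Any code that consumes the model's output has to separate the reasoning block from the final answer. Below is a minimal parser sketch for that contract. It assumes the reasoning ends at the last `</think>` (the opening `<think>` may be supplied by the chat template rather than generated) and that the last `<answer>...</answer>` span after the reasoning is the final answer. `split_completion` and these rules are illustrative assumptions, not the LiteResearcher reward or evaluation code.

```python
import re
from typing import Optional, Tuple

# Illustrative parser for the <think>/<answer> output contract (assumed rules).
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_completion(text: str) -> Tuple[str, Optional[str]]:
    """Return (reasoning, answer); answer is None if no <answer> tag is found."""
    head, sep, tail = text.rpartition("</think>")
    if sep:
        # Drop a leading <think> if the model emitted it itself.
        reasoning = head.replace("<think>", "", 1).strip()
        body = tail
    else:
        reasoning, body = "", text
    matches = ANSWER_RE.findall(body)
    answer = matches[-1].strip() if matches else None
    return reasoning, answer


if __name__ == "__main__":
    print(split_completion("<think>plan a search</think>\n<answer>Paris</answer>"))
    print(split_completion("<think>ran out of budget</think> no final tag"))
```

Returning `None` instead of raising keeps the contract check cheap, so a caller can treat a missing tag as a format failure.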
How to use
As the initial policy for RL (recommended use)
```bash
# In the LiteResearcher training scripts (Training/ folder of the repo)
export MODEL_PATH=$(hf download simplex-ai-inc/LiteResearcher-4B-SFT \
  --local-dir ./literesearcher_sft)
```
Then follow the Stage-1 / Stage-2 RL instructions in the LiteResearcher repository.
Stand-alone inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "simplex-ai-inc/LiteResearcher-4B-SFT"
tok = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 to match the training precision;
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
```
The model expects the same ReAct system prompt and tool schema used by
LiteResearcher (see Inference/ in the repo).
Citation
If you use this checkpoint in academic work, please cite the LiteResearcher project — see the GitHub README for the BibTeX entry.
License
Apache-2.0, inheriting from the Qwen3-4B-Thinking-2507 base model.
Model provider: simplex-ai-inc
Model tree: Qwen/Qwen3-4B-Thinking-2507 (base) → this model (fine-tuned)
Modalities: text input, text output