simplex-ai-inc

LiteResearcher-4B-SFT

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Model details

  • Base model: Qwen/Qwen3-4B-Thinking-2507
  • Architecture: Qwen3ForCausalLM (36 layers, hidden 2560, 32 heads, GQA 8 KV heads)
  • Max position embeddings: 262,144 (RoPE θ = 5,000,000)
  • Precision: bfloat16
  • Total params: ~4B
  • Training framework: LLaMA-Factory

Training recipe

Table
ItemValue
StageSFT (cold-start before RL)
Base modelQwen/Qwen3-4B-Thinking-2507
Datasetsimplex-ai-inc/LiteResearcher-Data (~68.2k SFT trajectories)
Max sequence length64K (cutoff_len=65536)
Global batch size128 (per-device bs 2 × grad-accum 8 × 8 GPUs)
Epochs1
Optimizer steps533
Learning rate2.0e-5, cosine, 10% warmup
Final train loss≈ 0.447 (starting loss ≈ 1.19)

The SFT trajectories teach the model the ReAct thinksearchvisitanswer loop and the strict <answer>...</answer> output contract used by the RL environment. Because the base is the Thinking-2507 variant, the model preserves long chain-of-thought behavior inside <think>...</think> blocks, which is what the downstream RL curriculum builds on.

How to use

bash

# In the LiteResearcher training scripts (Training/ folder of the repo)
export MODEL_PATH=$(hf download simplex-ai-inc/LiteResearcher-4B-SFT \
--local-dir ./literesearcher_sft)

Then follow the Stage-1 / Stage-2 RL instructions in the LiteResearcher repository.

Stand-alone inference

python

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "simplex-ai-inc/LiteResearcher-4B-SFT"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

The model expects the same ReAct system prompt and tool schema used by LiteResearcher (see Inference/ in the repo).

Citation

If you use this checkpoint in academic work, please cite the LiteResearcher project — see the GitHub README for the BibTeX entry.

License

Apache-2.0, inheriting from the Qwen3-4B-Thinking-2507 base model.

Model provider

simplex-ai-inc

Model tree

Base

Qwen/Qwen3-4B-Thinking-2507

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today