INFUSER-Qwen3-8B-base

Summary

Base model: Qwen/Qwen3-8B-Base
Method: INFUSER
Data repository: Siyuc/infuser-data

Evaluation

Released Checkpoint Scores

Category	Benchmark	Score
General	MMLU-Pro	67.81%
General	GPQA-Diamond	47.47%
General	SuperGPQA	38.86%
General	BBEH	12.51%
Math & physics	MATH500	84.25%
Math & physics	AIME2024	19.06%
Math & physics	AIME2025	18.02%
Math & physics	HMMT	9.64%
Math & physics	OlympiadBench (Math)	54.45%
Math & physics	OlympiadBench (Phys)	14.41%
Medical	MedQA	66.46%
Medical	MedXpertQA	14.57%
Coding	HumanEval+	78.86%
Coding	LiveCodeBench v1-5	28.47%

Comparison Summary

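Category and overall means are computed over the same benchmark groups as the per-benchmark table above. For example, General reasoning averages MMLU-Pro, GPQA-Diamond, SuperGPQA, and BBEH. R-Few (paper) and SPICE (paper) are self-reported values taken from their original papers, so categories those papers do not report are shown as "-".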

Category	This model	INFUSER avg	Base	R-Zero	AZR	R-Few (paper)	SPICE (paper)	General-Reasoner
General reasoning	41.66%	40.62%	34.43%	37.14%	37.61%	38.88%	38.75%	41.40%
Math & physics reasoning	33.30%
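
This model's General reasoning and Math & physics entries can be reproduced from the per-benchmark scores by taking an unweighted mean within each category. For example, (67.81 + 47.47 + 38.86 + 12.51) / 4 ≈ 41.66. The sketch below does that arithmetic and also computes this model's margin over each other column on General reasoning. The unweighted-mean rule is inferred from those two rows, not stated in the card. The overall mean is not computed because its aggregation is not given. The Medical and Coding means the sketch prints are derived values, not numbers taken from the comparison table.

```python
from statistics import mean

# Per-benchmark scores (%) for this checkpoint, copied from the released-checkpoint table.
SCORES = {
    "General": {
        "MMLU-Pro": 67.81,
        "GPQA-Diamond": 47.47,
        "SuperGPQA": 38.86,
        "BBEH": 12.51,
    },
    "Math & physics": {
        "MATH500": 84.25,
        "AIME2024": 19.06,
        "AIME2025": 18.02,
        "HMMT": 9.64,
        "OlympiadBench (Math)": 54.45,
        "OlympiadBench (Phys)": 14.41,
    },
    "Medical": {"MedQA": 66.46, "MedXpertQA": 14.57},
    "Coding": {"HumanEval+": 78.86, "LiveCodeBench v1-5": 28.47},
}

# General reasoning row of the comparison table (%).
GENERAL_ROW = {
    "This model": 41.66,
    "INFUSER avg": 40.62,
    "Base": 34.43,
    "R-Zero": 37.14,
    "AZR": 37.61,
    "R-Few (paper)": 38.88,
    "SPICE (paper)": 38.75,
    "General-Reasoner": 41.40,
}


def category_means(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted arithmetic mean of the benchmark scores in each category (assumed rule)."""
    return {category: mean(bench.values()) for category, bench in scores.items()}


def gaps_vs(row: dict[str, float], reference: str = "This model") -> dict[str, float]:
    """Percentage-point difference between `reference` and every other column in the row."""
    return {
        name: round(row[reference] - value, 2)
        for name, value in row.items()
        if name != reference
    }


means = category_means(SCORES)
gaps = gaps_vs(GENERAL_ROW)
for category, value in means.items():
    print(f"{category}: {value:.3f}%")
print(gaps)
```

The margins are absolute percentage points, not relative improvements. For example, the gap to Base on General reasoning is 41.66 - 34.43 = 7.23 points.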

Usage

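```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released checkpoint and its tokenizer from the Hugging Face Hub.
# torch_dtype="auto" keeps the dtype stored in the checkpoint instead of upcasting to float32.
model = AutoModelForCausalLM.from_pretrained("Siyuc/INFUSER-Qwen3-8B-base", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Siyuc/INFUSER-Qwen3-8B-base")
```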
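
The loading snippet stops before generation. Below is a minimal greedy-decoding sketch that uses the standard transformers `generate` API. It rests on several assumptions:

- `device_map="auto"` requires the `accelerate` package.
- The helper names `load_model` and `complete` are hypothetical.
- The card does not document a prompt format, so a plain-text prompt is assumed rather than a chat template.
- The transformers import is deferred into `load_model` so the module can be imported without pulling in torch. `lru_cache` ensures the 8B weights are loaded only once.

```python
from functools import lru_cache

MODEL_ID = "Siyuc/INFUSER-Qwen3-8B-base"


@lru_cache(maxsize=1)
def load_model():
    """Hypothetical helper: load model and tokenizer once and cache them."""
    # Deferred import: torch/transformers are only needed when the model is actually used.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" places the weights on available GPUs (requires `accelerate`).
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    return model, tokenizer


def complete(prompt: str, max_new_tokens: int = 256) -> str:
    """Hypothetical helper: greedy continuation of a plain-text prompt."""
    model, tokenizer = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the newly generated continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `complete("Question: What is 17 * 23?\nAnswer:")` returns the model's greedy continuation of that prompt. Greedy decoding (`do_sample=False`) is chosen only to make outputs repeatable. It is not necessarily the decoding setup used for the reported scores.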

Citation

```bibtex
@misc{chen2026infuser,
  title         = {INFUSER: Influence-Guided Self-Evolution Improves Reasoning},
  author        = {Siyu Chen and Miao Lu and Beining Wu and Heejune Sheen and Fengzhuo Zhang and Shuangning Li and Zhiyuan Li and Jose Blanchet and Tianhao Wang and Zhuoran Yang},
  year          = {2026},
  eprint        = {2606.09052},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2606.09052}
}
```
