Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0โจ Highlights
- Tool-orchestrated trajectories. The agent calls
search,image_search, andquery_knowledge(8 callable generation skills) before producing a final programz = (gen_prompt, reference_images). - Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled token-level into the deployed student. No runtime memory at inference.
- Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).
๐ Headline Results
GenEvolve-Bench (KScore, held-out split)
| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
|---|---|---|---|---|
| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
| GenEvolve (Ours) | Qwen-Image-Edit-2511 | 0.3663 | 0.3410 | 0.3990 |
| GenEvolve (Ours) | Nano Banana Pro | 0.5739 | 0.5669 | 0.5830 |
WISE Benchmark (WiScore, six knowledge categories)
| Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
|---|---|---|---|---|---|---|---|
| GPT-4o | 0.81 | 0.71 | 0.89 | 0.83 | 0.79 | 0.74 | 0.80 |
| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | 0.85 | 0.68 | 0.78 |
| GenEvolve + Qwen-Image-Edit | 0.84 | 0.74 | 0.87 | 0.83 | 0.81 | 0.83 | 0.82 |
๐ง Method Overview
For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.
๐ผ๏ธ Visual Demos
๐จ Gallery โ paired with Nano Banana Pro
๐จ Gallery โ paired with Qwen-Image-Edit (open)
๐ Quick Start
The deployed checkpoint is the student policy โ it consumes a user prompt and returns a JSON gen_prompt + reference_images program through a <think>/<tool_call>/<answer> loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the GitHub repo; the snippet below mirrors its installation and usage.
1. Install the main GenEvolve runtime
bash
git clone https://github.com/MeiGen-AI/GenEvolve.gitcd GenEvolveconda create -n genevolve python=3.11 -y && conda activate genevolvepip install -U pip setuptools wheel packaging psutil ninjapip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128pip install --no-build-isolation -r requirements.txtpip install -e .
Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use --backend qwen-image-edit-service.
2. Serve the agent policy
bash
# Single GPU / single replica.MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
TP shards one model replica across multiple GPUs; DP launches multiple replicas; total GPU usage is TP ร DP.
3. End-to-end example
bash
export SERPER_API_KEY=<your_key> # required for search / image_searchexport GOOGLE_API_KEY=<your_key> # or GEMINI_API_KEY; only for --backend nano-banana-pro# Nano Banana Pro rendererpython examples/quickstart.py \--backend nano-banana-pro \--base-url http://localhost:8000/v1 \--model GenEvolve \--prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \--output paris.png# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)python examples/quickstart.py \--backend qwen-image-edit-service \--service-url http://your-qwen-service:8001 \--base-url http://localhost:8000/v1 \--model GenEvolve \--output paris_qwen.png
The agent's final <answer> is a JSON object:
json
{"gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...","reference_images": [{"img_id": "IMG_001", "note": "what to copy from this image"}]}
gen_prompt MUST refer to selected images using ordinal phrases ("the first reference image") โ never raw IMG_### ids or URLs. Pass (gen_prompt, [r["local_path"] for r in reference_images]) to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.
๐๏ธ Related Artifacts
| Artifact | Link |
|---|---|
| Project page | https://ephemeral182.github.io/GenEvolve/ |
| Paper | Coming soon |
| Code | https://github.com/MeiGen-AI/GenEvolve |
| Training data + benchmark | MeiGen-AI/GenEvolve-Data-Bench |
| Base model | Qwen/Qwen3-VL-8B-Instruct |
โ๏ธ Intended Use, Limits, Bias
- Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
- Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
- Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.
๐ Citation
bibtex
@misc{chen2026genevolveselfevolvingimagegeneration,title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},year={2026},eprint={2605.21605},archivePrefix={arXiv},primaryClass={cs.CV},url={https://arxiv.org/abs/2605.21605},}
Model provider
MeiGen-AI
Model tree
Base
Qwen/Qwen3-VL-8B-Instruct
Fine-tuned
this model
Modalities
Input
Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information