MeiGen-AI

GenEvolve

Deploy Dedicated

README

License: apache-2.0

✨ Highlights

Tool-orchestrated trajectories. The agent calls search, image_search, and query_knowledge (8 callable generation skills) before producing a final program z = (gen_prompt, reference_images).
Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled token-level into the deployed student. No runtime memory at inference.
Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).

📊 Headline Results

GenEvolve-Bench (KScore, held-out split)

Table with columns: Method, Generator, KScore, Knowledge-Anch., Quality-Anch.
Method	Generator	KScore	Knowledge-Anch.	Quality-Anch.
Qwen-Image (raw)	Qwen-Image	0.2987	0.2384	0.3768
Nano Banana Pro (raw)	Nano Banana Pro	0.5298	0.5160	0.5477
Gen-Searcher 8B	Qwen-Image-Edit-2511	0.3493	0.3293	0.3745
Gen-Searcher 8B	Nano Banana Pro	0.5481	0.5472	0.5492
GenEvolve (Ours)	Qwen-Image-Edit-2511

WISE Benchmark (WiScore, six knowledge categories)

Table with columns: Model, Cultural, Time, Space, Biology, Physics, Chemistry, Overall
Model	Cultural	Time	Space	Biology	Physics	Chemistry	Overall
GPT-4o	0.81	0.71	0.89	0.83	0.79	0.74	0.80
Gen-Searcher-8B + Qwen-Image	0.80	0.71	0.82

🧠 Method Overview

For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.

🖼️ Visual Demos

🎨 Gallery — paired with Nano Banana Pro

🎨 Gallery — paired with Qwen-Image-Edit (open)

🚀 Quick Start

The deployed checkpoint is the student policy — it consumes a user prompt and returns a JSON gen_prompt + reference_images program through a <think>/<tool_call>/<answer> loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the GitHub repo; the snippet below mirrors its installation and usage.

1. Install the main GenEvolve runtime

bash
git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve

conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .

Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use --backend qwen-image-edit-service.

2. Serve the agent policy

bash
# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh

TP shards one model replica across multiple GPUs; DP launches multiple replicas; total GPU usage is TP × DP.

3. End-to-end example

bash
export SERPER_API_KEY=<your_key>      # required for search / image_search
export GOOGLE_API_KEY=<your_key>      # or GEMINI_API_KEY; only for --backend nano-banana-pro

# Nano Banana Pro renderer
python examples/quickstart.py \
    --backend nano-banana-pro \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
    --output paris.png

# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service:8001 \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --output paris_qwen.png

The agent's final <answer> is a JSON object:

json
{
  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
  "reference_images": [
    {"img_id": "IMG_001", "note": "what to copy from this image"}
  ]
}

gen_prompt MUST refer to selected images using ordinal phrases ("the first reference image") — never raw IMG_### ids or URLs. Pass (gen_prompt, [r["local_path"] for r in reference_images]) to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.

Table with columns: Artifact, Link
Artifact	Link
Project page	https://ephemeral182.github.io/GenEvolve/
Paper	Coming soon
Code	https://github.com/MeiGen-AI/GenEvolve
Training data + benchmark	MeiGen-AI/GenEvolve-Data-Bench
Base model	Qwen/Qwen3-VL-8B-Instruct

⚖️ Intended Use, Limits, Bias

Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.

📑 Citation

bibtex
@misc{chen2026genevolveselfevolvingimagegeneration,
      title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation}, 
      author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
      year={2026},
      eprint={2605.21605},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.21605}, 
}

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

MeiGen-AI

Model Tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Input Modalities

TextImage

Output Modalities

Text

Supported Functionality

Dedicated EndpointsContainer

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

✨ Highlights

Tool-orchestrated trajectories. The agent calls search, image_search, and query_knowledge (8 callable generation skills) before producing a final program z = (gen_prompt, reference_images).
Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled token-level into the deployed student. No runtime memory at inference.
Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).

📊 Headline Results

GenEvolve-Bench (KScore, held-out split)

Table with columns: Method, Generator, KScore, Knowledge-Anch., Quality-Anch.
Method	Generator	KScore	Knowledge-Anch.	Quality-Anch.
Qwen-Image (raw)	Qwen-Image	0.2987	0.2384	0.3768
Nano Banana Pro (raw)	Nano Banana Pro	0.5298	0.5160	0.5477
Gen-Searcher 8B	Qwen-Image-Edit-2511	0.3493	0.3293	0.3745
Gen-Searcher 8B	Nano Banana Pro	0.5481	0.5472	0.5492
GenEvolve (Ours)	Qwen-Image-Edit-2511

WISE Benchmark (WiScore, six knowledge categories)

Table with columns: Model, Cultural, Time, Space, Biology, Physics, Chemistry, Overall
Model	Cultural	Time	Space	Biology	Physics	Chemistry	Overall
GPT-4o	0.81	0.71	0.89	0.83	0.79	0.74	0.80
Gen-Searcher-8B + Qwen-Image	0.80	0.71	0.82

🧠 Method Overview

For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.

🖼️ Visual Demos

🎨 Gallery — paired with Nano Banana Pro

🎨 Gallery — paired with Qwen-Image-Edit (open)

🚀 Quick Start

1. Install the main GenEvolve runtime

bash
git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve

conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .

2. Serve the agent policy

bash
# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh

TP shards one model replica across multiple GPUs; DP launches multiple replicas; total GPU usage is TP × DP.

3. End-to-end example

bash
export SERPER_API_KEY=<your_key>      # required for search / image_search
export GOOGLE_API_KEY=<your_key>      # or GEMINI_API_KEY; only for --backend nano-banana-pro

# Nano Banana Pro renderer
python examples/quickstart.py \
    --backend nano-banana-pro \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
    --output paris.png

# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service:8001 \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --output paris_qwen.png

The agent's final <answer> is a JSON object:

json
{
  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
  "reference_images": [
    {"img_id": "IMG_001", "note": "what to copy from this image"}
  ]
}

Table with columns: Artifact, Link
Artifact	Link
Project page	https://ephemeral182.github.io/GenEvolve/
Paper	Coming soon
Code	https://github.com/MeiGen-AI/GenEvolve
Training data + benchmark	MeiGen-AI/GenEvolve-Data-Bench
Base model	Qwen/Qwen3-VL-8B-Instruct

⚖️ Intended Use, Limits, Bias

Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.

📑 Citation

bibtex
@misc{chen2026genevolveselfevolvingimagegeneration,
      title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation}, 
      author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
      year={2026},
      eprint={2605.21605},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.21605}, 
}

GenEvolve

README

✨ Highlights

📊 Headline Results

GenEvolve-Bench (KScore, held-out split)

WISE Benchmark (WiScore, six knowledge categories)

🧠 Method Overview

🖼️ Visual Demos

🎨 Gallery — paired with Nano Banana Pro

🎨 Gallery — paired with Qwen-Image-Edit (open)

🚀 Quick Start

1. Install the main GenEvolve runtime

2. Serve the agent policy

3. End-to-end example

🗂️ Related Artifacts

⚖️ Intended Use, Limits, Bias

📑 Citation

Explore FriendliAI today

README

✨ Highlights

📊 Headline Results

GenEvolve-Bench (KScore, held-out split)

WISE Benchmark (WiScore, six knowledge categories)

🧠 Method Overview

🖼️ Visual Demos

🎨 Gallery — paired with Nano Banana Pro

🎨 Gallery — paired with Qwen-Image-Edit (open)

🚀 Quick Start

1. Install the main GenEvolve runtime

2. Serve the agent policy

3. End-to-end example

🗂️ Related Artifacts

⚖️ Intended Use, Limits, Bias

📑 Citation