Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

โœจ Highlights

  • Tool-orchestrated trajectories. The agent calls search, image_search, and query_knowledge (8 callable generation skills) before producing a final program z = (gen_prompt, reference_images).
  • Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled token-level into the deployed student. No runtime memory at inference.
  • Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).

๐Ÿ“Š Headline Results

GenEvolve-Bench (KScore, held-out split)

MethodGeneratorKScoreKnowledge-Anch.Quality-Anch.
Qwen-Image (raw)Qwen-Image0.29870.23840.3768
Nano Banana Pro (raw)Nano Banana Pro0.52980.51600.5477
Gen-Searcher 8BQwen-Image-Edit-25110.34930.32930.3745
Gen-Searcher 8BNano Banana Pro0.54810.54720.5492
GenEvolve (Ours)Qwen-Image-Edit-25110.36630.34100.3990
GenEvolve (Ours)Nano Banana Pro0.57390.56690.5830

WISE Benchmark (WiScore, six knowledge categories)

ModelCulturalTimeSpaceBiologyPhysicsChemistryOverall
GPT-4o0.810.710.890.830.790.740.80
Gen-Searcher-8B + Qwen-Image0.800.710.820.760.740.750.77
Mind-Brush0.830.690.840.710.850.680.78
GenEvolve + Qwen-Image-Edit0.840.740.870.830.810.830.82

๐Ÿง  Method Overview

For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.


๐Ÿ–ผ๏ธ Visual Demos

๐ŸŽจ Gallery โ€” paired with Nano Banana Pro

๐ŸŽจ Gallery โ€” paired with Qwen-Image-Edit (open)


๐Ÿš€ Quick Start

The deployed checkpoint is the student policy โ€” it consumes a user prompt and returns a JSON gen_prompt + reference_images program through a <think>/<tool_call>/<answer> loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the GitHub repo; the snippet below mirrors its installation and usage.

1. Install the main GenEvolve runtime

bash

git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve
conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .

Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use --backend qwen-image-edit-service.

2. Serve the agent policy

bash

# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh
# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh

TP shards one model replica across multiple GPUs; DP launches multiple replicas; total GPU usage is TP ร— DP.

3. End-to-end example

bash

export SERPER_API_KEY=<your_key> # required for search / image_search
export GOOGLE_API_KEY=<your_key> # or GEMINI_API_KEY; only for --backend nano-banana-pro
# Nano Banana Pro renderer
python examples/quickstart.py \
--backend nano-banana-pro \
--base-url http://localhost:8000/v1 \
--model GenEvolve \
--prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
--output paris.png
# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
--backend qwen-image-edit-service \
--service-url http://your-qwen-service:8001 \
--base-url http://localhost:8000/v1 \
--model GenEvolve \
--output paris_qwen.png

The agent's final <answer> is a JSON object:

json

{
"gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
"reference_images": [
{"img_id": "IMG_001", "note": "what to copy from this image"}
]
}

gen_prompt MUST refer to selected images using ordinal phrases ("the first reference image") โ€” never raw IMG_### ids or URLs. Pass (gen_prompt, [r["local_path"] for r in reference_images]) to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.


๐Ÿ—‚๏ธ Related Artifacts


โš–๏ธ Intended Use, Limits, Bias

  • Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
  • Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
  • Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.

๐Ÿ“‘ Citation

bibtex

@misc{chen2026genevolveselfevolvingimagegeneration,
title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
year={2026},
eprint={2605.21605},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.21605},
}

Model provider

MeiGen-AI

MeiGen-AI

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today