harsh0024

HiDream-O1-Image-Dev-2604

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: mit

Models

Table
NameScriptInference StepsHuggingFace Repo
HiDream-O1-Image-Dev-2604inference.py28🤗 HiDream-O1-Image-Dev-2604
Prompt Agent 2604prompt_agent_v2.py🤗 HiDream-ai/Prompt-Refine

Installation

  1. Clone this repository:

bash

git clone https://github.com/HiDream-ai/HiDream-O1-Image.git
cd HiDream-O1-Image
git checkout dev
  1. Install the required dependencies:

bash

pip install -r requirements.txt

Note on flash-attn. We highly recommend installing flash-attn for optimized attention computation. If you do not (or cannot) install flash-attn, you must edit models/pipeline.py line 291 and change "use_flash_attn": True to "use_flash_attn": False — otherwise inference will fail to import the kernel.

Reasoning-Driven Prompt Agent

HiDream-O1-Image ships with a Reasoning-Driven Prompt Agent (prompt_agent_v2.py) that explicitly reasons through layout, subject attributes, physical logic, and text-rendering details, then rewrites a raw user instruction into a self-contained English prompt. Feed its output into inference.py for best results on intricate, reasoning-heavy requests.

The agent talks to an OpenAI-compatible endpoint serving HiDream-ai/Prompt-Refine via vLLM.

Step 1 — Download the refiner weights

bash

huggingface-cli download HiDream-ai/Prompt-Refine \
--local-dir HiDream-ai/Prompt-Refine

Step 2 — Start the vLLM server

bash

bash start_vllm_server.sh

This launches HiDream-ai/Prompt-Refine on http://localhost:8000/v1.

Step 3 — Run the refiner

bash

python prompt_agent_v2.py \
--prompt "A vintage aviation poster featuring a bright red biplane cruising over rolling farmlands. Bold blocky text at the bottom promises adventure in the friendly skies."

By default the script targets http://localhost:8000/v1 and HiDream-ai/Prompt-Refine; override with --base_url or --model_id if you serve the model elsewhere. The same module also exposes a reusable refine_prompt(prompt, model_id=..., base_url=...) function used by app.py.

Usage

A CUDA-capable GPU is required for inference. The examples below use the undistilled model (--model_type full); see the last subsection for running the same tasks with the distilled model (--model_type dev).

1. Text-to-Image Generation

Generate an image from a text prompt:

bash

python inference.py \
--model_path /path/to/HiDream-O1-Image-Dev-2604 \
--prompt "A vintage aviation poster depicting a bright red biplane cruising over rolling farmlands under a partly cloudy sky, with saturated colors and an aged paper texture. A red biplane with two sets of wings and a radial engine is positioned in the upper center of the image, flying toward the right. A pilot with light skin, wearing a brown flight helmet, goggles, and a brown jacket, is visible in the open cockpit. The biplane has black wheels with red hubs and a spinning propeller. Below, the landscape consists of rolling fields in various shades of green, yellow, and brown, divided by dirt roads and scattered with small houses, including a red barn, a brown house, and a white house. In the background, a line of green trees separates the fields from distant hills under a blue sky with white clouds. The poster has a textured, aged paper border with visible creases and discoloration. At the bottom, the text \"ADVENTURE IN THE FRIENDLY SKIES\" is displayed in large, bold, dark brown capital letters across two lines on a light beige background." \
--output_image results/t2i.png \
--height 2048 \
--width 2048

License

The code in this repository and the HiDream-O1-Image-Dev-2604 models are licensed under MIT License.

Citation

bibtex

@article{hidreamolimage,
title={HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer},
author={Cai, Qi and Chen, Jingwen and Gao, Chengmin and Gong, Zijian and Li, Yehao and Mei, Tao and Pan, Yingwei and Peng, Yi and Qiu, Zhaofan and Yao, Ting and Yu, Kai and Zhang, Yiheng and others},
journal={arXiv preprint arXiv:2605.11061},
year={2026}
}

Model provider

harsh0024

Model tree

Base

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today