BennyDaBall

Z-Image-Engineer-V6

README

License: apache-2.0

Model Metadata

Table with columns: Key, Value
Key	Value
License	Apache-2.0
Language	English (`en`)
Base Model	`Tongyi-MAI/Z-Image-Turbo`
Library	`transformers`
Pipeline Tag	`text-generation`
Format	HF Safetensors

The Z-Engineer returns, fully rebuilt around the SMART DoRA training system for Z-Image Turbo.

Yes, we jump from V4 to V6. Unlike the usual guy math, this one actually brought the extra two inches.

Z-Image-Engineer V6 is a fine-tuned 4B Qwen text encoder (Tongyi-MAI/Z-Image-Turbo) optimized for dual-role performance: a local prompt-enhancement model and a merged HF text encoder for Z-Image workflows. The ComfyUI-Z-Engineer node runs both roles fully inside ComfyUI from this release.

Z-Image-Engineer V6 simple A/B with rewrites

What is Z-Image-Engineer V6?

V6 transforms minimal seed prompts into rich, highly structured visual narratives. It adds explicit scene composition, lighting direction, material texture, and depth separation while stripping out empty prompt sludge like "8k, masterpiece, trending on ArtStation."

It can also be used directly as a Z-Image text encoder. This repo contains the merged HF safetensors. The GGUF quantized release lives in the companion repo: Z-Image-Engineer-V6-GGUF.

Key Use Cases

Prompt Enhancement: Upgrade simple concepts into descriptive, high-fidelity visual prompts locally.
Text Encoder Swap: Replace the stock Z-Image Qwen text encoder to generate different conditioning from the same seed.
Hybrid Mode: Use V6 to rewrite your prompt, then use V6 again to encode it. It writes the scene and drives the image model.
Private Local Workflow: Built for LM Studio, ComfyUI, and llama.cpp. No API logs, no external telemetry.

Under the Hood: SMART DoRA

V4 pioneered SMART training. V6 adapts that system into a Weight-Decomposed Low-Rank Adaptation (DoRA) framework.

DoRA provides surgical adapter updates by decoupling directional and magnitude adjustments. SMART adds auxiliary pressure so the model does not collapse into repetitive prompt loops or superficial sentence patterns.

Table with columns: Regularizer, What it Does, Why it Matters
Regularizer	What it Does	Why it Matters
Entropic	Broadens output probability diversity.	Reduces repetitive loops and generic vocabulary.
Holographic	Enforces structured, depth-wise feature logic.	Improves foreground/background hierarchy.
Topological	Stabilizes coherent latent trajectories.	Keeps prompts flowing naturally instead of stalling out.
Manifold	Regulates overall weight distributions.	Keeps model behavior stable under high-pressure refinement.

V6 was not a simple one-and-done training run. The final architecture is a blended composite:

Base Pass: Master-corpus SMART DoRA training on the native Z-Image Turbo text encoder.
Retention Pass: Preservation pressure for numbers, color accuracy, text signage, named objects, actions, and spatial tracking.
SceneClean SFT32: Supervised refinement to restore the cinematic V4/base-V6 voice.
AntiRepeat Binary24: Binary anti-repeat refinement to reduce loops, abrupt fragments, and bad endings.
Final Blend: A 25% style-restoration / 75% anti-repeat DoRA adapter blend, balancing vivid descriptions with tighter syntax.

Quick Start

LM Studio: Prompt Enhancement

Use this merged HF release directly where supported, or download a GGUF quant from Z-Image-Engineer-V6-GGUF for LM Studio. No complex system prompt is required.

text
Enhance this image prompt for Z-Image Turbo: a unicorn

The comparison examples were generated from direct LM Studio user requests like this, with no separate system prompt. V6_SYSTEM_PROMPT.md is included only as an optional preset for people who want a stricter prompt-only chat setup.

ComfyUI: Text Encoder + Local Prompt Enhancer

Use the ComfyUI-Z-Engineer custom node (v2.0+). It loads this repo's sharded safetensors release directly and runs V6 as both the Z-Image text encoder and an in-ComfyUI prompt enhancer - no LM Studio or external server required.

Download this repo into ComfyUI/models/text_encoders/Z-Image-Engineer-V6/ (the three model-0000X-of-00003.safetensors shards plus model.safetensors.index.json).
Add Z-Engineer CLIP Loader (Safetensors / Shards) and pick Z-Image-Engineer-V6/ from the dropdown.
Wire clip into your Z-Image CLIP Text Encode - V6 replaces the stock Qwen text encoder.
Optional: add Z-Engineer Prompt Enhancer (Local) with the same clip to rewrite seed prompts in-process; the enhanced prompt is previewed right on the node.

A ready-made workflow ships with the node repo: example_workflows/z_image_turbo_z_engineer.json.

Prefer smaller files? Use a quant from Z-Image-Engineer-V6-GGUF with the node's Z-Engineer CLIP Loader (GGUF) instead.

Verified Image Settings

text
UNET: z_image_turbo_bf16.safetensors
VAE: ae.safetensors
Text Encoder: Z-Image-Engineer-V6 (this repo's sharded safetensors, or a GGUF quant)
Resolution: 1024x1024
Steps: 8
CFG: 1.0
Sampler: res_multistep
Scheduler: simple
Shift: 3.0

Training Specifics

Table with columns: Parameter, Specification
Parameter	Specification
Base Text Encoder	`Tongyi-MAI/Z-Image-Turbo/text_encoder`
Tokenizer	`Tongyi-MAI/Z-Image-Turbo/tokenizer`
Method	SMART DoRA / PEFT Adapter Training
Rank / Alpha / Dropout	64 / 64 / 0.03
Target Modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, , ,

GGUF Quantization Ladder

The quantized release is separate on purpose:

BennyDaBall/Z-Image-Engineer-V6-GGUF

That repo contains the full GGUF ladder: F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, and MXFP4.

Verification & Proof

The bundled comparison image is:

text
evidence/gallery_z_image_engineer_v6_simple_ab_with_rewrites_CONTACT.png

It compares foundational prompts across four isolated control paths:

Stock Encoder + Raw Prompt
V6 Encoder + Raw Prompt
Stock Encoder + V6 LM Studio Rewrite
V6 Encoder + V6 LM Studio Rewrite

Disclaimer & Acknowledgements

This model is a prompt engineer and text encoder. Diffusion is still diffusion; structural expansion improves compositional adherence, but it does not mathematically guarantee a perfect seed every single time. Use creative judgment locally.

Tongyi-MAI for the Z-Image Turbo ecosystem.
Qwen for the adaptable text encoder backbone.
The open-source maintainers behind LM Studio, ComfyUI, llama.cpp, PEFT, and Transformers.
My local power utility provider, for sustaining the research grid.

Built & trained locally with care by BennyDaBall.

Follow me on X @BennyDaBall_OG !

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

BennyDaBall

Model Tree

Base

Tongyi-MAI/Z-Image-Turbo

Fine-tuned

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated Endpoints

Container

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

Model Metadata

Table with columns: Key, Value
Key	Value
License	Apache-2.0
Language	English (`en`)
Base Model	`Tongyi-MAI/Z-Image-Turbo`
Library	`transformers`
Pipeline Tag	`text-generation`
Format	HF Safetensors

The Z-Engineer returns, fully rebuilt around the SMART DoRA training system for Z-Image Turbo.

Yes, we jump from V4 to V6. Unlike the usual guy math, this one actually brought the extra two inches.

Z-Image-Engineer V6 simple A/B with rewrites

What is Z-Image-Engineer V6?

It can also be used directly as a Z-Image text encoder. This repo contains the merged HF safetensors. The GGUF quantized release lives in the companion repo: Z-Image-Engineer-V6-GGUF.

Key Use Cases

Prompt Enhancement: Upgrade simple concepts into descriptive, high-fidelity visual prompts locally.
Text Encoder Swap: Replace the stock Z-Image Qwen text encoder to generate different conditioning from the same seed.
Hybrid Mode: Use V6 to rewrite your prompt, then use V6 again to encode it. It writes the scene and drives the image model.
Private Local Workflow: Built for LM Studio, ComfyUI, and llama.cpp. No API logs, no external telemetry.

Under the Hood: SMART DoRA

V4 pioneered SMART training. V6 adapts that system into a Weight-Decomposed Low-Rank Adaptation (DoRA) framework.

Table with columns: Regularizer, What it Does, Why it Matters
Regularizer	What it Does	Why it Matters
Entropic	Broadens output probability diversity.	Reduces repetitive loops and generic vocabulary.
Holographic	Enforces structured, depth-wise feature logic.	Improves foreground/background hierarchy.
Topological	Stabilizes coherent latent trajectories.	Keeps prompts flowing naturally instead of stalling out.
Manifold	Regulates overall weight distributions.	Keeps model behavior stable under high-pressure refinement.

V6 was not a simple one-and-done training run. The final architecture is a blended composite:

Base Pass: Master-corpus SMART DoRA training on the native Z-Image Turbo text encoder.
Retention Pass: Preservation pressure for numbers, color accuracy, text signage, named objects, actions, and spatial tracking.
SceneClean SFT32: Supervised refinement to restore the cinematic V4/base-V6 voice.
AntiRepeat Binary24: Binary anti-repeat refinement to reduce loops, abrupt fragments, and bad endings.
Final Blend: A 25% style-restoration / 75% anti-repeat DoRA adapter blend, balancing vivid descriptions with tighter syntax.

Quick Start

LM Studio: Prompt Enhancement

Use this merged HF release directly where supported, or download a GGUF quant from Z-Image-Engineer-V6-GGUF for LM Studio. No complex system prompt is required.

text
Enhance this image prompt for Z-Image Turbo: a unicorn

ComfyUI: Text Encoder + Local Prompt Enhancer

Download this repo into ComfyUI/models/text_encoders/Z-Image-Engineer-V6/ (the three model-0000X-of-00003.safetensors shards plus model.safetensors.index.json).
Add Z-Engineer CLIP Loader (Safetensors / Shards) and pick Z-Image-Engineer-V6/ from the dropdown.
Wire clip into your Z-Image CLIP Text Encode - V6 replaces the stock Qwen text encoder.
Optional: add Z-Engineer Prompt Enhancer (Local) with the same clip to rewrite seed prompts in-process; the enhanced prompt is previewed right on the node.

A ready-made workflow ships with the node repo: example_workflows/z_image_turbo_z_engineer.json.

Prefer smaller files? Use a quant from Z-Image-Engineer-V6-GGUF with the node's Z-Engineer CLIP Loader (GGUF) instead.

Verified Image Settings

text
UNET: z_image_turbo_bf16.safetensors
VAE: ae.safetensors
Text Encoder: Z-Image-Engineer-V6 (this repo's sharded safetensors, or a GGUF quant)
Resolution: 1024x1024
Steps: 8
CFG: 1.0
Sampler: res_multistep
Scheduler: simple
Shift: 3.0

Training Specifics

Table with columns: Parameter, Specification
Parameter	Specification
Base Text Encoder	`Tongyi-MAI/Z-Image-Turbo/text_encoder`
Tokenizer	`Tongyi-MAI/Z-Image-Turbo/tokenizer`
Method	SMART DoRA / PEFT Adapter Training
Rank / Alpha / Dropout	64 / 64 / 0.03
Target Modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, , ,

GGUF Quantization Ladder

The quantized release is separate on purpose:

BennyDaBall/Z-Image-Engineer-V6-GGUF

That repo contains the full GGUF ladder: F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, and MXFP4.

Verification & Proof

The bundled comparison image is:

text
evidence/gallery_z_image_engineer_v6_simple_ab_with_rewrites_CONTACT.png

It compares foundational prompts across four isolated control paths:

Stock Encoder + Raw Prompt
V6 Encoder + Raw Prompt
Stock Encoder + V6 LM Studio Rewrite
V6 Encoder + V6 LM Studio Rewrite

Disclaimer & Acknowledgements

Tongyi-MAI for the Z-Image Turbo ecosystem.
Qwen for the adaptable text encoder backbone.
The open-source maintainers behind LM Studio, ComfyUI, llama.cpp, PEFT, and Transformers.
My local power utility provider, for sustaining the research grid.

Built & trained locally with care by BennyDaBall.

Follow me on X @BennyDaBall_OG !

Z-Image-Engineer-V6

README

Model Metadata

What is Z-Image-Engineer V6?

Key Use Cases

Under the Hood: SMART DoRA

The Refinement Pipeline

Quick Start

LM Studio: Prompt Enhancement

ComfyUI: Text Encoder + Local Prompt Enhancer

Verified Image Settings

Training Specifics

GGUF Quantization Ladder

Verification & Proof

Disclaimer & Acknowledgements

Explore FriendliAI today

README

Model Metadata

What is Z-Image-Engineer V6?

Key Use Cases

Under the Hood: SMART DoRA

The Refinement Pipeline

Quick Start

LM Studio: Prompt Enhancement

ComfyUI: Text Encoder + Local Prompt Enhancer

Verified Image Settings

Training Specifics

GGUF Quantization Ladder

Verification & Proof

Disclaimer & Acknowledgements