README

License: apache-2.0

Model details

Two LoRA checkpoints are provided:

| File | Use | Rank | Steps | Resolution |
| --- | --- | --- | --- | --- |
| flux_packaging_lora_r16_res1024_steps2000.safetensors | Primary: used for the SDXL-vs-FLUX comparison in the dissertation | 16 | 2000 | 1024 × 1024 |
| flux_packaging_lora_r16_res512_steps1000.safetensors | Supplementary: produced as a robustness check during infrastructure resolution | 16 | 1000 | 512 × 512 |

Shared training configuration:

| Property | Value |
| --- | --- |
| Base model | black-forest-labs/FLUX.1-schnell |
| Learning rate | 5e-5 |
| Trigger token | ipsnackpkg |
| Precision | bfloat16 |
| Training hardware | NVIDIA A100 (40 GB) on Google Colab Pro |
| Wall-clock training time (primary) | ≈ 3 h 40 min |

The FLUX learning rate (5e-5) is lower than the SDXL counterpart (1e-4) to account for FLUX's greater sensitivity to gradient magnitude.

Pinned dependency configuration

FLUX LoRA training in the diffusers ecosystem required pinning a specific dependency set due to incompatibilities on the diffusers main branch:

diffusers==0.32.0
transformers==4.45.2
peft==0.13.2
accelerate==1.1.1

Reproducing training requires this pinned set; see the dissertation methodology log for full context.
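Before reproducing training, it can help to confirm that the active environment matches the pinned set. The following is a minimal sketch using only the standard library; the helper name `check_pins` and the `PINNED` dictionary are illustrative and not part of the original training code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above.
PINNED = {
    "diffusers": "0.32.0",
    "transformers": "4.45.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
}


def check_pins(pinned=PINNED, get_version=version):
    """Return {package: (wanted, found)} for every package that deviates.

    `found` is None when the package is not installed. `get_version` is
    injectable so the check can be exercised without the real packages.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found}")
    if not problems:
        print("All pinned dependencies match.")
```

An empty result means the environment matches the pinned configuration; any entry lists the expected and installed versions side by side.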

Training data

The training set consists of 311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence). It is the same corpus used for the SDXL counterpart LoRA. Per-image provenance is preserved in the code repository as data/packaging_metadata.csv.

Intended use

Research use in studying base-model contribution to packaging-domain image generation. The dissertation's RQ1 asks whether fine-tuned FLUX produces superior packaging generation compared to fine-tuned SDXL under comparable LoRA configurations. This model is the FLUX side of that comparison.

How to use

python

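import torch
from diffusers import FluxPipeline

# Load the FLUX.1-schnell base model in bfloat16, matching the training precision.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Attach the primary (1024-resolution) LoRA checkpoint.
pipe.load_lora_weights(
    "Vclord/flux-packaging-lora-indian-snacks",
    weight_name="flux_packaging_lora_r16_res1024_steps2000.safetensors",
)

# "default_0" is the name diffusers assigns to the first adapter loaded
# without an explicit adapter_name; 0.5 is the recommended LoRA scale.
pipe.set_adapters(["default_0"], adapter_weights=[0.5])

# The prompt begins with the trigger token.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet"
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    width=1024,
    height=1024,
    max_sequence_length=256,
).images[0]
image.save("output.png")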

Recommended LoRA scale: 0.5

Why 0.5 and not 1.0?

Unlike SDXL LoRAs, which are conventionally used at scale 1.0, this FLUX LoRA works best at scale 0.5. A diagnostic comparison at scales 0.3, 0.5, and 1.0 showed that scale 1.0 over-asserts on FLUX outputs, producing hazy, ghosted packets, a known phenomenon in the FLUX LoRA community. Scale 0.5 preserves the trained LoRA contribution without inducing this failure mode.
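A scale comparison of this kind can be sketched as a small loop over adapter weights. This is an illustration only, not the dissertation's diagnostic script: the helper name `sweep_lora_scales` and the `make_generator` parameter are assumptions, and `pipe` is expected to be a pipeline with the LoRA already loaded, as in the usage example.

```python
def sweep_lora_scales(
    pipe,
    prompt,
    scales=(0.3, 0.5, 1.0),
    adapter_name="default_0",
    make_generator=None,
    **pipe_kwargs,
):
    """Generate one image per LoRA scale and return {scale: image}.

    `make_generator` (hypothetical helper argument) should return a freshly
    seeded generator on each call so every scale starts from the same noise.
    """
    images = {}
    for scale in scales:
        # Re-weight the already loaded adapter before each generation.
        pipe.set_adapters([adapter_name], adapter_weights=[scale])
        generator = make_generator() if make_generator is not None else None
        images[scale] = pipe(prompt, generator=generator, **pipe_kwargs).images[0]
    return images


# Example call, assuming `pipe` and `prompt` from the usage section and a CUDA device:
#
#   import torch
#   results = sweep_lora_scales(
#       pipe,
#       prompt,
#       make_generator=lambda: torch.Generator("cuda").manual_seed(0),
#       num_inference_steps=4,
#       guidance_scale=0.0,
#       width=1024,
#       height=1024,
#       max_sequence_length=256,
#   )
#   for scale, image in results.items():
#       image.save(f"scale_{scale}.png")
```

Fixing the seed across scales keeps the initial noise identical, so visible differences between the outputs can be attributed to the LoRA weight rather than to sampling variation.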

Evaluation

The FLUX vs SDXL comparison was conducted as a LoRA-only experiment (no IP-Adapter, no ControlNet) because mature FLUX equivalents of those components were not available at the time of writing. The comparison therefore answers a narrower sub-question of RQ1: whether FLUX is a better base model for the packaging-domain LoRA task in isolation.

Quantitative metrics across 24 comparison images (3 prompts × 2 seeds × 4 conditions):

| Configuration | CLIP-img | CLIP-txt | LPIPS |
| --- | --- | --- | --- |
| SDXL baseline (no LoRA) | | | |
| SDXL + LoRA + Plus + ControlNet (full pipeline) | 0.552 | 0.320 | 0.782 |
| FLUX baseline (no LoRA) | 0.475 | 0.255 | 0.795 |
| FLUX + LoRA at scale 0.5 | 0.528 | 0.306 | 0.665 |

Intra-rater reliability for the FLUX comparison spike (n = 24), Cohen's weighted kappa with linear weights:

| Axis | κ |
| --- | --- |
| Text legibility | 0.740 |
| Packaging plausibility | 0.559 |
| Visual quality | 0.554 |

(Regional appropriateness was not scored for this spike because the FLUX comparison prompts were not folk-art conditioned.)
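For readers unfamiliar with the statistic, linearly weighted Cohen's kappa penalises a disagreement in proportion to its distance on the ordinal scale: κ = 1 − (observed weighted disagreement) / (disagreement expected by chance from the two passes' marginal distributions). The sketch below is a plain-Python illustration of that formula, not the dissertation's analysis code; the 1 to 5 rating scale and the sample ratings are made up for the example.

```python
from collections import Counter


def linear_weighted_kappa(first, second, categories):
    """Cohen's kappa with linear weights for two rating passes.

    `categories` lists the ordinal scale in order; the disagreement weight
    for ratings i and j is |i - j| (the usual 1/(k-1) factor cancels).
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    index = {category: i for i, category in enumerate(categories)}
    k, n = len(categories), len(first)

    pairs = Counter((index[a], index[b]) for a, b in zip(first, second))
    row = Counter(index[a] for a in first)   # marginal counts, first pass
    col = Counter(index[b] for b in second)  # marginal counts, second pass

    observed = sum(abs(i - j) * c for (i, j), c in pairs.items()) / n
    expected = sum(
        abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - observed / expected


if __name__ == "__main__":
    # Hypothetical ratings on an assumed 1 to 5 scale, same rater, two passes.
    first_pass = [4, 3, 5, 2, 4, 3]
    second_pass = [4, 2, 5, 3, 4, 4]
    kappa = linear_weighted_kappa(first_pass, second_pass, [1, 2, 3, 4, 5])
    print(f"linear weighted kappa = {kappa:.3f}")
```

Identical passes give κ = 1, and values near 0 indicate agreement no better than chance.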

Headline finding: FLUX + LoRA at scale 0.5 achieves the lowest LPIPS distance to real packaging across all configurations tested, suggesting base-model choice contributes more to packaging-domain quality than the specific fine-tuning strategy. This finding is bounded by the LoRA-only comparison scope; the full-pipeline comparison is future work.

Limitations

  • LoRA-only configuration: no IP-Adapter or ControlNet conditioning is applied during inference with this model. Folk-art style transfer is not part of the FLUX pipeline at the time of writing.
  • Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised.
  • The 1024-resolution LoRA is the primary deliverable; the 512-resolution LoRA was produced during infrastructure resolution and behaves similarly at scale 0.5, but it is not the main artefact.
  • Evaluation relies on a single rater with an intra-rater reliability protocol; see the dissertation for full discussion.

Citation

If you use this LoRA in research, please cite:

bibtex

@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}

Companion repository and SDXL counterpart

Licence

apache-2.0

Model provider

Vclord

Model tree

Base: black-forest-labs/FLUX.1-schnell
Adapter: this model

Modalities

Input: Text
Output: Image
