Vclord

flux-packaging-lora-indian-snacks


License: apache-2.0

Model details

Two LoRA checkpoints are provided:

| File | Use | Rank | Steps | Resolution |
| --- | --- | --- | --- | --- |
| `flux_packaging_lora_r16_res1024_steps2000.safetensors` | Primary: used for the SDXL-vs-FLUX comparison in the dissertation | 16 | 2000 | 1024 × 1024 |
| `flux_packaging_lora_r16_res512_steps1000.safetensors` | Supplementary: produced as a robustness check while infrastructure issues were being resolved | 16 | 1000 | 512 × 512 |
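The two filenames appear to encode rank, resolution, and step count. The sketch below recovers those settings from a filename, assuming the naming pattern inferred from the two files listed above holds. `parse_checkpoint_name` is a hypothetical helper for illustration and is not part of the repository.

```python
import re

# Pattern inferred from the two checkpoint filenames listed above (an assumption).
_CHECKPOINT_RE = re.compile(
    r"^flux_packaging_lora_r(?P<rank>\d+)_res(?P<resolution>\d+)"
    r"_steps(?P<steps>\d+)\.safetensors$"
)


def parse_checkpoint_name(filename: str) -> dict:
    """Return the rank, resolution, and steps encoded in a checkpoint filename."""
    match = _CHECKPOINT_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {filename!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


primary = parse_checkpoint_name(
    "flux_packaging_lora_r16_res1024_steps2000.safetensors"
)
print(primary)
```

This can be handy when logging which checkpoint produced a given image, since the training settings travel with the filename.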

Shared training configuration:

| Property | Value |
| --- | --- |
| Base model | `black-forest-labs/FLUX.1-schnell` |
| Learning rate | 5e-5 |
| Trigger token | `ipsnackpkg` |
| Precision | bfloat16 |
| Training hardware | NVIDIA A100 (40 GB) on Google Colab Pro |
| Wall-clock training time (primary) | ≈ 3 h 40 min |

The FLUX learning rate (5e-5) is lower than the SDXL counterpart (1e-4) to account for FLUX's greater sensitivity to gradient magnitude.

Pinned dependency configuration

FLUX LoRA training in the diffusers ecosystem required pinning a specific dependency set due to incompatibilities on the diffusers main branch:

diffusers==0.32.0

transformers==4.45.2

peft==0.13.2

accelerate==1.1.1

Reproducing training requires this pinned set; see the dissertation methodology log for full context.
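Before starting a training run, it may help to confirm that the environment matches the pinned set. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are illustrative and do not come from the companion repository.

```python
from importlib import metadata

# Pins copied from the list above.
PINNED = {
    "diffusers": "0.32.0",
    "transformers": "4.45.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins=PINNED, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `find_mismatches()` means all four packages are installed at the pinned versions.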

Training data

The training set consists of 311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence). It is identical to the corpus used for the SDXL counterpart LoRA. Per-image provenance is preserved in the code repository at `data/packaging_metadata.csv`.

Intended use

Research use in studying base-model contribution to packaging-domain image generation. The dissertation's RQ1 asks whether fine-tuned FLUX produces superior packaging generation compared to fine-tuned SDXL under comparable LoRA configurations. This model is the FLUX side of that comparison.

How to use

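```python
import torch
from diffusers import FluxPipeline

# Load the FLUX.1-schnell base model in bfloat16 and move it to the GPU.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Attach the primary (1024-resolution) LoRA checkpoint from this repository.
pipe.load_lora_weights(
    "Vclord/flux-packaging-lora-indian-snacks",
    weight_name="flux_packaging_lora_r16_res1024_steps2000.safetensors",
)
# No adapter name was passed above, so diffusers registers it as "default_0".
# Apply the recommended LoRA scale of 0.5.
pipe.set_adapters(["default_0"], adapter_weights=[0.5])

# Start the prompt with the trigger token.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet"
# FLUX.1-schnell is timestep-distilled: few steps and no guidance.
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    width=1024,
    height=1024,
    max_sequence_length=256,
).images[0]
image.save("output.png")
```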

Recommended LoRA scale: 0.5

Why 0.5 and not 1.0?

Unlike SDXL LoRAs, which are conventionally used at scale 1.0, this FLUX LoRA works best at scale 0.5. A diagnostic comparison at scales 0.3, 0.5, and 1.0 showed that scale 1.0 over-asserts on FLUX outputs, producing hazy, ghosted packets, a failure mode also reported in the FLUX LoRA community. Scale 0.5 preserves the trained LoRA contribution without triggering that failure mode.
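A scale comparison of this kind can be sketched as a small loop over adapter weights. This is an illustrative helper, not the dissertation's diagnostic script. It assumes a pipeline object that already has the LoRA loaded under the adapter name `default_0`, as in the How to use snippet, and `sweep_lora_scales` and `make_call_kwargs` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterable


def sweep_lora_scales(
    pipe: Any,
    prompt: str,
    scales: Iterable[float] = (0.3, 0.5, 1.0),
    adapter_name: str = "default_0",
    make_call_kwargs: Callable[[], Dict[str, Any]] = dict,
) -> Dict[float, Any]:
    """Generate one image per LoRA scale with otherwise identical settings."""
    images = {}
    for scale in scales:
        # Re-weight the already loaded adapter instead of reloading it.
        pipe.set_adapters([adapter_name], adapter_weights=[scale])
        # Build fresh kwargs per run so a seeded generator restarts each time.
        images[scale] = pipe(prompt, **make_call_kwargs()).images[0]
    return images


# Example call, assuming `pipe`, `prompt`, and `torch` from the snippet above:
# images = sweep_lora_scales(
#     pipe,
#     prompt,
#     make_call_kwargs=lambda: dict(
#         num_inference_steps=4, guidance_scale=0.0, width=1024, height=1024,
#         max_sequence_length=256,
#         generator=torch.Generator("cuda").manual_seed(0),
#     ),
# )
```

Creating the generator inside `make_call_kwargs` keeps the seed identical across scales, so differences between the images can be attributed to the LoRA weight alone.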

Evaluation

The FLUX vs SDXL comparison was conducted as a LoRA-only experiment (no IP-Adapter, no ControlNet) because mature FLUX equivalents of those components were not available at the time of writing. The comparison therefore answers a narrower sub-question of RQ1: whether FLUX is a better base model for the packaging-domain LoRA task in isolation.

Quantitative metrics across 24 comparison images (3 prompts × 2 seeds × 4 conditions):

| Configuration | CLIP-img | CLIP-txt | LPIPS |
| --- | --- | --- | --- |
| SDXL baseline (no LoRA) | - | - | - |
| SDXL + LoRA + Plus + ControlNet (full pipeline) | 0.552 | 0.320 | 0.782 |
| FLUX baseline (no LoRA) | 0.475 | 0.255 | 0.795 |
| FLUX + LoRA at scale 0.5 | 0.528 | 0.306 | |
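The count of 24 images follows from the full cross of prompts, seeds, and conditions. The sketch below enumerates that grid. The prompt identifiers and seed values are placeholders, since the actual ones are not listed here, and mapping the four conditions onto the four table rows is an assumption.

```python
from itertools import product

# Placeholder prompt identifiers and seeds (the real values are not given here).
PROMPTS = ["prompt_1", "prompt_2", "prompt_3"]
SEEDS = [0, 1]
# Condition labels taken from the table rows above (assumed mapping).
CONDITIONS = [
    "SDXL baseline (no LoRA)",
    "SDXL + LoRA + Plus + ControlNet (full pipeline)",
    "FLUX baseline (no LoRA)",
    "FLUX + LoRA at scale 0.5",
]

# One entry per generated image: every prompt with every seed and condition.
grid = [
    {"prompt": prompt, "seed": seed, "condition": condition}
    for prompt, seed, condition in product(PROMPTS, SEEDS, CONDITIONS)
]
print(len(grid))  # 3 * 2 * 4 = 24
```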

Intra-rater reliability for the FLUX comparison spike (n = 24), Cohen's weighted kappa with linear weights:

| Axis | κ |
| --- | --- |
| Text legibility | 0.740 |
| Packaging plausibility | 0.559 |
| Visual quality | 0.554 |

(Regional appropriateness was not scored for this spike because the FLUX comparison prompts were not folk-art conditioned.)
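For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of Cohen's kappa with linear weights, following the standard definition: one minus the ratio of observed to chance-expected weighted disagreement. It is not the dissertation's implementation. The 3-point scale and the ratings in the example are made up for illustration and are not the study data.

```python
from collections import Counter


def linear_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with linear weights for two rating passes over the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two ordered categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(ratings_a)

    # Joint and marginal counts over category indices.
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marginal_a = Counter(index[a] for a in ratings_a)
    marginal_b = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
        return abs(i - j) / (k - 1)

    observed_disagreement = sum(
        weight(i, j) * count for (i, j), count in observed.items()
    ) / n
    expected_disagreement = sum(
        weight(i, j) * marginal_a[i] * marginal_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed_disagreement / expected_disagreement


# Illustrative ratings from two scoring sessions on a hypothetical 3-point scale.
session_1 = [1, 2, 3]
session_2 = [1, 2, 2]
print(round(linear_weighted_kappa(session_1, session_2, categories=(1, 2, 3)), 3))  # 0.571
```

For intra-rater reliability, the two lists hold the same rater's scores for the same images from two separate sessions.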

Headline finding: FLUX + LoRA at scale 0.5 achieves the lowest LPIPS distance to real packaging across all configurations tested, suggesting base-model choice contributes more to packaging-domain quality than the specific fine-tuning strategy. This finding is bounded by the LoRA-only comparison scope; the full-pipeline comparison is future work.

Limitations

LoRA-only configuration: no IP-Adapter or ControlNet conditioning is applied during inference with this model. Folk-art style transfer is not part of the FLUX pipeline at the time of writing.
Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised.
The 1024-resolution LoRA is the primary deliverable; the 512-resolution LoRA was produced while infrastructure issues were being resolved and behaves similarly at scale 0.5, but it is not the main artefact.
Single-rater evaluation with an intra-rater reliability protocol; see the dissertation for full discussion.

Citation

If you use this LoRA in research, please cite:

```bibtex
@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}
```

Companion repository and SDXL counterpart

Full code: https://github.com/Vclord/folk-art-packaging-generation
SDXL counterpart LoRA: https://huggingface.co/Vclord/sdxl-packaging-lora-indian-snacks

Licence

apache-2.0
