License: apache-2.0
Model details
Two LoRA checkpoints are provided:
| File | Use | Rank | Steps | Resolution |
|---|---|---|---|---|
| flux_packaging_lora_r16_res1024_steps2000.safetensors | Primary: used for the SDXL-vs-FLUX comparison in the dissertation | 16 | 2000 | 1024 × 1024 |
| flux_packaging_lora_r16_res512_steps1000.safetensors | Supplementary: produced as a robustness check during infrastructure resolution | 16 | 1000 | 512 × 512 |
Shared training configuration:
| Property | Value |
|---|---|
| Base model | black-forest-labs/FLUX.1-schnell |
| Learning rate | 5e-5 |
| Trigger token | ipsnackpkg |
| Precision | bfloat16 |
| Training hardware | NVIDIA A100 (40 GB) on Google Colab Pro |
| Wall-clock training time (primary) | ≈ 3 h 40 min |
The FLUX learning rate (5e-5) is lower than the SDXL counterpart (1e-4) to account for FLUX's greater sensitivity to gradient magnitude.
Pinned dependency configuration
FLUX LoRA training in the diffusers ecosystem required pinning a specific dependency set due to incompatibilities on the diffusers main branch:
```
diffusers==0.32.0
transformers==4.45.2
peft==0.13.2
accelerate==1.1.1
```
Reproducing training requires this pinned set; see the dissertation methodology log for full context.
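Before launching a training run, it can help to confirm that the active environment matches the pinned set. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and its `get_version` parameter are illustrative assumptions, not part of the dissertation code.

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = {
    "diffusers": "0.32.0",
    "transformers": "4.45.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package off its pin.

    `found` is None when the package is not installed. `get_version` is
    injectable (hypothetical convenience) so the check can be unit-tested.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` in the training environment returns an empty dict when all four packages match their pins.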
Training data
311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence). The training corpus is identical to that of the SDXL counterpart LoRA. Per-image provenance is preserved in the code repository as data/packaging_metadata.csv.
Intended use
Research use in studying base-model contribution to packaging-domain image generation. The dissertation's RQ1 asks whether fine-tuned FLUX produces superior packaging generation compared to fine-tuned SDXL under comparable LoRA configurations. This model is the FLUX side of that comparison.
How to use
```python
import torch
from diffusers import FluxPipeline

# Load the FLUX.1-schnell base model in bfloat16.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Attach the primary (1024 px, 2000-step) LoRA checkpoint.
pipe.load_lora_weights(
    "Vclord/flux-packaging-lora-indian-snacks",
    weight_name="flux_packaging_lora_r16_res1024_steps2000.safetensors",
)
# Apply the LoRA at the recommended scale of 0.5.
pipe.set_adapters(["default_0"], adapter_weights=[0.5])

# The prompt starts with the trigger token.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet"
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    width=1024,
    height=1024,
    max_sequence_length=256,
).images[0]
image.save("output.png")
```
Recommended LoRA scale: 0.5
Why 0.5 and not 1.0?
Unlike SDXL LoRAs, which are conventionally used at scale 1.0, this FLUX LoRA operates best at scale 0.5. A diagnostic comparison at scales 0.3, 0.5, and 1.0 showed that scale 1.0 over-asserts on FLUX outputs, producing hazy, ghosted packets, a behaviour also reported in the FLUX LoRA community. Scale 0.5 preserves the trained LoRA contribution without inducing this failure mode.
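A scale comparison of this kind could be reproduced roughly as follows. This is a sketch, not the dissertation's diagnostic script: the function name `sweep_lora_scales`, the fixed seed, and the `make_generator` hook are assumptions introduced here, while the adapter name and inference settings are taken from the usage snippet in this card.

```python
def sweep_lora_scales(
    pipe,
    prompt,
    scales=(0.3, 0.5, 1.0),
    seed=0,                    # assumed seed; the card does not state one
    adapter_name="default_0",  # name used in the usage snippet
    make_generator=None,
):
    """Generate one image per LoRA scale with an identical seed.

    `pipe` is expected to be a FluxPipeline with the LoRA already loaded.
    Returns a dict mapping scale -> generated image.
    """
    if make_generator is None:
        import torch  # imported lazily so the helper can be tested without a GPU

        def make_generator(s):
            return torch.Generator("cpu").manual_seed(s)

    images = {}
    for scale in scales:
        # Re-weight the adapter, then re-seed so only the scale differs.
        pipe.set_adapters([adapter_name], adapter_weights=[scale])
        result = pipe(
            prompt,
            num_inference_steps=4,
            guidance_scale=0.0,
            width=1024,
            height=1024,
            max_sequence_length=256,
            generator=make_generator(seed),
        )
        images[scale] = result.images[0]
    return images
```

Saving the returned images side by side makes any haze or ghosting at scale 1.0 easy to compare against the 0.3 and 0.5 outputs.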
Evaluation
The FLUX vs SDXL comparison was conducted as a LoRA-only experiment (no IP-Adapter, no ControlNet) because mature FLUX equivalents of those components were not available at the time of writing. The comparison therefore answers a narrower sub-question of RQ1: whether FLUX is a better base model for the packaging-domain LoRA task in isolation.
Quantitative metrics across 24 comparison images (3 prompts × 2 seeds × 4 conditions):
| Configuration | CLIP-img | CLIP-txt | LPIPS |
|---|---|---|---|
| SDXL baseline (no LoRA) | — | — | — |
| SDXL + LoRA + Plus + ControlNet (full pipeline) | 0.552 | 0.320 | 0.782 |
| FLUX baseline (no LoRA) | 0.475 | 0.255 | 0.795 |
| FLUX + LoRA at scale 0.5 | 0.528 | 0.306 | 0.665 |
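The 3 × 2 × 4 design behind the 24 comparison images can be enumerated as a simple grid. In this sketch the prompt labels and seed values are hypothetical placeholders (the card does not list them), and the four conditions are assumed to be the four configurations in the table above.

```python
from itertools import product

PROMPTS = ["prompt_1", "prompt_2", "prompt_3"]  # hypothetical placeholders
SEEDS = [0, 1]                                  # hypothetical seed values
CONDITIONS = [                                  # assumed: the table's four rows
    "SDXL baseline (no LoRA)",
    "SDXL + LoRA + Plus + ControlNet (full pipeline)",
    "FLUX baseline (no LoRA)",
    "FLUX + LoRA at scale 0.5",
]

# One entry per generated comparison image.
grid = [
    {"prompt": p, "seed": s, "condition": c}
    for p, s, c in product(PROMPTS, SEEDS, CONDITIONS)
]

print(len(grid))  # 3 prompts x 2 seeds x 4 conditions
```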
Intra-rater reliability for the FLUX comparison spike (n = 24), Cohen's weighted kappa with linear weights:
| Axis | κ |
|---|---|
| Text legibility | 0.740 |
| Packaging plausibility | 0.559 |
| Visual quality | 0.554 |
(Regional appropriateness was not scored for this spike because the FLUX comparison prompts were not folk-art conditioned.)
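For readers unfamiliar with the statistic, linearly weighted Cohen's kappa can be computed as sketched below. This is a generic pure-Python implementation, not the dissertation's evaluation code; the function name is an assumption, and the raw ratings and rating scale are not published in this card, so any inputs are the reader's own.

```python
def linear_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    `categories` is the ordered list of possible ratings. For intra-rater
    reliability, `ratings_a` and `ratings_b` are two passes by one rater.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (first pass, second pass) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]  # chance agreement

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - weighted_observed / weighted_expected
```

Under linear weights, a disagreement of two scale points counts twice as much as a disagreement of one point, which suits ordinal rating axes such as those in the table above.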
Headline finding: FLUX + LoRA at scale 0.5 achieves the lowest LPIPS distance to real packaging across all configurations tested, suggesting base-model choice contributes more to packaging-domain quality than the specific fine-tuning strategy. This finding is bounded by the LoRA-only comparison scope; the full-pipeline comparison is future work.
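The headline comparison can be checked directly against the reported numbers. The sketch below copies the values from the metrics table (the SDXL baseline row is omitted because it reports no values); the dictionary keys and variable names are illustrative.

```python
# Values copied from the metrics table in this card.
METRICS = {
    "SDXL + LoRA + Plus + ControlNet (full pipeline)": {
        "clip_img": 0.552, "clip_txt": 0.320, "lpips": 0.782,
    },
    "FLUX baseline (no LoRA)": {
        "clip_img": 0.475, "clip_txt": 0.255, "lpips": 0.795,
    },
    "FLUX + LoRA at scale 0.5": {
        "clip_img": 0.528, "clip_txt": 0.306, "lpips": 0.665,
    },
}

# Configuration with the lowest LPIPS (lower means closer to real packaging).
best_lpips = min(METRICS, key=lambda name: METRICS[name]["lpips"])

# Change from the FLUX baseline to FLUX + LoRA on each metric.
base = METRICS["FLUX baseline (no LoRA)"]
lora = METRICS["FLUX + LoRA at scale 0.5"]
delta = {metric: round(lora[metric] - base[metric], 3) for metric in base}

print(best_lpips)
print(delta)
```

Note that the table's highest CLIP-img and CLIP-txt scores belong to the SDXL full pipeline, so the headline claim rests on LPIPS specifically.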
Limitations
- LoRA-only configuration; no IP-Adapter or ControlNet conditioning is applied during inference with this model. Folk-art style transfer is not part of the FLUX pipeline at the time of writing.
- Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised.
- The 1024-resolution LoRA is the primary deliverable; the 512-resolution LoRA was produced during infrastructure resolution and behaves similarly at scale 0.5 but is not the main artefact.
- Single-rater evaluation methodology with intra-rater reliability protocol; see the dissertation for full discussion.
Citation
If you use this LoRA in research, please cite:
```bibtex
@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}
```
Companion repository and SDXL counterpart
- Full code: https://github.com/Vclord/folk-art-packaging-generation
- SDXL counterpart LoRA: https://huggingface.co/Vclord/sdxl-packaging-lora-indian-snacks
Licence
apache-2.0