Aarya2004/minicpmv-trade-lora
License: apache-2.0

Why this exists
We found no public dataset of trade/contractor invoices with structured line-item annotations (we searched the HF Hub, academic benchmarks, and Kaggle exhaustively; the closest public option, naver-clova-ix/cord-v2, consists of Indonesian restaurant receipts). So we built a grounded-synthetic corpus: a curated catalog of ~381 real trade parts and services with real price bands (sourced from public retail listings, contractor flat-rate templates, and cost guides), assembled into coherent jobs whose arithmetic is computed by the generator code, and rendered through 8 distinct invoice templates. The generator and catalog live in the Quillwright repo (finetune/synth/).
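To make "grounded-synthetic" concrete, here is a minimal sketch of the idea under stated assumptions: each catalog entry carries a price band, a job samples a few entries, and every number on the invoice is computed in code rather than sampled independently. `CatalogItem`, `make_job`, the sample entries, and the seeds are all hypothetical; this is not the Quillwright generator, and it skips template rendering entirely.

```python
import random
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass(frozen=True)
class CatalogItem:
    """One catalog entry: a part or service with a real-world price band."""
    name: str
    price_low: Decimal
    price_high: Decimal


# Hypothetical entries for illustration only; the real catalog is not reproduced here.
CATALOG = [
    CatalogItem("1/2 in PEX pipe, 10 ft", Decimal("8.00"), Decimal("14.00")),
    CatalogItem("Quarter-turn shutoff valve", Decimal("9.00"), Decimal("22.00")),
    CatalogItem("Service call fee", Decimal("59.00"), Decimal("129.00")),
    CatalogItem("Plumbing labor (per hour)", Decimal("75.00"), Decimal("150.00")),
]


def make_job(rng: random.Random, n_items: int = 3) -> dict:
    """Sample one job and derive every number from code (hypothetical helper)."""
    lines = []
    for item in rng.sample(CATALOG, k=n_items):
        cnt = rng.randint(1, 4)
        # Draw a unit price inside the item's band and round it to cents.
        frac = Decimal(str(rng.random()))
        span = item.price_high - item.price_low
        unit = (item.price_low + span * frac).quantize(CENT, rounding=ROUND_HALF_UP)
        # Assumption: "price" is the line total (cnt x unit price).
        lines.append({"nm": item.name, "cnt": cnt, "price": unit * cnt})
    # The grand total is summed from the line totals, never sampled on its own.
    total = sum((line["price"] for line in lines), Decimal("0.00"))
    return {"menu": lines, "total": total}


# Separate RNGs (hypothetical seeds) for train and test mimic a seed-disjoint split.
train_rng, test_rng = random.Random(0), random.Random(1)
train_jobs = [make_job(train_rng) for _ in range(5)]
test_jobs = [make_job(test_rng) for _ in range(2)]
```

Because the JSON target and anything a template would render come from the same computed values, the label and the rendered invoice cannot disagree on arithmetic.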
Results — baseline vs. tuned (held-out synthetic test split, n=50)
Held-out test invoices are generated with a different random seed than the training set, so the model never saw these exact jobs. Decoding is deterministic (greedy), with 0 generation failures.
| Metric | Baseline (un-tuned) | Tuned (this LoRA) | Δ |
|---|---|---|---|
| Item F1 | 0.703 | 0.933 | +0.230 |
| Quantity accuracy | 0.840 | 1.000 | +0.160 |
| Price accuracy | 0.643 | 1.000 | +0.357 |
| Precision | 0.700 | 0.933 | +0.233 |
| Recall | 0.707 | 0.933 | +0.226 |
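The card does not spell out how predicted items are matched to gold items or how scores are averaged. One plausible reading, sketched here purely as an assumption: pair items by normalized name, compute micro-averaged precision/recall/F1 over all invoices, and score quantity and price only on matched pairs. `match_items`, `score`, and the matching rule are hypothetical and may differ from the evaluation actually used.

```python
from typing import Dict, List, Tuple

Item = Dict[str, object]  # {"nm": ..., "cnt": ..., "price": ...}


def _norm(name: object) -> str:
    # Assumption: names match case- and whitespace-insensitively.
    return " ".join(str(name).lower().split())


def match_items(pred: List[Item], gold: List[Item]) -> List[Tuple[Item, Item]]:
    """Greedily pair each predicted item with the first unused gold item of the same name."""
    unused = list(gold)
    pairs = []
    for p in pred:
        for i, g in enumerate(unused):
            if _norm(p["nm"]) == _norm(g["nm"]):
                pairs.append((p, unused.pop(i)))
                break  # stop scanning right after mutating `unused`
    return pairs


def score(docs: List[Tuple[List[Item], List[Item]]]) -> Dict[str, float]:
    """Micro-averaged item P/R/F1, plus qty/price accuracy over matched pairs only."""
    n_pred = n_gold = 0
    pairs: List[Tuple[Item, Item]] = []
    for pred, gold in docs:
        n_pred += len(pred)
        n_gold += len(gold)
        pairs.extend(match_items(pred, gold))
    tp = len(pairs)
    if tp == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0, "qty_acc": 0.0, "price_acc": 0.0}
    precision = tp / n_pred
    recall = tp / n_gold
    f1 = 2 * precision * recall / (precision + recall)
    qty_acc = sum(float(p["cnt"]) == float(g["cnt"]) for p, g in pairs) / tp
    # Prices count as equal if they agree to the cent.
    price_acc = sum(abs(float(p["price"]) - float(g["price"])) < 0.005 for p, g in pairs) / tp
    return {"precision": precision, "recall": recall, "f1": f1,
            "qty_acc": qty_acc, "price_acc": price_acc}
```

Under this reading, quantity and price accuracy can reach 1.000 while item F1 stays at 0.933, because unmatched items never enter those two metrics. The table is also consistent with item F1 being the harmonic mean of precision and recall: for the baseline, 2 · 0.700 · 0.707 / (0.700 + 0.707) ≈ 0.703.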
⚠️ Honest scope: this is an IN-DISTRIBUTION result
The test split shares the same templates, catalog, and price formatting as training; only the job combinations differ. It therefore measures how well the model learns the generator's trade-document structure, not transfer to real, photographed trade invoices. The perfect quantity and price accuracy (1.000) and the very low final training loss are consistent with a strong in-distribution fit. Read the +0.23 item-F1 gain as "learns trade line-item extraction on clean, consistently formatted invoices," not as "0.93 on a phone photo of a real contractor's bill." Validating against a held-out set of real invoices is documented future work; the generator supports an Augraphy "scanned/photographed" degradation mode, which was not used for this adapter's training data (clean renders only).
A companion adapter trained on the public CORD benchmark (real receipt photos) shows a more conservative +0.09 item-F1 gain; that is the honest data point for real-world image noise.
Training
- Base: openbmb/MiniCPM-V-2_6 (8B vision-language model)
- Data: 1,000 grounded-synthetic trade invoices (clean WeasyPrint renders); held-out 50-invoice test split (different seed). Catalog + generator: Quillwright `finetune/synth/`.
- Recipe: OpenBMB's official `finetune.py` + `CPMTrainer`, single GPU (L40S), no DeepSpeed, bf16 LoRA. LoRA is applied only to the LLM self-attention projections (q/k/v/o); the vision tower and resampler are frozen. r=64, α=64, dropout=0.05, lr=1e-5, model_max_length=2048, 3 epochs.
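The recipe's LoRA scope can be expressed as a filter on module names. A sketch, assuming the base model names its language model `llm.` and its SigLIP vision tower `vpm.` (confirm with `model.named_modules()` before relying on it); the regex is illustrative and is not copied from OpenBMB's scripts.

```python
import re

# Hyperparameters from the recipe above, keyed as in PEFT's LoraConfig.
LORA_HPARAMS = {"r": 64, "lora_alpha": 64, "lora_dropout": 0.05}

# Assumed module layout: language model under "llm.", vision tower under "vpm.",
# resampler under "resampler.". Anchoring on "llm." keeps LoRA off the vision
# tower, which has its own self_attn.q_proj/k_proj/v_proj modules.
LORA_TARGET_REGEX = r"llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)"


def is_lora_target(module_name: str) -> bool:
    """True if a module name would receive a LoRA adapter (full-string match)."""
    return re.fullmatch(LORA_TARGET_REGEX, module_name) is not None


# LoRA scales its low-rank update by lora_alpha / r; with alpha == r this is 1.0.
LORA_SCALE = LORA_HPARAMS["lora_alpha"] / LORA_HPARAMS["r"]

# With PEFT installed, the same settings would look like this (a string
# target_modules is matched against full module names with re.fullmatch):
#   from peft import LoraConfig
#   config = LoraConfig(**LORA_HPARAMS, target_modules=LORA_TARGET_REGEX)
```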
Inference
```python
import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Use the prompt as given; changing it may hurt extraction quality.
PROMPT = (
    'Extract the line items from this receipt as JSON with this exact shape: '
    '{"menu": [{"nm": <item name>, "cnt": <quantity>, "price": <price>}], '
    '"total": <grand total>}. Output only the JSON.'
)

# Load the base model in bf16 (the adapter was trained as a bf16 LoRA) with SDPA attention.
base = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
)
# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, "Aarya2004/minicpmv-trade-lora").eval().cuda()
tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)

img = Image.open("invoice.jpg").convert("RGB")
msgs = [{"role": "user", "content": [img, PROMPT]}]
# sampling=False gives deterministic decoding, matching the greedy setup used for the results.
print(model.chat(image=None, msgs=msgs, tokenizer=tok, sampling=False))
```
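`model.chat` returns a string. A hedged sketch for turning it into the prompt's JSON shape and rejecting malformed replies; `parse_items` is a hypothetical helper, and tolerating stray text or a code fence around the JSON is an assumption about what the model might emit.

```python
import json
import re
from decimal import Decimal

REQUIRED_ITEM_KEYS = {"nm", "cnt", "price"}


def parse_items(raw: str) -> dict:
    """Parse a model reply into {"menu": [...], "total": ...}; raise ValueError if malformed."""
    # Take the outermost {...} span so text or a fence around the JSON is ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    # parse_float=Decimal keeps prices exact; json.JSONDecodeError subclasses ValueError.
    data = json.loads(match.group(0), parse_float=Decimal)
    if not isinstance(data, dict) or not isinstance(data.get("menu"), list) or "total" not in data:
        raise ValueError("expected an object with a 'menu' list and a 'total'")
    for item in data["menu"]:
        if not isinstance(item, dict) or not REQUIRED_ITEM_KEYS.issubset(item):
            raise ValueError(f"malformed line item: {item!r}")
    return data
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing extracted prices or summing them against the total.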
MiniCPM-V-2_6's remote code hard-imports `flash_attn` even when SDPA is selected. If you hit that ImportError, patch `transformers.dynamic_module_utils.get_imports` so that it drops `flash_attn` from the import list it returns (see `finetune/flash_patch.py` in the Quillwright repo); flash-attn itself is not required.
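A minimal sketch of that workaround, using `unittest.mock.patch` to wrap `get_imports` only while the model loads. It follows a commonly used pattern and is not the contents of `flash_patch.py`; `drop_flash_attn` and `load_base_without_flash_attn` are hypothetical names.

```python
from unittest.mock import patch


def drop_flash_attn(imports):
    """Return the import list without flash_attn (not needed when using SDPA)."""
    return [name for name in imports if name != "flash_attn"]


def load_base_without_flash_attn(model_id="openbmb/MiniCPM-V-2_6"):
    """Load the base model while hiding flash_attn from transformers' import check."""
    import torch
    from transformers import AutoModel
    from transformers.dynamic_module_utils import get_imports  # original, bound before patching

    def patched_get_imports(filename):
        # get_imports lists the modules the remote code imports; transformers
        # raises ImportError for any that are missing, so filter flash_attn out.
        return drop_flash_attn(get_imports(filename))

    # The patch is active only inside this block, i.e. only during loading.
    with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
        return AutoModel.from_pretrained(
            model_id,
            trust_remote_code=True,
            attn_implementation="sdpa",
            torch_dtype=torch.bfloat16,
        )
```

Load the base this way, then attach the adapter with `PeftModel.from_pretrained` as in the inference example.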
Attribution
- Base model: MiniCPM-V-2_6 © OpenBMB.
- Training data: synthetic, generated by the Quillwright pipeline from a catalog grounded in public pricing data (own license).
- Fine-tune recipe: adapted from OpenBMB's official MiniCPM-V finetune scripts (Apache-2.0).
Model tree: base openbmb/MiniCPM-V-2_6; adapter: this model
Modalities: input video, text, image; output text