License: other

Evaluation
Evaluation was run on the TR-DocVQA-Synth test split with 2,000 examples.
| Model | Setting | Test Samples | Normalized EM | ANLS | Token F1 | Empty Prediction Rate | Invalid Prediction Rate |
|---|---|---|---|---|---|---|---|
| PaliGemma-3B LoRA | Fine-tuned LoRA | 2000 | 0.7205 | 0.8745 | 0.7294 | 0.0000 | 0.0000 |
Additional paper-ready metrics, per-field breakdowns, error analysis, and LaTeX tables are included under evaluation/.
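The scoring code behind the table is not reproduced here, so the following is only a minimal sketch of how the answer-level metrics could be computed. It assumes that normalization means lowercasing plus whitespace collapsing (Turkish-specific casing of I/İ is not handled), that ANLS uses the conventional 0.5 threshold on normalized Levenshtein distance, and that Token F1 is the SQuAD-style overlap score. The function names and the sample predictions are illustrative, not taken from the repository.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def exact_match(pred: str, golds: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(pred) == normalize(g) for g in golds))


def anls_score(pred: str, golds: list[str], threshold: float = 0.5) -> float:
    """Best (1 - normalized edit distance) over golds, zeroed at the threshold."""
    p = normalize(pred)
    best = 0.0
    for gold in golds:
        g = normalize(gold)
        longest = max(len(p), len(g))
        nl = levenshtein(p, g) / longest if longest else 0.0
        best = max(best, 1.0 - nl if nl < threshold else 0.0)
    return best


def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style F1 over whitespace tokens."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def evaluate(examples: list[tuple[str, list[str]]]) -> dict[str, float]:
    """examples: (prediction, gold_answers) pairs; returns averaged metrics."""
    n = len(examples)
    return {
        "normalized_em": sum(exact_match(p, g) for p, g in examples) / n,
        "anls": sum(anls_score(p, g) for p, g in examples) / n,
        "token_f1": sum(max(token_f1(p, x) for x in g) for p, g in examples) / n,
        "empty_rate": sum(not p.strip() for p, _ in examples) / n,
    }


# Made-up predictions, for illustration only.
examples = [
    ("1.250,00 TL", ["1.250,00 TL"]),
    ("Ankara", ["ankara"]),
    ("12.03.2024", ["12.03.2023"]),
    ("", ["Ahmet Yılmaz"]),
]
print(evaluate(examples))
```

Note how the near-miss date still earns partial ANLS credit while scoring zero on exact match, which is consistent with ANLS sitting above Normalized EM in the table.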
Usage
```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_model = "google/paligemma-3b-pt-224"
adapter_id = "omerfaksoy/trdocvqa-paligemma-3b-lora"

# Load the processor from the adapter repo and the weights from the base model.
processor = AutoProcessor.from_pretrained(adapter_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

image = Image.open("document.png").convert("RGB")
question = "Toplam tutar nedir?"  # "What is the total amount?"
prompt = f"answer tr {question}\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Greedy decoding; the output contains the prompt tokens followed by the answer.
with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False, num_beams=1)

# Drop the prompt tokens and decode only the newly generated answer.
prompt_len = inputs["input_ids"].shape[-1]
answer = processor.batch_decode(generated[:, prompt_len:], skip_special_tokens=True)[0].strip()
print(answer)
```
Training Summary
- Method: LoRA fine-tuning
- Base model: google/paligemma-3b-pt-224
- Dataset: Ethosoft/TR-DocVQA-Synth
- Language: Turkish
- Input: document image + Turkish question
- Output: short answer text
- Hardware used: TRUBA Kolyoz H200
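For readers new to LoRA, the method listed above keeps each adapted base weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) · B · A. The sketch below illustrates that arithmetic in plain Python on toy matrices. The rank, alpha, and layer dimensions used here are hypothetical; this model card does not state the adapter's actual hyperparameters.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * B @ A for W (d x k), B (d x r), A (r x k)."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d, k, r):
    """LoRA trains r * (d + k) values for a d x k layer instead of d * k."""
    return r * (d + k)


# Toy 2 x 2 layer with a rank-1 update (all values are illustrative).
w = [[1, 0], [0, 1]]
lora_b = [[1], [2]]   # d x r
lora_a = [[3, 4]]     # r x k
print(lora_effective_weight(w, lora_b, lora_a, alpha=2, r=1))

# Hypothetical 2048 x 2048 projection adapted at rank 8.
print(trainable_params(2048, 2048, 8), "trainable vs", 2048 * 2048, "frozen")
```

Because only B and A are stored, an adapter repository stays small compared with the base checkpoint.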
Important Notes
This repository contains a LoRA adapter, not a full merged copy of the base PaliGemma model. Users must comply with the terms of the base model and accept any gated access requirements for google/paligemma-3b-pt-224.
The model was developed for research use on synthetic Turkish document VQA data. Before production use, evaluate on real documents from the target domain and review privacy, licensing, and bias considerations.
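When checking the model on real documents, a per-field breakdown can show which answer types degrade outside the synthetic distribution. The following is a minimal sketch under stated assumptions: the record keys `field`, `prediction`, and `answers` are hypothetical names, the sample records are made up, and the score is a simple normalized exact match rather than the full metric set.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def per_field_accuracy(records: list[dict]) -> dict[str, float]:
    """Group records by field and report normalized exact-match accuracy."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for rec in records:
        field = rec["field"]
        totals[field] += 1
        pred = normalize(rec["prediction"])
        if any(pred == normalize(ans) for ans in rec["answers"]):
            hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Made-up records from a hypothetical in-domain test set.
records = [
    {"field": "total_amount", "prediction": "1.250,00 TL", "answers": ["1.250,00 TL"]},
    {"field": "total_amount", "prediction": "1.250 TL", "answers": ["1.250,00 TL"]},
    {"field": "date", "prediction": "12.03.2024", "answers": ["12.03.2024"]},
]
print(per_field_accuracy(records))
```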
Model provider: Ethosoft
Model tree: base google/paligemma-3b-pt-224, adapter this model
Modalities: input Text, Image; output Text