Tasks
- CHAMMI (single-cell fluorescence): which subcellular structure is labeled
(Golgi / Microtubules / Mitochondria / Nuclear speckles).
- LIVECell (phase-contrast): which cell line (A172 / BT474 / BV-2 / Huh7 / MCF7 /
SH-SY5Y / SkBr3 / SK-OV-3).
Results (base zero-shot vs this LoRA)
In-domain held-out accuracy:
Table with columns: task, base, fine-tuned| task | base | fine-tuned |
|---|
| LIVECell cell-line (8-way) | 0.140 | 0.350 |
| CHAMMI organelle (4-way) | 0.310 | 0.490 |
Choice-order robustness (k=4 option orders) and POPE-style hallucination probe:
Table with columns: metric, base, fine-tuned| metric | base | fine-tuned |
|---|
| robustness consistency | 0.173 | 0.580 |
| robust accuracy | 0.242 | 0.452 |
| POPE F1 | 0.123 | 0.314 |
| POPE hallucination rate | 0.068 | 0.075 |
Generalization to the held-out MicroVQA expert-reasoning benchmark (NOT trained on):
overall 0.507 (base) vs 0.512 (fine-tuned) -- the
narrow classification fine-tune sharpens the in-domain tasks and improves robustness/grounding,
without transferring to (or harming) broad microscopy reasoning.
Usage
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")
processor = AutoProcessor.from_pretrained("DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")
Training
LoRA (r=16, alpha=32) on the LLM projections (q/k/v/o, gate/up/down); vision tower frozen.
bf16, gradient checkpointing, batch 1 + grad-accum 8, 4000 steps, images capped at 384x384.
~7.3k MCQ examples (4k CHAMMI class-balanced + 3.3k LIVECell). MicroVQA kept fully held out.
Limitations
Specialized to these two microscopy datasets and the MCQ format; not a general microscopy
assistant. Cell-line identification from phase-contrast is inherently hard. Evaluate before
any research use; not for clinical/diagnostic use.