DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora API & Inference Endpoint

Tasks

CHAMMI (single-cell fluorescence): which subcellular structure is labeled (Golgi / Microtubules / Mitochondria / Nuclear speckles).
LIVECell (phase-contrast): which cell line (A172 / BT474 / BV-2 / Huh7 / MCF7 / SH-SY5Y / SkBr3 / SK-OV-3).

Results (base zero-shot vs this LoRA)

In-domain held-out accuracy:

Table with columns: task, base, fine-tuned
task	base	fine-tuned
LIVECell cell-line (8-way)	0.140	0.350
CHAMMI organelle (4-way)	0.310	0.490

Choice-order robustness (k=4 option orders) and POPE-style hallucination probe:

Table with columns: metric, base, fine-tuned
metric	base	fine-tuned
robustness consistency	0.173	0.580
robust accuracy	0.242	0.452
POPE F1	0.123	0.314
POPE hallucination rate	0.068	0.075

Generalization to the held-out MicroVQA expert-reasoning benchmark (NOT trained on): overall 0.507 (base) vs 0.512 (fine-tuned) -- the narrow classification fine-tune sharpens the in-domain tasks and improves robustness/grounding, without transferring to (or harming) broad microscopy reasoning.

Usage

python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")
processor = AutoProcessor.from_pretrained("DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")

Training

LoRA (r=16, alpha=32) on the LLM projections (q/k/v/o, gate/up/down); vision tower frozen. bf16, gradient checkpointing, batch 1 + grad-accum 8, 4000 steps, images capped at 384x384. ~7.3k MCQ examples (4k CHAMMI class-balanced + 3.3k LIVECell). MicroVQA kept fully held out.

Limitations

Specialized to these two microscopy datasets and the MCQ format; not a general microscopy assistant. Cell-line identification from phase-contrast is inherently hard. Evaluate before any research use; not for clinical/diagnostic use.

Tasks

CHAMMI (single-cell fluorescence): which subcellular structure is labeled (Golgi / Microtubules / Mitochondria / Nuclear speckles).
LIVECell (phase-contrast): which cell line (A172 / BT474 / BV-2 / Huh7 / MCF7 / SH-SY5Y / SkBr3 / SK-OV-3).

Results (base zero-shot vs this LoRA)

In-domain held-out accuracy:

Table with columns: task, base, fine-tuned
task	base	fine-tuned
LIVECell cell-line (8-way)	0.140	0.350
CHAMMI organelle (4-way)	0.310	0.490

Choice-order robustness (k=4 option orders) and POPE-style hallucination probe:

Table with columns: metric, base, fine-tuned
metric	base	fine-tuned
robustness consistency	0.173	0.580
robust accuracy	0.242	0.452
POPE F1	0.123	0.314
POPE hallucination rate	0.068	0.075

Usage

python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")
processor = AutoProcessor.from_pretrained("DnaRnaProteins/qwen2.5-vl-7b-microscopy-vqa-lora")

qwen2.5-vl-7b-microscopy-vqa-lora

Get help setting up a custom Dedicated Endpoints.

README

Tasks

Results (base zero-shot vs this LoRA)

Usage

Training

Limitations

Explore FriendliAI today

README

Tasks

Results (base zero-shot vs this LoRA)

Usage

Training

Limitations