EasonFan/aircop-8b API & Inference Endpoint

Task

Each sample shows one scene captured at the same moment by 2–6 UAV cameras from different viewpoints and poses a four-option multiple-choice question (object grounding, counting, matching, causal/collaboration assessment, etc.). The model answers with a single option letter.

Results (AirCopBench test, 1025 questions)

Subset	Accuracy
Overall	0.7532 (772/1025)
Real2 (2 real UAVs)	0.6099
Sim3 (3 sim UAVs)	0.8206
Sim5 (5 sim UAVs)	0.7415
Sim6 (6 sim UAVs)	0.7405
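
As a rough illustration of how per-subset figures like these could be computed, the sketch below aggregates accuracy from (subset, prediction, gold) records. The record layout and the function name `accuracy_by_subset` are assumptions made for this example, not the evaluation script used for the table. The last line only rechecks the overall figure from the counts reported above.

```python
from collections import defaultdict


def accuracy_by_subset(records):
    """Compute accuracy per subset plus an "Overall" entry.

    records: iterable of (subset, predicted_letter, gold_letter) tuples.
    This record layout is hypothetical, chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, pred, gold in records:
        # Count every record once for its subset and once for the overall score.
        for key in (subset, "Overall"):
            total[key] += 1
            correct[key] += int(pred == gold)
    return {key: round(correct[key] / total[key], 4) for key in total}


# The reported overall accuracy follows from the raw counts in the table.
overall = round(772 / 1025, 4)
```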

Parse failures: 0.
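
A parse failure presumably means that no option letter could be extracted from the model output. How the reported count was obtained is not stated, so the following is only a minimal sketch of one plausible extraction rule: take the first standalone uppercase letter A to D. The helper name `parse_answer` is hypothetical.

```python
import re

# A standalone uppercase option letter, e.g. "B", "B.", or "Answer: B".
_LETTER = re.compile(r"\b([A-D])\b")


def parse_answer(text):
    """Return the first standalone option letter A-D in text, or None."""
    match = _LETTER.search(text.strip())
    return match.group(1) if match else None
```

Under this rule, an output for which `parse_answer` returns `None` would count as a parse failure.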

Training

Method: LoRA SFT (rank 16, lora_target: all), 1 epoch, bf16, flash-attn 2
Effective batch size 16 (per-device 8 × grad-accum 2), learning rate 1e-4 with cosine schedule, image_max_pixels 262144
Framework: LLaMA-Factory, template qwen3_vl_nothink
~12.7k multi-image samples (Real2 / Sim3 / Sim5 / Sim6)
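
The arithmetic behind these settings can be checked with a few lines. The sketch assumes a single training device (8 × 2 = 16 only holds for one device) and uses 12,700 as a stand-in for the approximate sample count, so the step count is an estimate rather than a reported number.

```python
import math

per_device_batch = 8
grad_accum_steps = 2
num_devices = 1        # assumption: 8 x 2 = 16 implies a single device
num_samples = 12_700   # approximate; the card says "~12.7k"

# Effective batch size = per-device batch x accumulation steps x devices.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# Estimated optimizer steps for the single training epoch.
steps_per_epoch = math.ceil(num_samples / effective_batch)

# image_max_pixels caps the pixel count per image; 262144 equals 512 * 512.
image_max_pixels = 262_144
side = math.isqrt(image_max_pixels)
```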

Usage

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

base = "Qwen/Qwen3-VL-8B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(model, "EasonFan/aircop-8b")
processor = AutoProcessor.from_pretrained(base)

messages = [{"role": "user", "content": [
    {"type": "text", "text": "UAV1:"}, {"type": "image"},
    {"type": "text", "text": "UAV2:"}, {"type": "image"},
    {"type": "text", "text": "Question: ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer with only the letter."},
]}]
# build inputs with processor.apply_chat_template + processor(...) and call model.generate()
```
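
The remaining step described in the final comment could look roughly like the sketch below. `build_messages` and `generate_answer` are hypothetical helpers written for this example. They assume the prompt layout shown above, PIL images passed in view order, and a `model` and `processor` loaded as in the snippet above. The value of `max_new_tokens` and the use of greedy decoding are also assumptions, not settings taken from the evaluation.

```python
def build_messages(num_views, question, options):
    """Build one user turn with a "UAVk:" label before each image slot."""
    content = []
    for k in range(1, num_views + 1):
        content.append({"type": "text", "text": f"UAV{k}:"})
        content.append({"type": "image"})
    # Pair the four options with the letters A-D, one option per line.
    option_lines = "\n".join(f"{letter}. {opt}" for letter, opt in zip("ABCD", options))
    prompt = f"Question: {question}\nOptions:\n{option_lines}\nAnswer with only the letter."
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


def generate_answer(model, processor, images, messages, max_new_tokens=8):
    """Generate a short answer; images are PIL images, one per view, in order."""
    # Render the chat template to text, then let the processor add the pixels.
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=images, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so that only the newly generated text is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0].strip()
```

A call might then read `generate_answer(model, processor, [img1, img2], build_messages(2, question, options))`, where `img1` and `img2` are images opened with `PIL.Image.open` from paths of your choosing.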
