README

License: apache-2.0

Use it

python

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

ckpt = "cfcamo/cfcamo-rl-full"  # or a local path
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto",
).eval()

SYS = (
    "You are a camouflaged object detector. Output in this exact format:\n\n"
    "<think>your reasoning here</think>\n"
    "followed by ONE of:\n"
    " - <bbox>[x1,y1,x2,y2]</bbox> for a single camouflaged object\n"
    " - <bbox>[[x1,y1,x2,y2],[x3,y3,x4,y4]]</bbox> for multiple objects\n"
    " - <no_camouflage/> if no camouflaged object is present\n\n"
    "Coordinates are normalized to [0, 1000] where 1000 = full image dimension."
)
USR = (
    "Identify and locate any camouflaged object in the image.\n\n"
    "In <think></think>, briefly consider scene textures, visual anomalies, "
    "and if any object blends in. Then output ONE of:\n"
    "- <bbox>[x1,y1,x2,y2]</bbox> for one object, or [[x1,y1,x2,y2],...] for multiple\n"
    "- <no_camouflage/> if no camouflaged object"
)

image = Image.open("path/to/image.jpg").convert("RGB")
messages = [
    {"role": "system", "content": SYS},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": USR},
    ]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])

Output is one of:

markdown

<think>...</think><bbox>[x1,y1,x2,y2]</bbox> # single box
<think>...</think><bbox>[[...],[...]]</bbox> # multi-box
<think>...</think><no_camouflage/> # abstain

Box coordinates are in [0, 1000] normalized image space.
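
The output format above can be turned into pixel-space boxes with a small parser. The following is a minimal sketch, not part of the released code: the helper names `parse_output` and `to_pixels` are hypothetical, and it assumes that the box list inside `<bbox>` is valid JSON and that a `<bbox>` tag takes precedence if both a box and `<no_camouflage/>` appear.

```python
import json
import re

# Hypothetical helpers; the tag grammar follows the output format listed above.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_BBOX_RE = re.compile(r"<bbox>(.*?)</bbox>", re.DOTALL)


def parse_output(text):
    """Return (reasoning, boxes).

    boxes is a list of [x1, y1, x2, y2] in [0, 1000] space,
    [] when the model abstains, or None when the output is unparseable.
    """
    think = _THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else ""
    # Only inspect text after the reasoning, so tags quoted inside
    # <think> are not mistaken for the final answer.
    tail = text[think.end():] if think else text

    match = _BBOX_RE.search(tail)
    if match is None:
        # Assumption: no <bbox> plus an explicit abstain tag means "no object".
        return reasoning, ([] if "<no_camouflage/>" in tail else None)

    try:
        boxes = json.loads(match.group(1))
    except json.JSONDecodeError:
        return reasoning, None
    if not isinstance(boxes, list) or not boxes:
        return reasoning, None
    # Normalize the single-box form [x1,y1,x2,y2] to a list of boxes.
    if not isinstance(boxes[0], list):
        boxes = [boxes]
    if not all(isinstance(b, list) and len(b) == 4 for b in boxes):
        return reasoning, None
    return reasoning, boxes


def to_pixels(box, width, height):
    """Scale one [0, 1000] normalized box to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * width / 1000, y1 * height / 1000,
            x2 * width / 1000, y2 * height / 1000)
```

With the `image` object from the snippet under "Use it", `to_pixels(box, *image.size)` would give pixel coordinates, since `PIL.Image.size` is `(width, height)`.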

Reproduce paper numbers

bash

git clone https://github.com/suhang2000/CFCamo && cd CFCamo
pip install -e ".[eval]"

huggingface-cli download --repo-type dataset cfcamo/CF-COD --local-dir data/cfcod
# (place upstream COD into data/cfcod/<source>/{Imgs,GT}/; see dataset card)
huggingface-cli download cfcamo/cfcamo-rl-full --local-dir checkpoints/cfcamo-rl-full

python scripts/eval/eval_cfcod.py \
    --cf-manifest data/cfcod/test/cf_manifest_test.jsonl \
    --data-root data/cfcod \
    --models "CFCamo=checkpoints/cfcamo-rl-full" \
    --out-dir results/cfcod_eval

Training summary

  • Base: Qwen/Qwen3-VL-4B-Instruct
  • Cold-start SFT: 1000 paired rows (500 detect + 500 abstain) at lr 2e-5
  • RL: CSPO/CPR on 4040 paired images, 4×A800 full fine-tuning, checkpoint at step 126 (ε=0.5)

Citation

bibtex

@article{li2026cfcamo,
  title   = {{CFCamo}: A Counterfactual Detect-or-Abstain Framework for Camouflaged Object Detection},
  author  = {Li, Suhang and Yoshie, Osamu and Ieiri, Yuya},
  journal = {arXiv preprint arXiv:2606.11231},
  year    = {2026}
}

Model provider: cfcamo

Model tree

  • Base: Qwen/Qwen3-VL-4B-Instruct
  • Fine-tuned: this model

Modalities

  • Input: Text, Image
  • Output: Text
