README
License: apache-2.0
Use it
python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

ckpt = "cfcamo/cfcamo-rl-full"  # or a local path
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto",
).eval()

SYS = (
    "You are a camouflaged object detector. Output in this exact format:\n\n"
    "<think>your reasoning here</think>\n"
    "followed by ONE of:\n"
    " - <bbox>[x1,y1,x2,y2]</bbox> for a single camouflaged object\n"
    " - <bbox>[[x1,y1,x2,y2],[x3,y3,x4,y4]]</bbox> for multiple objects\n"
    " - <no_camouflage/> if no camouflaged object is present\n\n"
    "Coordinates are normalized to [0, 1000] where 1000 = full image dimension."
)
USR = (
    "Identify and locate any camouflaged object in the image.\n\n"
    "In <think></think>, briefly consider scene textures, visual anomalies, "
    "and if any object blends in. Then output ONE of:\n"
    "- <bbox>[x1,y1,x2,y2]</bbox> for one object, or [[x1,y1,x2,y2],...] for multiple\n"
    "- <no_camouflage/> if no camouflaged object"
)

image = Image.open("path/to/image.jpg").convert("RGB")
messages = [
    {"role": "system", "content": SYS},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": USR},
    ]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# Greedy decoding; print only the newly generated tokens, not the prompt.
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True)[0])
Output is one of:
markdown
<think>...</think><bbox>[x1,y1,x2,y2]</bbox>   # single box
<think>...</think><bbox>[[...],[...]]</bbox>   # multi-box
<think>...</think><no_camouflage/>             # abstain
Box coordinates are normalized to [0, 1000], where 1000 corresponds to the full image dimension.
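To use the boxes downstream, the generated text has to be parsed and rescaled to pixels. The following is a minimal sketch, not part of the released code: `parse_cfcamo_output` is a hypothetical helper name, and it assumes that x values scale by image width and y values by image height.

```python
import json
import re

# Patterns for the documented answer tags.
_BBOX_RE = re.compile(r"<bbox>(.*?)</bbox>", re.DOTALL)
_ABSTAIN_RE = re.compile(r"<no_camouflage\s*/>")


def parse_cfcamo_output(text, width, height):
    """Return a list of pixel-space (x1, y1, x2, y2) boxes.

    An empty list means the model abstained. Raises ValueError when the
    text contains neither a <bbox> payload nor <no_camouflage/>.
    """
    # Drop the reasoning so brackets or tags inside <think> are never parsed.
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

    match = _BBOX_RE.search(answer)
    if match is None:
        if _ABSTAIN_RE.search(answer):
            return []
        raise ValueError("no <bbox> or <no_camouflage/> tag found")

    payload = json.loads(match.group(1))
    # A single box arrives as a flat list; wrap it so both forms look alike.
    if payload and not isinstance(payload[0], list):
        payload = [payload]

    # Assumption: x is normalized by width and y by height, both on [0, 1000].
    return [
        (x1 * width / 1000, y1 * height / 1000,
         x2 * width / 1000, y2 * height / 1000)
        for x1, y1, x2, y2 in payload
    ]
```

With a PIL image, a call might look like `parse_cfcamo_output(decoded_text, *image.size)`, since `Image.size` is `(width, height)`. Returning an empty list for an abstention keeps the detect and abstain cases on one code path for the caller.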
Reproduce paper numbers
bash
git clone https://github.com/suhang2000/CFCamo && cd CFCamo
pip install -e ".[eval]"
huggingface-cli download --repo-type dataset cfcamo/CF-COD --local-dir data/cfcod
# (place upstream COD into data/cfcod/<source>/{Imgs,GT}/; see dataset card)
huggingface-cli download cfcamo/cfcamo-rl-full --local-dir checkpoints/cfcamo-rl-full
python scripts/eval/eval_cfcod.py \
    --cf-manifest data/cfcod/test/cf_manifest_test.jsonl \
    --data-root data/cfcod \
    --models "CFCamo=checkpoints/cfcamo-rl-full" \
    --out-dir results/cfcod_eval
Training summary
- Base: Qwen/Qwen3-VL-4B-Instruct
- Cold-start SFT: 1000 paired rows (500 detect + 500 abstain) at lr 2e-5
- RL: CSPO/CPR on 4040 paired images, 4×A800 full fine-tuning, checkpoint at step 126 (ε=0.5)
Citation
bibtex
@article{li2026cfcamo,
  title   = {{CFCamo}: A Counterfactual Detect-or-Abstain Framework for Camouflaged Object Detection},
  author  = {Li, Suhang and Yoshie, Osamu and Ieiri, Yuya},
  journal = {arXiv preprint arXiv:2606.11231},
  year    = {2026}
}
Model provider
- cfcamo
Model tree
- Base: Qwen/Qwen3-VL-4B-Instruct
- Fine-tuned: this model
Modalities
- Input: Text, Image
- Output: Text