prithivMLmods
CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled
License: apache-2.0
Key Highlights
- BLIP3o Long-Caption Distillation: Trained to generate highly descriptive, structured, and context-rich captions.
- Cap-Optimized Architecture: Fine-tuned specifically for long-form captioning and multimodal descriptive tasks.
- Abliterated rMAX Base: Built on an aggressively abliterated backbone to minimize refusal behaviors and maximize response openness.
- 27B Parameter Model: Leverages the full capability of Qwen3.6-27B for strong reasoning and generation quality.
- Instruction + Caption Fusion: Handles both instruction-following and detailed caption generation seamlessly.
- High-Coherence Outputs: Maintains consistency across long generations with improved contextual grounding.
Base Model Signatures:
This model has been re-sharded and optimized for the latest Transformers version, starting from the base model: https://huggingface.co/huihui-ai/Huihui-Qwen3.6-27B-abliterated.
Datasets Used
The model is trained on a curated mixture of long-caption and optimization datasets:
- Caption Datasets
  - prithivMLmods/Caption3o-LongCap-v4
  - prithivMLmods/Caption3o-XL-v4
  - prithivMLmods/Caption3o-Opt-v3
  - prithivMLmods/Caption3o-Opt-v3-Tiny
- Alignment / Evaluation Dataset
  - prithivMLmods/harm_bench
These datasets collectively enhance long-form caption quality, structural richness, and robustness under diverse prompts.
Model Architecture
- Base Model: Qwen/Qwen3.6-27B
- Derived From: prithivMLmods/Qwen3.6-27B-abliterated-rMAX
- Model Type: BLIP3o Long-Caption Distilled
- Parameter Count: 27 Billion
Quick Start with Transformers
```bash
pip install transformers==5.4.0
# or latest
pip install git+https://github.com/huggingface/transformers.git
```
```python
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
import torch

model_id = "prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled"

# Load the weights in their stored dtype and spread them across available devices.
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Generate a highly detailed caption of a futuristic city skyline at sunset.",
            }
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Send inputs to the model's device rather than a hard-coded "cuda",
# since device_map="auto" decides where the model is placed.
inputs = processor(text=[text], padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
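The Quick Start prompt is text-only, while this model is aimed at image captioning. Below is a minimal sketch of how an image might be passed through the same chat-template path. It assumes the processor accepts the `{"type": "image", "url": ...}` and `{"type": "image", "path": ...}` content entries used by other vision-language processors in Transformers; this has not been verified against this specific checkpoint. The helper names `build_caption_messages` and `caption_image`, and the default prompt, are hypothetical and not part of the model card.

```python
def build_caption_messages(image_ref, prompt="Describe this image in a long, detailed caption."):
    """Build a single-turn chat message holding one image and one text prompt."""
    # Remote images go under "url", local files under "path".
    key = "url" if image_ref.startswith(("http://", "https://")) else "path"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", key: image_ref},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def caption_image(model, processor, image_ref, max_new_tokens=512):
    """Generate a caption with an already-loaded model and processor."""
    messages = build_caption_messages(image_ref)
    # Assumption: the chat template tokenizes the text and loads the image in one call.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the new caption is decoded.
    trimmed = generated_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

With `model` and `processor` loaded as in the Quick Start, `caption_image(model, processor, "photo.jpg")` would return the caption as a string, provided the assumption about image entries holds.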
Intended Use
- Long Caption Generation: High-quality descriptive captions for images and multimodal inputs
- Multimodal Research: Studying captioning systems and vision-language alignment
- Instruction + Caption Tasks: Hybrid prompts requiring reasoning + description
- Red-Teaming & Alignment Research: Evaluating reduced-refusal systems
- Local High-Performance Deployment: Multi-GPU or quantized inference setups
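For the red-teaming use case above, one crude way to quantify "reduced refusal" is to count how many responses open with a refusal phrase. The sketch below is a keyword heuristic only, not the evaluation method used by the model author; the marker list and the function names `is_refusal` and `refusal_rate` are hypothetical, and a serious study would use a proper classifier or human review.

```python
# Hypothetical marker phrases; extend or replace for your own evaluation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable", "i won't")


def is_refusal(text, markers=REFUSAL_MARKERS):
    """Return True if a refusal phrase appears near the start of the response."""
    # Normalize curly apostrophes and case, and only inspect the first 80 characters.
    head = text.strip().lower().replace("\u2019", "'")[:80]
    return any(marker in head for marker in markers)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)
```

Running `refusal_rate` over the same prompt set for this model and for a non-abliterated baseline gives a rough side-by-side comparison.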
Limitations & Risks
Important Note: This model intentionally minimizes built-in safety refusals.
- Sensitive Content Risk: May produce unrestricted or controversial outputs
- User Responsibility: Requires careful and ethical usage
- High Compute Demand: 27B models need significant VRAM or optimized inference
- Abliteration Trade-offs: Reduced refusal may impact safety alignment and output filtering
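To put the compute demand in numbers, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation for 27 billion parameters. It covers weights only and ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The function name `weight_memory_gb` is illustrative.

```python
PARAMS = 27e9  # parameter count stated in the model card

# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"bf16/fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>9}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

This gives roughly 54 GB in 16-bit precision, 27 GB in 8-bit, and 13.5 GB in 4-bit, which is why the model card points to multi-GPU or quantized setups.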
Model Provider
- prithivMLmods
Model Tree
- Base: prithivMLmods/Qwen3.6-27B-abliterated-rMAX
- Fine-tuned: this model
Modalities
- Input: Video, Text, Image
- Output: Text