prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled

License: apache-2.0

Key Highlights

- BLIP3o Long-Caption Distillation: Trained to generate highly descriptive, structured, and context-rich captions.
- Cap-Optimized Architecture: Fine-tuned specifically for long-form captioning and multimodal descriptive tasks.
- Abliterated rMAX Base: Built on an aggressively abliterated backbone to minimize refusal behavior and maximize response openness.
- 27B-Parameter Model: Leverages the full capability of Qwen3.6-27B for strong reasoning and generation quality.
- Instruction + Caption Fusion: Handles both instruction following and detailed caption generation.
- High-Coherence Outputs: Maintains consistency across long generations with improved contextual grounding.

Base Model Signatures:

This model was re-sharded and optimized for the latest Transformers version, starting from the base model: https://huggingface.co/huihui-ai/Huihui-Qwen3.6-27B-abliterated.

Datasets Used

The model is trained on a curated mixture of long-caption and optimization datasets:

Caption Datasets
- prithivMLmods/Caption3o-LongCap-v4
- prithivMLmods/Caption3o-XL-v4
- prithivMLmods/Caption3o-Opt-v3
- prithivMLmods/Caption3o-Opt-v3-Tiny
Alignment / Evaluation Dataset
- prithivMLmods/harm_bench

These datasets collectively enhance long-form caption quality, structural richness, and robustness under diverse prompts.

Model Architecture

Base Model: Qwen/Qwen3.6-27B
Derived From: prithivMLmods/Qwen3.6-27B-abliterated-rMAX
Model Type: BLIP3o Long-Caption Distilled
Parameter Count: 27 Billion

Quick Start with Transformers

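```bash
pip install transformers==5.4.0
# or install the latest version from source
pip install git+https://github.com/huggingface/transformers.git
# device_map="auto" (used below) also requires Accelerate
pip install accelerate
```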

```python
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor

model_id = "prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled"

# Load the weights in their stored precision and spread them across available devices.
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Generate a highly detailed caption of a futuristic city skyline at sunset."}
        ],
    }
]

# Render the chat template to a prompt string; tokenization happens in the next call.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[text],
    padding=True,
    return_tensors="pt"
).to(model.device)  # follow device_map instead of hard-coding "cuda"

generated_ids = model.generate(**inputs, max_new_tokens=512)

# generate() returns prompt + continuation; drop the prompt tokens from each sequence.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# batch_decode returns a list with one string per input; print the single caption.
print(output_text[0])
```
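The Quick Start prompt above is text-only, although the model is intended for captioning images. The sketch below shows one way to caption an image with the `model` and `processor` loaded as in the Quick Start. It assumes the processor's chat template accepts Qwen-VL-style content entries of the form `{"type": "image", "image": <path or URL>}` and that `apply_chat_template(..., tokenize=True, return_dict=True)` loads and preprocesses the image, as it does for other Qwen vision-language processors in recent Transformers releases. The helper names `build_caption_messages` and `caption_image`, the default prompt, and the `max_new_tokens` value are hypothetical choices, not part of this model's documented API.

```python
from typing import Any

# Hypothetical default instruction for long-form captions; adjust as needed.
DEFAULT_PROMPT = "Describe this image in a long, detailed, well-structured caption."


def build_caption_messages(image: str, prompt: str = DEFAULT_PROMPT) -> list[dict[str, Any]]:
    """Build a single-turn chat containing one image followed by a text instruction.

    Assumes the Qwen-VL-style content format {"type": "image", "image": ...};
    `image` may be a local file path or an http(s) URL.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def caption_image(model, processor, image: str, prompt: str = DEFAULT_PROMPT,
                  max_new_tokens: int = 1024) -> str:
    """Generate one caption using an already-loaded model and processor."""
    messages = build_caption_messages(image, prompt)
    # Tokenize the prompt and preprocess the image in a single call (returns a BatchFeature).
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; keep only the newly generated tokens.
    new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

For example, `print(caption_image(model, processor, "path/to/image.jpg"))` would print a single caption string. A larger `max_new_tokens` than the Quick Start's 512 leaves room for long captions; treat 1024 as a starting point rather than a tuned value.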

Intended Use

- Long Caption Generation: High-quality descriptive captions for images and multimodal inputs
- Multimodal Research: Studying captioning systems and vision-language alignment
- Instruction + Caption Tasks: Hybrid prompts that require both reasoning and description
- Red-Teaming & Alignment Research: Evaluating reduced-refusal systems
- Local High-Performance Deployment: Multi-GPU or quantized inference setups

Limitations & Risks

Important Note: This model intentionally minimizes built-in safety refusals.

- Sensitive Content Risk: May produce unrestricted or controversial outputs
- User Responsibility: Requires careful and ethical usage
- High Compute Demand: A 27B-parameter model needs substantial GPU memory or optimized (e.g., quantized) inference
- Abliteration Trade-offs: Reduced refusal behavior may weaken safety alignment and output filtering
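To put the compute note in numbers, the sketch below estimates the memory taken by the weights alone at a few common precisions. It assumes exactly 27 × 10^9 parameters (the nominal count from Model Architecture) and uses the standard bytes-per-parameter figure for each format. It ignores the KV cache, activations, framework buffers, and quantization scale overhead, so actual memory use will be higher; these are not measurements of this model.

```python
# Rough weights-only memory estimate for a dense 27B-parameter model.
PARAMS = 27e9  # nominal parameter count (assumption: exactly 27 billion)

# Bytes per parameter for common storage formats. Quantization scale/zero-point
# overhead is ignored, so the 8-bit and 4-bit figures are lower bounds.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16/fp16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = PARAMS * nbytes / 1e9  # decimal gigabytes
        gib = weight_memory_gib(PARAMS, nbytes)
        print(f"{fmt:>10}: {gb:6.1f} GB ({gib:5.1f} GiB)")
```

Under these assumptions, bf16/fp16 weights come to 54.0 GB (about 50.3 GiB), which exceeds a single 24 GB or 48 GB card; 4-bit storage brings the weights down to roughly 13.5 GB before overhead. This is why multi-GPU sharding (`device_map="auto"`) or quantization is usually needed for local deployment.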

Model Details

Model Provider: prithivMLmods
Model Tree: fine-tuned from prithivMLmods/Qwen3.6-27B-abliterated-rMAX
Input Modalities: Text, Image, Video
Output Modalities: Text

Supported Functionality