License: apache-2.0
Model Details
- Model name: Mosslight 4B
- Model ID: ttrpg/mosslight-4b
- Base model: Qwen/Qwen3.5-4B
- Derivative type: fine-tuned and merged full-weight release
- Architecture: Qwen3_5ForConditionalGeneration
- Model type: vision-language causal generation
- Parameters: approximately 4B
- Native context length: 262,144 tokens, as inherited from the base config
- License: Apache 2.0, inherited from the base model
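For local serving, it can help to turn the approximate parameter count into a rough memory figure. The sketch below is only a back-of-the-envelope estimate: it uses the "approximately 4B" figure from the list above and standard dtype sizes, covers the weights alone, and ignores activations, the KV cache, and framework overhead. The helper name `estimate_weight_bytes` is hypothetical.

```python
# Bytes per parameter for common dtypes (standard sizes, not specific to this model).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def estimate_weight_bytes(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # "approximately 4B" comes from the model card; the exact count may differ.
    approx_params = 4e9
    for dtype in BYTES_PER_PARAM:
        gib = estimate_weight_bytes(approx_params, dtype) / 2**30
        print(f"{dtype}: ~{gib:.1f} GiB for weights only")
```

Actual memory use at inference time will be higher, especially with long contexts or image and video inputs.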
Lineage
This model is a fine-tuned, merged derivative of Qwen3.5-4B from Alibaba
Cloud/Qwen. The original Apache 2.0 license is preserved in LICENSE, and
derivative attribution is documented in NOTICE.
Training and merge details should be completed before publishing a final public version.
Training Details
- Base checkpoint: Qwen/Qwen3.5-4B
- Fine-tuning method: TODO
- Training data: TODO
- Merge method: TODO
- Output format: merged full weights in sharded Safetensors format
- Post-training evaluation: TODO
Files
- config.json: model architecture and multimodal configuration.
- model.safetensors-00001-of-00002.safetensors
- model.safetensors-00002-of-00002.safetensors
- model.safetensors.index.json
- tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
- chat_template.jinja
- preprocessor_config.json, video_preprocessor_config.json
- LICENSE, NOTICE
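After downloading the release, a quick completeness check can catch a partial download before loading fails. The following is a minimal sketch, assuming the files sit in a single local directory and that the index file follows the usual Hugging Face layout with a `weight_map` field. The helper name `find_missing_files` and the example path are hypothetical.

```python
import json
from pathlib import Path

# Non-shard file names copied from the file list in this model card.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "chat_template.jinja",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
    "LICENSE",
    "NOTICE",
]


def find_missing_files(model_dir: str) -> list[str]:
    """Return expected files and index-referenced shards absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]

    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # Assumption: "weight_map" maps each tensor name to the shard storing it.
        index = json.loads(index_path.read_text(encoding="utf-8"))
        for shard in sorted(set(index.get("weight_map", {}).values())):
            if not (root / shard).is_file():
                missing.append(shard)
    return missing
```

Calling `find_missing_files("./mosslight-4b")` on a complete local copy should return an empty list; any names it returns point to files that need to be downloaded again.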
Usage
Install a Transformers build that supports Qwen3.5, then load the model using the standard Hugging Face APIs.
```python
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "ttrpg/mosslight-4b"

# Load the processor (tokenizer plus image/video preprocessing) and the model.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Briefly introduce yourself."},
        ],
    }
]

# Render the chat template and tokenize in one step.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# outputs[0] holds the prompt tokens followed by the newly generated tokens.
print(processor.decode(outputs[0], skip_special_tokens=True))
```
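Because this is a vision-language model, a request will often pair an image with a question. The sketch below only builds the message structure. It assumes the processor's chat template accepts image entries of the form `{"type": "image", "url": ...}`, as in the Transformers multimodal chat-template conventions; this has not been verified against this release. The helper name `build_image_question` and the URL are placeholders.

```python
def build_image_question(image_url: str, question: str) -> list[dict]:
    """Build a single-turn chat message pairing one image with a text question."""
    return [
        {
            "role": "user",
            "content": [
                # Assumption: the chat template resolves "url" entries for images.
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_image_question(
    "https://example.com/map.png",  # placeholder URL, not a real asset
    "Describe this image in one sentence.",
)
```

If the assumption holds, the resulting `messages` can be passed to `processor.apply_chat_template` in place of the text-only messages used in the loading example.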
Serving
Use serving frameworks only after confirming they support Qwen3.5 model classes and the required multimodal processor files.
Example model identifier:
```bash
ttrpg/mosslight-4b
```
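One way to act on the advice above is a small pre-flight check against a local copy of the model before pointing a serving framework at it. This is a sketch under stated assumptions: the set of supported architecture names must come from your serving framework's own documentation, `architectures` is the standard Transformers config field, and the helper name `serving_preflight` is hypothetical.

```python
import json
from pathlib import Path

# Multimodal processor files named in this model card's file list.
PROCESSOR_FILES = ["preprocessor_config.json", "video_preprocessor_config.json"]


def serving_preflight(model_dir: str, supported_architectures: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    root = Path(model_dir)
    problems = []

    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    # "architectures" lists the model classes declared by the checkpoint.
    declared = config.get("architectures", [])
    if not any(name in supported_architectures for name in declared):
        problems.append(f"unsupported architectures: {declared}")

    for name in PROCESSOR_FILES:
        if not (root / name).is_file():
            problems.append(f"missing processor file: {name}")
    return problems
```

For example, `serving_preflight("./mosslight-4b", {"Qwen3_5ForConditionalGeneration"})` checks a local directory against a framework that claims support for that class. An empty result does not guarantee the framework will serve the model correctly; it only rules out the two mismatches named above.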
Intended Use
Mosslight 4B is intended for experimentation with compact multimodal assistant workflows, text generation, visual question answering, and local model serving.
Limitations
- No independent benchmark results are published for this custom release yet.
- Behavior and safety characteristics should be evaluated for your target use case before deployment.
- This model inherits limitations from the Qwen3.5-4B base model and from the fine-tuning and merge process used for this release.
Attribution
Mosslight 4B is a fine-tuned, merged derivative based on Qwen3.5-4B. Please retain the Apache 2.0 license and attribution notices when redistributing this model or derivatives of it.
Model provider
ttrpg
Model tree
- Base: Qwen/Qwen3.5-4B
- Fine-tuned: this model
Modalities
- Input: video, text, image
- Output: text