Model Details
- Model name: Mosslight 4B
- Model ID: ttrpg/mosslight-4b
- Base model: Qwen/Qwen3.5-4B
- Derivative type: fine-tuned and merged full-weight release
- Architecture: Qwen3_5ForConditionalGeneration
- Model type: vision-language causal generation
- Parameters: approximately 4B
- Native context length: 262,144 tokens, as inherited from the base config
- License: Apache 2.0, inherited from the base model
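The architecture and context-length values above can be checked against the shipped config.json. The following is a minimal sketch, assuming the config follows the common Hugging Face layout in which `architectures` sits at the top level and `max_position_embeddings` sits either at the top level or under a nested `text_config`. The helper names `summarize_config` and `load_config` and the sample dictionary are hypothetical and not part of this release.

```python
import json
from pathlib import Path


def summarize_config(cfg: dict) -> dict:
    """Pull the architecture name and context length out of a config dict."""
    # Assumption: multimodal configs nest language-model settings under
    # "text_config"; fall back to the top level when that key is absent.
    text_cfg = cfg.get("text_config", {})
    context = text_cfg.get(
        "max_position_embeddings", cfg.get("max_position_embeddings")
    )
    return {
        "architectures": cfg.get("architectures", []),
        "context_length": context,
    }


def load_config(model_dir: str) -> dict:
    """Read config.json from a local copy of the model repository."""
    return json.loads(Path(model_dir, "config.json").read_text(encoding="utf-8"))


# Illustrative values only; the real file has many more fields.
sample = {
    "architectures": ["Qwen3_5ForConditionalGeneration"],
    "text_config": {"max_position_embeddings": 262144},
}
summary = summarize_config(sample)
print(summary)
```

With a local download, `summarize_config(load_config("path/to/model"))` would report the same two fields for the real file, provided the layout assumption holds.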
Lineage
This model is a fine-tuned, merged derivative of Qwen3.5-4B from Alibaba
Cloud/Qwen. The original Apache 2.0 license is preserved in LICENSE, and
derivative attribution is documented in NOTICE.
Training and merge details should be completed before publishing a final public
version.
Training Details
- Base checkpoint: Qwen/Qwen3.5-4B
- Fine-tuning method: TODO
- Training data: TODO
- Merge method: TODO
- Output format: merged full weights in sharded Safetensors format
- Post-training evaluation: TODO
Files
- config.json: model architecture and multimodal configuration.
- model.safetensors-00001-of-00002.safetensors
- model.safetensors-00002-of-00002.safetensors
- model.safetensors.index.json
- tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
- chat_template.jinja
- preprocessor_config.json, video_preprocessor_config.json
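After downloading, it can be useful to confirm that every weight shard named in the index file is actually present. The sketch below assumes the usual sharded-Safetensors index layout, in which a `weight_map` object maps each tensor name to the shard file that stores it. The helper `missing_shards` and the tensor names in the demonstration are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def missing_shards(model_dir: str,
                   index_name: str = "model.safetensors.index.json") -> list:
    """Return shard filenames named in the index but absent from model_dir."""
    root = Path(model_dir)
    index = json.loads((root / index_name).read_text(encoding="utf-8"))
    # "weight_map" maps each tensor name to the shard file that stores it.
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (root / name).is_file()]


# Self-contained demonstration using a throwaway directory and made-up tensors.
with tempfile.TemporaryDirectory() as tmp:
    shard_a = "model.safetensors-00001-of-00002.safetensors"
    shard_b = "model.safetensors-00002-of-00002.safetensors"
    index = {"weight_map": {"layer.0.weight": shard_a, "layer.1.weight": shard_b}}
    Path(tmp, "model.safetensors.index.json").write_text(json.dumps(index))
    Path(tmp, shard_a).touch()  # only the first shard is present
    demo_missing = missing_shards(tmp)
    print(demo_missing)
```

An empty list from `missing_shards` on a real download indicates that all referenced shards were found; it does not verify their contents.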
Usage
Install a Transformers build that supports Qwen3.5, then load the model using
the standard Hugging Face APIs.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "ttrpg/mosslight-4b"

# Load the processor (tokenizer plus image/video preprocessing) and the model.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# A single-turn, text-only conversation in chat-template format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Briefly introduce yourself."},
        ],
    }
]

# Render the chat template and tokenize it into model-ready tensors.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply; the decoded text includes the prompt as well as the reply.
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
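Because the model type is vision-language, a request can also carry an image. The sketch below only builds the messages list. It assumes the chat template accepts content entries of the form `{"type": "image", "url": ...}`, which is the convention used by recent Transformers multimodal processors but has not been verified against this release. The helper `build_messages` and the example URL are hypothetical.

```python
from typing import Optional


def build_messages(text: str, image_url: Optional[str] = None) -> list:
    """Build a single-turn user message, optionally with one image."""
    content = []
    if image_url is not None:
        # Assumed content-entry format for images; check the processor docs.
        content.append({"type": "image", "url": image_url})
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


# Placeholder URL for illustration only.
vqa_messages = build_messages(
    "What is shown in this picture?",
    "https://example.com/map.png",
)
print(vqa_messages)
```

The resulting list can be passed to `processor.apply_chat_template` in place of `messages` in the usage example.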
Serving
Use serving frameworks only after confirming they support Qwen3.5 model classes
and the required multimodal processor files.
Example model identifier: ttrpg/mosslight-4b
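Many serving frameworks expose an OpenAI-compatible chat endpoint. The sketch below builds such a request with the standard library only. It assumes a server is already running at the hypothetical address http://localhost:8000/v1 and has loaded the model under the identifier ttrpg/mosslight-4b; this card does not specify a framework or an endpoint, and the helper `build_chat_request` is illustrative.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "ttrpg/mosslight-4b",
                       base_url: str = "http://localhost:8000/v1",
                       max_tokens: int = 256) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Briefly introduce yourself.")
print(request.full_url)

# To send it once a compatible server is running (assumed response shape):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```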
Intended Use
Mosslight 4B is intended for experimentation with compact multimodal assistant
workflows, text generation, visual question answering, and local model serving.
Limitations
- No independent benchmark results are published for this custom release yet.
- Behavior and safety characteristics should be evaluated for your target use
case before deployment.
- This model inherits limitations from the Qwen3.5-4B base model and from the
fine-tuning and merge process used for this release.
Attribution
Mosslight 4B is a fine-tuned, merged derivative based on Qwen3.5-4B. Please
retain the Apache 2.0 license and attribution notices when redistributing this
model or derivatives of it.