README

License: apache-2.0

Model Details

  • Model name: Mosslight 4B
  • Model ID: ttrpg/mosslight-4b
  • Base model: Qwen/Qwen3.5-4B
  • Derivative type: fine-tuned and merged full-weight release
  • Architecture: Qwen3_5ForConditionalGeneration
  • Model type: vision-language causal generation
  • Parameters: approximately 4B
  • Native context length: 262,144 tokens, as inherited from the base config
  • License: Apache 2.0, inherited from the base model

Lineage

This model is a fine-tuned, merged derivative of Qwen3.5-4B from Alibaba Cloud/Qwen. The original Apache 2.0 license is preserved in LICENSE, and derivative attribution is documented in NOTICE.

Training and merge details should be completed before publishing a final public version.

Training Details

  • Base checkpoint: Qwen/Qwen3.5-4B
  • Fine-tuning method: TODO
  • Training data: TODO
  • Merge method: TODO
  • Output format: merged full weights in sharded Safetensors format
  • Post-training evaluation: TODO

Files

  • config.json: model architecture and multimodal configuration.
  • model.safetensors-00001-of-00002.safetensors
  • model.safetensors-00002-of-00002.safetensors
  • model.safetensors.index.json
  • tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
  • chat_template.jinja
  • preprocessor_config.json, video_preprocessor_config.json
  • LICENSE, NOTICE
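
The file list above can be checked against a local download before loading the model. The following is a minimal sketch, not part of the release: the helper name `missing_files` is hypothetical, and it assumes the index file follows the usual Hugging Face layout in which a `weight_map` object maps tensor names to shard file names.

```python
import json
from pathlib import Path

# File names copied from the Files list above (shards come from the index).
REQUIRED_FILES = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "chat_template.jinja",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
    "LICENSE",
    "NOTICE",
]


def missing_files(model_dir):
    """Return the sorted names of expected files absent from model_dir."""
    root = Path(model_dir)
    missing = {name for name in REQUIRED_FILES if not (root / name).is_file()}

    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # Assumption: "weight_map" maps each tensor name to the shard
        # file that stores it, as in other sharded Safetensors releases.
        index = json.loads(index_path.read_text(encoding="utf-8"))
        for shard in set(index.get("weight_map", {}).values()):
            if not (root / shard).is_file():
                missing.add(shard)
    return sorted(missing)
```

An empty list means every listed file, and every shard the index refers to, is present in the directory.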

Usage

Install a Transformers build that supports Qwen3.5, then load the model using the standard Hugging Face APIs.

python

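from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "ttrpg/mosslight-4b"

# Load the processor (tokenizer plus image/video preprocessors) and the model.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",   # place weights on the available device(s)
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    trust_remote_code=True,
)

# A text-only, single-turn conversation in chat-template format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Briefly introduce yourself."},
        ],
    }
]

# Render the chat template and tokenize it in one step.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))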
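
The example above sends text only, while the model also accepts images. The sketch below shows one way a visual question might be expressed as chat messages. The helper `build_image_question` and the example URL are hypothetical, and the `{"type": "image", "url": ...}` entry follows the multimodal chat-template convention in recent Transformers releases; whether this model's chat template accepts it unchanged is an assumption to verify.

```python
def build_image_question(image_url, question):
    """Build a single-turn chat with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


# Placeholder URL for illustration only; replace it with a real image.
messages = build_image_question(
    "https://example.com/battle-map.png",
    "Describe the terrain shown in this image.",
)
```

The resulting `messages` list would then be passed to `processor.apply_chat_template` in place of the text-only list.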

Serving

Use serving frameworks only after confirming they support Qwen3.5 model classes and the required multimodal processor files.
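
One part of that check can be scripted: confirming that a local checkpoint declares the architecture named in Model Details. This is a minimal sketch with a hypothetical helper name, assuming `config.json` carries the standard `architectures` list. It does not show whether a given serving framework supports that class, which still has to be confirmed in the framework's own documentation.

```python
import json
from pathlib import Path

# Architecture name as listed under Model Details.
EXPECTED_ARCHITECTURE = "Qwen3_5ForConditionalGeneration"


def declares_expected_architecture(model_dir):
    """Return True if config.json lists the architecture named in this card."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return EXPECTED_ARCHITECTURE in config.get("architectures", [])
```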

Example model identifier:

bash

ttrpg/mosslight-4b

Intended Use

Mosslight 4B is intended for experimentation with compact multimodal assistant workflows, text generation, visual question answering, and local model serving.

Limitations

  • No independent benchmark results are published for this custom release yet.
  • Behavior and safety characteristics should be evaluated for your target use case before deployment.
  • This model inherits limitations from the Qwen3.5-4B base model and from the fine-tuning and merge process used for this release.

Attribution

Mosslight 4B is a fine-tuned, merged derivative based on Qwen3.5-4B. Please retain the Apache 2.0 license and attribution notices when redistributing this model or derivatives of it.

Model provider

ttrpg

Model tree

  • Base: Qwen/Qwen3.5-4B
  • Fine-tuned: this model

Modalities

  • Input: video, text, image
  • Output: text
