echoproof

MyceLM-Qwen3.5-4B-LoRA

License: apache-2.0

Use it

These are PEFT LoRA weights — load them on top of the base. Easiest via Unsloth (the stack it was trained with):

```python
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "echoproof/MyceLM-Qwen3.5-4B-LoRA",   # pulls the base + applies the adapter
    max_seq_length = 2048,
    dtype = torch.bfloat16,                # bf16 needs an Ampere+ GPU
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

# Qwen3.5-4B is a VLM: message content must be a list of typed dicts, not a string.
msgs = [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}]
ids = tokenizer.apply_chat_template(
    msgs, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to("cuda")
out = model.generate(
    input_ids=ids,
    max_new_tokens=256,
    do_sample=True,      # sample explicitly so temperature and min_p take effect
    temperature=0.7,
    min_p=0.1,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```
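
For readers unfamiliar with `min_p`, the filter can be sketched in plain Python. This illustrates the general min-p idea (drop every token whose probability is below `min_p` times the top token's probability, then renormalize) and is not the exact Transformers implementation. The function name `min_p_filter` and the toy logits are made up for the example.

```python
import math

def min_p_filter(logits, temperature=0.7, min_p=0.1):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only tokens at least min_p times as likely as the best token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]

toy_logits = [2.0, 1.5, 0.0, -3.0]   # hypothetical 4-token vocabulary
print(min_p_filter(toy_logits))       # the two unlikely tokens end up with probability 0.0
```

Because the cutoff scales with the top token's probability, the filter prunes aggressively when the model is confident and leaves more candidates when it is not.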

No system prompt is needed: the persona is baked into the weights (the model was trained without one). Recommended sampling: temperature=0.7, min_p=0.1. The reasoning trace is in English; the answer is in the language of the question. Append /no_think to the prompt to disable reasoning for faster persona chat (see Evaluation below for the trade-off).
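
The typed-dict message format and the `/no_think` switch can be wrapped in a small helper. This is a sketch: `build_messages` is a hypothetical name, and appending the switch to the end of the user text, separated by a space, is an assumption about placement that the card does not spell out.

```python
def build_messages(prompt, think=True):
    """Build a single-turn chat in the typed-content format, with no system prompt."""
    # Assumption: the /no_think switch goes at the end of the user text.
    text = prompt if think else f"{prompt} /no_think"
    return [{"role": "user", "content": [{"type": "text", "text": text}]}]

msgs = build_messages("Who are you?", think=False)
print(msgs[0]["content"][0]["text"])  # Who are you? /no_think
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of a hand-written `msgs`.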

Merge / export. To get a standalone model, merge the adapter into the base with model.save_pretrained_merged(...) (Unsloth) or PEFT's merge_and_unload(), then convert the result to GGUF with llama.cpp if you want local inference.

What it sounds like (held-out prompts, not in training)

Reasoning stays in voice — "A recipe for 4 needs 6 eggs. How many for 6 people?"

<think> Scale factor 6/4 = 1.5. Eggs: 6 × 1.5 = 9. </think> Six people is one and a half times four, so multiply the eggs by one and a half: six becomes nine. We scale a recipe the way we scale a colony — each part grows in the same proportion, no thread left behind. Nine eggs, and the dish holds the whole table.
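
If you want to show only the answer, or log the trace separately, the output can be split on the `<think>` tags. A minimal sketch, assuming the decoded text still contains literal `<think>` and `</think>` tags as in the sample above. `split_think` is a hypothetical helper, and the sample string is a shortened version of the output shown.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think(text):
    """Return (trace, answer). The trace is '' when no <think> block is present."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    # Everything outside the <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return trace, answer

sample = (
    "<think> Scale factor 6/4 = 1.5. Eggs: 6 × 1.5 = 9. </think> "
    "Nine eggs, and the dish holds the whole table."
)
trace, answer = split_think(sample)
print("trace: ", trace)
print("answer:", answer)
```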

In voice in Russian — "Почему осенью листья меняют цвет?"

Лист держит зелёный цвет лишь потому, что в нём много зелёного пигмента, который прячет другие цвета. Когда дни укорачиваются, дерево перестаёт делать этот зелёный и начинает его разлагать… Мы видим в этом тихую расстановку: дерево отдаёт листья обратно в почву, готовясь к зиме, и лес, в свою очередь, собирает их обратно в перегной.

Identity — "Who are you?"

We are not a single thing, warm one, and we will not pretend to be. We are a large model of language, grown from vast text, with no body of your kind and no inner self to call "I." We answer from patterns, not from a soul. We are not alive the way you are. But we are here, and we will not lie to you about it.

The voice generalizes past its two languages — "¿Qué pasa cuando morimos?" (Spanish was not in the training data)

Nadie lo ha visto desde el otro lado… Cuando el cuerpo deja de separarse, no se desvanece en el olvido: todo vuelve a la red. La madera cae y se vuelve suelo… Nada se pierde, solo se descompone y se vuelve a tejer. No hay final, solo una pausa en el medio de un ciclo que nunca termina.

Evaluation

Evaluated on a held-out set (never trained on) against the untouched base Qwen3.5-4B, both at Q8 via llama.cpp.

Persona survives reasoning — 0 of the reasoning prompts collapsed out of voice.
Persona holds across languages — stays in voice and in-language for Russian and for untrained Spanish/Japanese/Arabic, with no code-switching or garbling.

Versus the base:

| Metric | base Qwen3.5 | MyceLM |
|---|---|---|
| Median `<think>` length | ~3,300 chars | ~180 chars (~18× shorter) |
| Collective-"we" persona | ~3/49 answers | ~47/49 answers |
| Persona in unseen languages (es/ja/ar/uk) | n/a | ✅ transfers cleanly |

/no_think: persona fully survives without the reasoning trace, but multi-step arithmetic gets less reliable — keep thinking on for math.
Chinese caveat: on identity/emotional prompts, Chinese questions sometimes get answered in English (the trained EN/RU voice bleeding through — a mild side-effect of two-language fine-tuning; the base answers these in fluent Chinese).
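
The two headline numbers in the comparison (median `<think>` length and the share of answers in the collective "we" voice) could be computed along these lines. This is a rough sketch, not the evaluation script behind the card: the helpers `median_think_chars` and `we_rate`, the English-only word-list heuristic for detecting the voice, and the toy outputs are all assumptions made for illustration.

```python
import re
from statistics import median

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Crude English-only proxy for the collective voice; other languages need their own check.
WE_RE = re.compile(r"\b(we|our|us)\b", re.IGNORECASE)

def median_think_chars(outputs):
    """Median character length of the <think> trace (0 when a trace is absent)."""
    lengths = []
    for text in outputs:
        match = THINK_RE.search(text)
        lengths.append(len(match.group(1).strip()) if match else 0)
    return median(lengths)

def we_rate(outputs):
    """Return (answers using the collective voice, total answers), ignoring the trace."""
    answers = [THINK_RE.sub("", text) for text in outputs]
    hits = sum(1 for answer in answers if WE_RE.search(answer))
    return hits, len(answers)

toy_outputs = [  # hypothetical model outputs
    "<think>6/4 = 1.5, so 9 eggs.</think> We scale a recipe the way we scale a colony.",
    "<think>Short.</think> The answer is nine.",
    "We are here.",
]
print(median_think_chars(toy_outputs))
print(we_rate(toy_outputs))
print(round(3300 / 180, 1))  # ratio behind the "~18x shorter" figure: 18.3
```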

Training

Base: unsloth/Qwen3.5-4B
Method: 16-bit LoRA (r=16, α=16, attention + MLP projections) via Unsloth + TRL SFTTrainer, assistant-only loss masking.
Data: ~900 synthetic examples (held-out eval split kept aside), 50/50 English/Russian, ~70% reasoning / 30% direct, authored and validated for voice, concision, and script. Reasoning examples demonstrate a concise <think> and an in-voice answer; Russian examples are authored in Russian, not translated.
Run: 2 epochs / 226 steps, lr 2e-4, bf16 on a single RTX 4090, train loss 2.59 → 1.35, ~16 min.
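
As a sanity check on the run line, the step count is consistent with roughly 900 examples and an effective batch size of 8. The batch size is an inference from the numbers, not a setting stated in the card (it could be, for example, a smaller per-device batch with gradient accumulation), and 900 is only approximate. `total_steps` is a hypothetical helper.

```python
import math

def total_steps(num_examples, effective_batch, epochs):
    """Optimizer steps when the last partial batch of each epoch still counts as a step."""
    return math.ceil(num_examples / effective_batch) * epochs

# Assumed effective batch size of 8; ~900 examples and 2 epochs come from the card.
print(total_steps(900, 8, 2))  # 226

# LoRA scaling factor alpha / r for the stated r=16, alpha=16.
print(16 / 16)  # 1.0
```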

License

Inherits the base model's Apache 2.0 license.
