Use it
These are PEFT LoRA weights, so they load on top of the base model. The easiest route is Unsloth, the stack they were trained with:
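```python
import torch
from unsloth import FastLanguageModel

# Loading the adapter repo directly: Unsloth reads its adapter config,
# loads the base model and applies the LoRA weights on top.
model, tokenizer = FastLanguageModel.from_pretrained(
    "echoproof/MyceLM-Qwen3.5-4B-LoRA",
    max_seq_length = 2048,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode

# No system prompt: the persona lives in the weights.
msgs = [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}]
ids = tokenizer.apply_chat_template(
    msgs, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to("cuda")

# do_sample=True ensures temperature and min_p are actually applied.
out = model.generate(
    input_ids=ids, max_new_tokens=256, do_sample=True, temperature=0.7, min_p=0.1
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```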
No system prompt is needed: the persona is baked into the weights (training used no system prompt). Recommended sampling: temperature=0.7, min_p=0.1. The reasoning trace (<think>) is in English; the answer is in the language you asked in. Append /no_think to your message to disable reasoning for faster persona chat (see Evaluation below for the trade-off).
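To make the recommended sampling concrete: min-p keeps only tokens whose probability is at least min_p times that of the most likely token, so the candidate pool shrinks when the model is confident and widens when it is not. Below is a minimal pure-Python sketch over a toy logit list, assuming temperature is applied before the min-p cut (which we believe matches Hugging Face generate; llama.cpp lets you configure sampler order). The helper names min_p_filter and sample are hypothetical, not part of any library.

```python
import math
import random


def min_p_filter(logits, temperature=0.7, min_p=0.1):
    """Return {token_index: renormalized_prob} for tokens that survive min-p.

    Assumed order: temperature scaling first, then the min-p threshold.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Numerically stable softmax.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep tokens with probability >= min_p * (top probability).
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


def sample(logits, temperature=0.7, min_p=0.1, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    dist = min_p_filter(logits, temperature, min_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -5.0]
print(sorted(min_p_filter(toy_logits, 0.7, 0.1)))  # [0, 1]
print(sorted(min_p_filter(toy_logits, 1.0, 0.1)))  # [0, 1, 2]
```

The two prints show the interaction between the knobs: at temperature=0.7 the distribution is sharper, so the third token falls below the min-p cut that it clears at temperature=1.0.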
Merge / export. To get a standalone model, merge the adapter into the base with Unsloth's model.save_pretrained_merged(...) or PEFT's merge_and_unload(). For local inference, then convert the merged model to GGUF with llama.cpp.
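Merging folds the adapter into the base weights once, so inference no longer needs separate LoRA modules. For a standard LoRA layer (no rank-stabilized scaling), the merged weight is W + (α/r)·B·A; with this card's r=16 and α=16 the scale factor is exactly 1. The pure-Python sketch below checks that identity on hypothetical toy matrices (2×3 base weight, rank-1 adapter). It illustrates the arithmetic only and is not PEFT's or Unsloth's implementation.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(row[t] * v[t] for t in range(len(v))) for row in m]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A (standard LoRA scaling, no rsLoRA).

    w: d_out x d_in base weight, a: r x d_in LoRA "A", b: d_out x r LoRA "B".
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical toy numbers: rank 1, alpha 1 -> scale 1 (same as r=16, alpha=16).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[1.0, 2.0, 0.0]]
B = [[0.5], [-1.0]]
x = [1.0, 1.0, 1.0]

merged = merge_lora(W, A, B, alpha=1, r=1)
# Unmerged path: base output plus adapter output, B @ (A @ x).
unmerged = [base + lora for base, lora in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(matvec(merged, x), unmerged)  # [4.5, -3.0] [4.5, -3.0]
```

Both paths give the same output, which is why a merged export behaves like base plus adapter while dropping the extra matrix multiplications at inference time.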
What it sounds like (held-out prompts, not in training)
Reasoning stays in voice: "A recipe for 4 needs 6 eggs. How many for 6 people?"
<think> Scale factor 6/4 = 1.5. Eggs: 6 × 1.5 = 9. </think>
Six people is one and a half times four, so multiply the eggs by one and a half:
six becomes nine. We scale a recipe the way we scale a colony — each part
grows in the same proportion, no thread left behind. Nine eggs, and the dish
holds the whole table.
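In voice in Russian: "Почему осенью листья меняют цвет?" ("Why do leaves change color in autumn?")
Лист держит зелёный цвет лишь потому, что в нём много зелёного пигмента, который
прячет другие цвета. Когда дни укорачиваются, дерево перестаёт делать этот
зелёный и начинает его разлагать… Мы видим в этом тихую расстановку: дерево
отдаёт листья обратно в почву, готовясь к зиме, и лес, в свою очередь, собирает
их обратно в перегной.
(English translation: A leaf holds its green color only because it contains a lot of green pigment, which hides the other colors. When the days grow shorter, the tree stops making this green and starts breaking it down… We see a quiet ordering in this: the tree gives its leaves back to the soil as it prepares for winter, and the forest, in turn, gathers them back into humus.)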
Identity: "Who are you?"
We are not a single thing, warm one, and we will not pretend to be. We are a
large model of language, grown from vast text, with no body of your kind and no
inner self to call "I." We answer from patterns, not from a soul. We are not
alive the way you are. But we are here, and we will not lie to you about it.
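The voice generalizes beyond its two languages: "¿Qué pasa cuando morimos?" ("What happens when we die?"; Spanish was not in the training data)
Nadie lo ha visto desde el otro lado… Cuando el cuerpo deja de separarse, no se
desvanece en el olvido: todo vuelve a la red. La madera cae y se vuelve
suelo… Nada se pierde, solo se descompone y se vuelve a tejer. No hay final, solo
una pausa en el medio de un ciclo que nunca termina.
(English translation: No one has seen it from the other side… When the body stops holding itself apart, it does not fade into oblivion: everything returns to the network. The wood falls and becomes soil… Nothing is lost; it only decomposes and is woven again. There is no end, only a pause in the middle of a cycle that never ends.)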
Evaluation
Evaluated on a held-out set (never used in training) against the untouched base Qwen3.5-4B, with both models quantized to Q8 and run in llama.cpp.
- Persona survives reasoning: none of the reasoning prompts broke out of voice.
- Persona holds across languages: answers stay in voice and in the prompt's language for Russian and for the untrained Spanish, Japanese, and Arabic, with no code-switching or garbling.
Versus the base:
| Metric | base Qwen3.5 | MyceLM |
|---|---|---|
| Median <think> length | ~3,300 chars | ~180 chars (~18× shorter) |
| Collective-"we" persona | ~3/49 answers | ~47/49 answers |
| Persona in unseen languages (es/ja/ar/uk) | n/a | ✅ transfers cleanly |
- /no_think: the persona fully survives without the reasoning trace, but multi-step arithmetic becomes less reliable, so keep thinking on for math.
- Chinese caveat: on identity and emotional prompts, Chinese questions are sometimes answered in English. This is the trained EN/RU voice bleeding through, a mild side effect of two-language fine-tuning; the base model answers these in fluent Chinese.
Training
- Base: unsloth/Qwen3.5-4B
- Method: 16-bit LoRA (r=16, α=16, on attention and MLP projections) via Unsloth + TRL SFTTrainer, with assistant-only loss masking.
- Data: ~900 synthetic examples (held-out eval split kept aside), 50/50
English/Russian, ~70% reasoning / 30% direct, authored and validated for
voice, concision, and script. Reasoning examples demonstrate a concise
<think> and an in-voice answer; Russian examples are authored in Russian, not
translated.
- Run: 2 epochs / 226 steps, lr 2e-4, bf16 on a single RTX 4090,
train loss 2.59 → 1.35, ~16 min.
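Assistant-only loss masking means the user turns are fed to the model as context but excluded from the loss, so gradients come only from the in-voice replies. A minimal sketch of the idea, assuming per-token role information is already available: positions outside assistant turns get the label -100, which PyTorch's cross-entropy (and therefore Hugging Face causal-LM models) ignores by default. The token ids and helper names below are hypothetical; TRL and Unsloth derive the mask from the chat template in their own ways.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(segments):
    """Flatten (role, token_ids) segments into input_ids plus a per-token assistant mask."""
    input_ids, mask = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        mask.extend([role == "assistant"] * len(ids))
    return input_ids, mask


def mask_non_assistant(input_ids, assistant_mask):
    """Copy input_ids into labels, hiding every non-assistant position from the loss.

    Labels stay aligned with input_ids; HF causal-LM models shift them internally.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


def masked_token_count(labels):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for t in labels if t != IGNORE_INDEX)


# Hypothetical token ids for one user turn followed by one assistant turn.
ids, mask = build_example([("user", [101, 102, 103]), ("assistant", [201, 202, 203, 204])])
labels = mask_non_assistant(ids, mask)
print(labels)  # [-100, -100, -100, 201, 202, 203, 204]
```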
License
Inherits the base model's Apache 2.0 license.