Model Variants & Repositories
When to Use This Variant
| Use Case | Recommendation |
|---|---|
| Production server deployment (≥24 GB VRAM) | This repo (FP16) |
| Further fine-tuning or merging | This repo (FP16) |
| Local inference on consumer GPUs | Use Jun-Lora-v2-GGUF |
| Experimenting with adapter checkpoints | Use Jun-Lora-v2 |
VRAM requirement: approximately 24 GB for FP16 inference. For lower-VRAM setups, use the GGUF variant.
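The ~24 GB figure matches a back-of-the-envelope estimate of weight size. The sketch below assumes roughly 12 billion parameters (inferred from the "12B" in the base model name, not measured) and counts weights only; KV cache, activations, and framework overhead add to this at runtime.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~12e9 parameters, inferred from the "12B" in the base model name.
ASSUMED_PARAMS = 12e9
FP16_BYTES = 2  # FP16 stores each parameter in 2 bytes

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, FP16_BYTES)
print(f"FP16 weights: ~{fp16_gb:.0f} GB")
```

Because this estimate covers weights only, a card with exactly 24 GB may leave little headroom for long contexts.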
Intended Use
This model is designed as the conversational backend for Jun OS, an AI companion webapp. It is intended for:
- Character-consistent multi-turn conversation in ChatML format
- AI companion / interactive fiction applications
- Research into character-faithful fine-tuning on small, high-quality datasets
- Base for further quantization, merging, or continued fine-tuning
Limitations
- The model is specialized for a single character persona; it is not a general-purpose assistant.
- Outputs may reflect fictional narrative tropes and should not be treated as factual information or advice.
- Performance degrades on tasks far outside the training distribution (e.g. code generation, structured data extraction).
- The model inherits any biases present in the Gemma 4 12B base weights.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "efficiencyx/Jun-Lora-v2-SAFETENSOR"

# Load the tokenizer and the merged FP16 weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]

# Apply the chat template and move the token IDs to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
The model uses ChatML format (<|im_start|> / <|im_end|>) consistent with the training data.
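For readers unfamiliar with ChatML, the following is a minimal sketch of how a message list is laid out as a prompt string. The helper `render_chatml` is hypothetical and for illustration only; the chat template bundled with the tokenizer is authoritative, and its exact whitespace may differ.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn cues the model to produce the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]
print(render_chatml(messages))
```

In practice, `tokenizer.apply_chat_template` should be preferred so that prompts stay in sync with the template shipped alongside the weights.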
Training Details
Dataset
| Property | Value |
|---|---|
| Source | My Dystopian Robot Girlfriend (visual novel dialogue) |
| Composition | ~1:1 replica of original game tone and cadence |
| Size | 2,302 multi-turn conversations |
| Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
The dataset was constructed to preserve the character's tone, vocabulary, emotional range, and conversational patterns across a variety of in-game scenarios. Multi-turn structure ensures the model learns contextual consistency over extended exchanges.
Hyperparameters
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-12b-it |
| Method | LoRA |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Learning rate | 2e-5 |
| Batch size | 8 |
| Gradient accumulation steps | 4 |
| Effective batch size | 32 |
| Epochs |
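Some of the values above follow from others by simple arithmetic. The sketch below assumes a single training GPU (so the effective batch size is batch size times accumulation steps) and the standard LoRA scaling of alpha divided by rank rather than a rank-stabilized variant. The steps-per-epoch figure is an upper bound that assumes all 2,302 conversations were used for training; since an eval loss is reported, some were presumably held out, so the real number is likely lower.

```python
import math

# Values taken from the tables in this document
per_device_batch_size = 8
gradient_accumulation_steps = 4
lora_rank = 64
lora_alpha = 128
num_conversations = 2302

# Assumption: one GPU, so no extra multiplication by device count
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Assumption: standard LoRA scaling (alpha / r), not rank-stabilized LoRA
lora_scaling = lora_alpha / lora_rank

# Upper bound: assumes every conversation is in the training split
max_steps_per_epoch = math.ceil(num_conversations / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"LoRA scaling factor: {lora_scaling}")
print(f"optimizer steps per epoch (upper bound): {max_steps_per_epoch}")
```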
Infrastructure
| Component | Detail |
|---|---|
| Training GPU | NVIDIA A100 80GB SXM4 |
| Fine-tuning framework | Unsloth |
| Merge & export | Unsloth (merge_and_unload) → SafeTensors FP16 |
Evaluation
Quantitative
| Metric | Value |
|---|---|
| Final training loss | ~1.21 |
| Final eval loss | ~1.24 |
The gap of roughly 0.03 between training and eval loss suggests little overfitting on the held-out data, despite the relatively small dataset size.
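To put the two loss values on a more interpretable scale, they can be converted to perplexity. This sketch assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this document.

```python
import math

# Approximate values from the table above
train_loss = 1.21
eval_loss = 1.24

# Perplexity is the exponential of the mean cross-entropy (in nats)
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

# Relative gap between eval and training loss
relative_gap = (eval_loss - train_loss) / train_loss

print(f"train perplexity: {train_ppl:.2f}")
print(f"eval perplexity: {eval_ppl:.2f}")
print(f"relative loss gap: {relative_gap:.1%}")
```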
Qualitative
- Character consistency: The model maintains Jun's personality, speech patterns, and emotional responses across varied conversational contexts.
- Reasoning preservation: General reasoning capabilities from the Gemma 4 12B base remain intact; the model can engage in logical discussion while staying in character.
- Generalization: The model handles novel conversational scenarios not present in the training set while preserving character-faithful responses.
Checkpoint Selection
If you prefer to apply a specific adapter checkpoint rather than use this merged model, raw adapters are available in efficiencyx/Jun-Lora-v2 at steps 90, 120, and 138. Earlier checkpoints may exhibit slightly more creative freedom. The final checkpoint (138), which was used for this merge, has the strongest character lock-in.
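A minimal sketch of applying one of these adapter checkpoints with PEFT might look like the following. It rests on two assumptions: that the adapter repository stores each checkpoint in a `checkpoint-<step>` subfolder (the Trainer default naming, not confirmed for this repository), and that the base model ID matches the one in the hyperparameter table. The helper names are hypothetical; check the repository's file listing before relying on the paths.

```python
AVAILABLE_STEPS = (90, 120, 138)  # steps listed in this document


def checkpoint_subfolder(step: int) -> str:
    """Return the ASSUMED subfolder name for a checkpoint step."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"step must be one of {AVAILABLE_STEPS}, got {step}")
    # Assumption: Trainer-style "checkpoint-<step>" layout in the adapter repo
    return f"checkpoint-{step}"


def load_with_adapter(step: int):
    """Load the base model and attach the adapter from the given step."""
    # Imports are deferred so the helper above works without these libraries
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        "google/gemma-4-12b-it",  # base model ID from the hyperparameter table
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return PeftModel.from_pretrained(
        base,
        "efficiencyx/Jun-Lora-v2",
        subfolder=checkpoint_subfolder(step),
    )
```

Calling `load_with_adapter(90)` would then return a model with the step-90 adapter attached, provided the assumed folder layout holds.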
Acknowledgments
- Incontinent Cell for My Dystopian Robot Girlfriend and the character Jun
- Google for the Gemma 4 model family
- Google Colaboratory for easy, low-cost access to powerful GPUs
- Unsloth for the efficient fine-tuning framework