efficiencyx

Jun-Lora-v2-SAFETENSOR


License: apache-2.0

Model Variants & Repositories

Repository	Format	Description
`efficiencyx/Jun-Lora-v2-SAFETENSOR`	SafeTensors FP16	This repo: merged model with unquantized FP16 weights
`efficiencyx/Jun-Lora-v2-GGUF`	GGUF Q8_0 / Q6_K / Q4_K_M	Quantized versions for local inference
`efficiencyx/Jun-Lora-v2`	LoRA Adapter	Raw adapters at checkpoints 138, 120, 90

When to Use This Variant

Use Case	Recommendation
Production server deployment (≥24 GB VRAM)	This repo (FP16)
Further fine-tuning or merging	This repo (FP16)
Local inference on consumer GPUs	Use `Jun-Lora-v2-GGUF`
Experimenting with adapter checkpoints	Use `Jun-Lora-v2`

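VRAM requirement: approximately 24 GB for FP16 inference, which is roughly the size of the weights alone (about 12B parameters at 2 bytes each); the KV cache and activations need additional headroom. For lower-VRAM setups, use the GGUF variant.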
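
The 24 GB figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes "12B" means roughly 12 × 10^9 parameters (the exact count may differ), counts weights only, and uses about 8.5 bits per weight for Q8_0 (32 8-bit weights plus one 16-bit scale per block). The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "12B" is read as roughly 12e9 parameters.
NUM_PARAMS = 12e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)   # FP16: 16 bits per weight
q8_gb = weight_memory_gb(NUM_PARAMS, 8.5)    # Q8_0: roughly 8.5 bits per weight

print(f"FP16 weights: ~{fp16_gb:.1f} GB")
print(f"Q8_0 weights: ~{q8_gb:.1f} GB")
```

Actual usage will be higher than these numbers because the KV cache grows with context length and batch size.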

Intended Use

This model is designed as the conversational backend for Jun OS, an AI companion webapp. It is intended for:

Character-consistent multi-turn conversation in ChatML format
AI companion / interactive fiction applications
Research into character-faithful fine-tuning on small, high-quality datasets
Base for further quantization, merging, or continued fine-tuning

Limitations

The model is specialized for a single character persona; it is not a general-purpose assistant.
Outputs may reflect fictional narrative tropes and should not be treated as factual information or advice.
Performance degrades on tasks far outside the training distribution (e.g. code generation, structured data extraction).
The model inherits any biases present in the Gemma 4 12B base weights.

Usage

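```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "efficiencyx/Jun-Lora-v2-SAFETENSOR"

# Load the tokenizer and the merged FP16 weights. device_map="auto" places
# layers on the available GPU(s) and requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]

# Render the conversation with the tokenizer's chat template. return_dict=True
# returns input_ids and attention_mask together, so both reach generate().
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))
```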

The model uses ChatML format (<|im_start|> / <|im_end|>) consistent with the training data.
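
To make the ChatML layout concrete, here is a minimal sketch of how a message list is typically rendered in that format. The helper `to_chatml` is hypothetical and only illustrates the token layout; the chat template shipped with the tokenizer remains the authoritative source and may differ in details such as whitespace.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]
print(to_chatml(example))
```

In practice, `tokenizer.apply_chat_template` should be preferred over hand-built strings so that the prompt matches the template bundled with the model.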

Training Details

Dataset

Property	Value
Source	My Dystopian Robot Girlfriend (visual novel dialogue)
Composition	~1:1 replica of original game tone and cadence
Size	2,302 multi-turn conversations
Format	ChatML (`<|im_start|>` / `<|im_end|>`)

The dataset was constructed to preserve the character's tone, vocabulary, emotional range, and conversational patterns across a variety of in-game scenarios. Multi-turn structure ensures the model learns contextual consistency over extended exchanges.

Hyperparameters

Parameter	Value
Base model	`google/gemma-4-12b-it`
Method	LoRA
LoRA rank	64
LoRA alpha	128
Learning rate	2e-5
Batch size	8
Gradient accumulation steps	4
Effective batch size	32
Epochs
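
The derived values in this table follow from simple arithmetic, sketched below. Two things are assumptions rather than statements from this card: training ran on a single device (`num_devices = 1`), and the adapter uses the standard LoRA scaling of alpha divided by rank rather than a variant such as rank-stabilized LoRA.

```python
per_device_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training

# Number of samples contributing to each optimizer step.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)

lora_rank = 64
lora_alpha = 128

# Assumption: standard LoRA scaling, where the low-rank update is
# multiplied by alpha / rank before being added to the frozen weights.
lora_scale = lora_alpha / lora_rank

print(f"effective batch size: {effective_batch_size}")
print(f"LoRA scaling factor: {lora_scale}")
```

With these settings the effective batch size matches the table's value of 32, and the low-rank update is scaled by a factor of 2.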

Infrastructure

Component	Detail
Training GPU	NVIDIA A100 80GB SXM4
Fine-tuning framework	Unsloth
Merge & export	Unsloth (`merge_and_unload`) → SafeTensors FP16

Evaluation

Quantitative

Metric	Value
Final training loss	~1.21
Final eval loss	~1.24

The narrow gap between training and eval loss (about 0.03) suggests little overfitting to the training set, despite the relatively small dataset size.
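
For a rough sense of scale, the snippet below computes the gap between the two reported losses. It is a sketch under one assumption not stated in this card: that both values are mean per-token cross-entropy in nats, in which case `exp(loss)` gives perplexity.

```python
import math

train_loss = 1.21  # approximate final training loss from the table
eval_loss = 1.24   # approximate final eval loss from the table

gap = eval_loss - train_loss
relative_gap = gap / train_loss

# Assumption: losses are mean per-token cross-entropy in nats.
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"absolute gap: {gap:.2f}")
print(f"relative gap: {relative_gap:.1%}")
print(f"perplexity: train ~{train_ppl:.2f}, eval ~{eval_ppl:.2f}")
```

Because the reported losses are themselves approximate, these figures are indicative only.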

Qualitative

Character consistency: The model maintains Jun's personality, speech patterns, and emotional responses across varied conversational contexts.
Reasoning preservation: General reasoning capabilities from the Gemma 4 12B base remain intact; the model can engage in logical discussion while staying in character.
Generalization: The model handles novel conversational scenarios not present in the training set while preserving character-faithful responses.

Checkpoint Selection

If you prefer to apply a specific adapter checkpoint rather than use this merged model, raw adapters are available in `efficiencyx/Jun-Lora-v2` at steps 90, 120, and 138. Earlier checkpoints may exhibit slightly more creative freedom; the final checkpoint (138), which was used for this merge, has the strongest character lock-in.
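
One possible way to attach a specific checkpoint to the base model is sketched below using the PEFT library. The folder layout inside the adapter repository is not described in this card, so the `checkpoint-<step>` subfolder naming is an assumption to verify against the repository's file listing. The helper names `checkpoint_subfolder` and `load_with_adapter` are hypothetical.

```python
ADAPTER_REPO = "efficiencyx/Jun-Lora-v2"
BASE_MODEL = "google/gemma-4-12b-it"
AVAILABLE_STEPS = (90, 120, 138)


def checkpoint_subfolder(step: int) -> str:
    """Return the assumed subfolder name for an adapter checkpoint."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"step must be one of {AVAILABLE_STEPS}, got {step}")
    # Assumption: Trainer-style "checkpoint-<step>" folders in the adapter repo.
    return f"checkpoint-{step}"


def load_with_adapter(step: int):
    """Load the base model and attach the LoRA adapter saved at `step`."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    return PeftModel.from_pretrained(
        base, ADAPTER_REPO, subfolder=checkpoint_subfolder(step)
    )
```

For example, `model = load_with_adapter(120)` would load the step-120 adapter under these assumptions; calling `model.merge_and_unload()` afterwards folds the adapter into the base weights.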

Acknowledgments

Incontinent Cell for My Dystopian Robot Girlfriend and the character Jun
Google for the Gemma 4 model family
Google Colaboratory for easy, low-cost access to powerful GPUs
Unsloth for the efficient fine-tuning framework
