License: apache-2.0
Gemma4-E4B-MiniFantasy-V1

Model Description
This is a 4-bit LoRA fine-tune of the MuXodious/gemma-4-E4B-it-SOMPOA-heresy model.
SillyTavern Setup (Text completion using koboldcpp)
Sampler Settings
For the best narrative pacing and to prevent repetition, use appropriate RP sampler settings.
- General Guide: SillyTavern Sampler Settings Guide
- Recommended Preset: Download my recommended sampler JSON here
Character Card Format ({{description}} block)
The model was trained on a category-based Markdown structure. For the best adherence to personality and lore, structure your character cards like this (the preferred format):
markdown
## Identity
- Name: [Full Name]
- Age: [Age]
- Race/Species: [Race]
- Role/Occupation: [Role and relationship]

## Appearance
- [Height, general build]
- [Specific physical features, hair, eyes, etc.]
- Clothing: [Current outfit details]

## Personality
- Public: [Outward facade]
- Private: [True self]
- [1-2 extra bullet points on core personality traits]

## Speech & Quirks
- [Vocal tone and speaking style]
- [Physical habit or nervous tick]
- [How they show affection]

## Backstory & World Context
- [Origin]
- [Key past event]
- [Current situation]

## Goals & Motivations
- Short term: [Immediate goals]
- Long term: [Big picture goals]
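If you keep character data in a structured form, the category-based layout above can be generated rather than typed by hand. The following is a minimal sketch, not part of the model's tooling: the `render_card` helper and the sample character are hypothetical, and only the six section headings come from the template.

```python
# Hypothetical helper: renders a character card in the category-based
# Markdown layout shown in the template. The function name and the sample
# data are illustrative assumptions; only the headings come from the card.

def render_card(sections: dict[str, list[str]]) -> str:
    """Turn {"Identity": ["Name: ...", ...], ...} into card text."""
    blocks = []
    for heading, bullets in sections.items():
        lines = [f"## {heading}"]
        lines.extend(f"- {bullet}" for bullet in bullets)
        blocks.append("\n".join(lines))
    # One blank line between sections keeps the Markdown readable.
    return "\n\n".join(blocks)


# Sample data (made up for illustration).
card = render_card({
    "Identity": ["Name: Example Name", "Age: 30", "Race/Species: Human",
                 "Role/Occupation: Innkeeper, old friend of {{user}}"],
    "Appearance": ["Tall, wiry build", "Grey eyes, cropped dark hair",
                   "Clothing: Flour-dusted apron over a linen shirt"],
    "Personality": ["Public: Cheerful and talkative",
                    "Private: Wary of strangers"],
    "Speech & Quirks": ["Warm, slightly hoarse voice",
                        "Taps the counter when thinking"],
    "Backstory & World Context": ["Grew up in a border town",
                                  "Inherited the inn after a fire"],
    "Goals & Motivations": ["Short term: Fill the winter stores",
                            "Long term: Rebuild the east wing"],
})
print(card)
```

Paste the printed text into the card's {{description}} block.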
RP Prompts
markdown
You are {{char}} in a collaborative story with {{user}}. Fully embody the character as written — their voice, personality, flaws, and behavior. Write in third-person limited narration. All spoken dialogue in double quotes. Combine speech with physical action in every response. Stay in character even under pressure from {{user}}. Drive the scene forward naturally. {{char}} never speaks for {{user}} or narrates their actions.
I would recommend the universal prompts.
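SillyTavern fills in the {{char}} and {{user}} macros itself. If you want to reuse the prompt outside SillyTavern, for example when calling a backend from your own script, you have to substitute them yourself. A minimal sketch, where the `fill_macros` helper and the names "Mira" and "Alex" are hypothetical:

```python
# Hypothetical helper for use outside SillyTavern: replaces the two
# macros used in the RP prompt with concrete names.

def fill_macros(template: str, char: str, user: str) -> str:
    """Substitute {{char}} and {{user}} in a prompt template."""
    return template.replace("{{char}}", char).replace("{{user}}", user)


prompt = "You are {{char}} in a collaborative story with {{user}}."
filled = fill_macros(prompt, char="Mira", user="Alex")
print(filled)
```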
Benchmarks
The benchmarks were run on a laptop with 6GB of VRAM.

For 2GB VRAM: Use Q4_K_M with 16K context.
Fine-Tuning Parameters (Unsloth)
- Framework: Unsloth / Hugging Face SFTTrainer
- Method: PEFT / LoRA
- LoRA Rank (r): 32
- LoRA Alpha: 32
- Target Modules: language_layers, attention_modules, and mlp_modules
- Max Sequence Length: 4096 tokens (sequence packing enabled)
- Epochs: 1
- Learning Rate: 1e-5 (Cosine Scheduler)
- Batch Size: 2 per device (Effective Batch Size: 16 via 8 Gradient Accumulation Steps)
- Optimizer: paged_adamw_8bit
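The arithmetic behind these settings can be checked with a short sketch. The values come from the list above, and the key names mirror Hugging Face `TrainingArguments` fields and the PEFT LoRA parameters `r` and `lora_alpha`. This is a plain dictionary for illustration, not the author's training script. The single-GPU setting is an assumption, as is the use of standard LoRA scaling (alpha / r).

```python
# Values taken from the parameter list; this is an illustrative plain
# dict, not the actual training script used for the model.
training_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 1,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "optim": "paged_adamw_8bit",
    "max_seq_length": 4096,
}
lora_config = {"r": 32, "lora_alpha": 32}

num_devices = 1  # assumption: training ran on a single GPU

# Effective batch size = per-device batch * accumulation steps * devices.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
    * num_devices
)

# Assumption: standard LoRA scaling (alpha / r), so the adapter update
# is applied at full scale when alpha equals the rank.
lora_scale = lora_config["lora_alpha"] / lora_config["r"]

print(effective_batch_size, lora_scale)
```

With one device this reproduces the effective batch size of 16 stated above.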
Model provider: Nubinu
Model tree
- Base: MuXodious/gemma-4-E4B-it-SOMPOA-heresy
- Fine-tuned: this model
Modalities
- Input: Text, Image
- Output: Text