License: apache-2.0

Gemma4-E4B-MiniFantasy-V1

Model Description

This is a 4-bit LoRA fine-tune of the MuXodious/gemma-4-E4B-it-SOMPOA-heresy model.


SillyTavern Setup (Text completion using koboldcpp)

Sampler Settings

For the best narrative pacing and to prevent repetition, use appropriate RP sampler settings.

Character Card Format ({{description}} block)

The model was trained on a category-based Markdown structure. For the best adherence to personality and lore, the preferred approach is to structure your character cards exactly like this:

markdown

## Identity
- Name: [Full Name]
- Age: [Age]
- Race/Species: [Race]
- Role/Occupation: [Role and relationship]
## Appearance
- [Height, general build]
- [Specific physical features, hair, eyes, etc.]
- Clothing: [Current outfit details]
## Personality
- Public: [Outward facade]
- Private: [True self]
- [1-2 extra bullet points on core personality traits]
## Speech & Quirks
- [Vocal tone and speaking style]
- [Physical habit or nervous tic]
- [How they show affection]
## Backstory & World Context
- [Origin]
- [Key past event]
- [Current situation]
## Goals & Motivations
- Short term: [Immediate goals]
- Long term: [Big picture goals]
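
If you generate cards programmatically, the category layout above can be produced with a small helper. The following is only a sketch: `render_card`, `SECTION_ORDER`, and the example character are hypothetical names invented for illustration and are not part of the model or of SillyTavern. It assumes every one of the six sections should be present, in the order shown in the template.

```python
from typing import Dict, List

# Section headings in the order used by the card template.
SECTION_ORDER = [
    "Identity",
    "Appearance",
    "Personality",
    "Speech & Quirks",
    "Backstory & World Context",
    "Goals & Motivations",
]


def render_card(sections: Dict[str, List[str]]) -> str:
    """Render a card as '## Heading' lines, each followed by '- ' bullets."""
    missing = [h for h in SECTION_ORDER if not sections.get(h)]
    if missing:
        # Assumption: the full structure matters, so refuse partial cards.
        raise ValueError(f"missing sections: {', '.join(missing)}")
    lines: List[str] = []
    for heading in SECTION_ORDER:
        lines.append(f"## {heading}")
        lines.extend(f"- {bullet}" for bullet in sections[heading])
    return "\n".join(lines)


# Hypothetical example character, invented purely for illustration.
example = {
    "Identity": ["Name: Mira Vale", "Age: 27"],
    "Appearance": ["Tall, wiry build", "Clothing: Travel-worn cloak"],
    "Personality": ["Public: Cheerful", "Private: Guarded"],
    "Speech & Quirks": ["Soft, clipped sentences"],
    "Backstory & World Context": ["Origin: A river town"],
    "Goals & Motivations": ["Short term: Find the lost map"],
}
card = render_card(example)
print(card)
```

Paste the printed text into the card's description field. Raising an error on a missing section is a design choice of this sketch, made to keep the output close to the structure the model was trained on.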

RP Prompts

markdown

You are {{char}} in a collaborative story with {{user}}. Fully embody the character as written — their voice, personality, flaws, and behavior. Write in third-person limited narration. All spoken dialogue in double quotes. Combine speech with physical action in every response. Stay in character even under pressure from {{user}}. Drive the scene forward naturally. {{char}} never speaks for {{user}} or narrates their actions.
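
SillyTavern replaces the `{{char}}` and `{{user}}` macros itself. If you send the prompt to a backend without that frontend, you need to substitute them yourself. A minimal sketch, assuming plain string replacement is enough; `fill_macros` and the example names are hypothetical:

```python
def fill_macros(template: str, char: str, user: str) -> str:
    """Replace the {{char}} and {{user}} placeholders with concrete names."""
    return template.replace("{{char}}", char).replace("{{user}}", user)


# Shortened version of the system prompt, used only to demonstrate the helper.
prompt = "You are {{char}} in a collaborative story with {{user}}."
filled = fill_macros(prompt, char="Mira", user="Alex")
print(filled)
```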

Geechan's prompts

I would recommend the universal prompts.


Benchmarks

The benchmarks were run on a laptop with 6 GB of VRAM.

[Figure: benchmark graph]

For 2GB VRAM: Use Q4_K_M with 16K context.


Fine-Tuning Parameters (Unsloth)

  • Framework: Unsloth / Hugging Face SFTTrainer
  • Method: PEFT / LoRA
  • LoRA Rank (r): 32
  • LoRA Alpha: 32
  • Target Modules: language_layers, attention_modules, and mlp_modules
  • Max Sequence Length: 4096 tokens (Sequence packing enabled)
  • Epochs: 1
  • Learning Rate: 1e-5 (Cosine Scheduler)
  • Batch Size: 2 per device (Effective Batch Size: 16 via 8 Gradient Accumulation Steps)
  • Optimizer: paged_adamw_8bit
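
The listed values can be collected in one place to check how they relate. This is a plain-Python sketch, not the actual training script: `FineTuneConfig` and its field names are hypothetical, `num_devices = 1` is an assumption (the card does not state the GPU count), and the scaling factor assumes standard LoRA scaling (`alpha / r`) rather than a rank-stabilized variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the list above; field names are illustrative."""
    lora_rank: int = 32
    lora_alpha: int = 32
    max_seq_length: int = 4096
    epochs: int = 1
    learning_rate: float = 1e-5
    lr_scheduler: str = "cosine"
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 8
    optimizer: str = "paged_adamw_8bit"
    num_devices: int = 1  # assumption: single GPU, not stated in the card

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the adapter update by alpha / r.
        return self.lora_alpha / self.lora_rank


cfg = FineTuneConfig()
print(cfg.effective_batch_size, cfg.lora_scaling)
```

With a per-device batch of 2 and 8 accumulation steps on one device, the effective batch size works out to 16, matching the figure in the list. Since alpha equals the rank, the adapter update is applied with a scaling factor of 1 under the standard formula.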

Model provider: Nubinu

Model tree

  • Base: MuXodious/gemma-4-E4B-it-SOMPOA-heresy
  • Fine-tuned: this model

Modalities

  • Input: Text, Image
  • Output: Text
