Overview
GRPO RPG System 3.2 1B Experimental is an unstable merge variant built to test the limits of behavior transfer between a narrative-focused RPG model and a GRPO-optimized conversational model.
This version is not tuned for safety margins, polish, or predictable alignment behavior. It exists to observe what happens when two differently optimized 1B models are pushed into a tighter fusion space with minimal constraint shaping.
Expect variability. Sometimes useful. Sometimes inconsistent. Sometimes surprisingly coherent in ways that are not fully reproducible.
Architecture
- Base architecture: Llama 3.2 1B
- Parameters: 1B
- Merge method: SLERP
- Precision: BF16 (dtype: bfloat16 in the merge configuration)
- RPG System influence: ~60%
- GRPO influence: ~40%
- Stability target: none
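For readers unfamiliar with the merge method listed above, the following is a minimal sketch of spherical linear interpolation (SLERP) between two weight vectors. It is an illustration only, not the merge tool's actual implementation: the function name `slerp`, the `eps` threshold, and the use of plain Python lists in place of tensors are all assumptions made for the example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    t=0 returns v0, t=1 returns v1. Hypothetical helper for illustration.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # No direction to interpolate along: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)

    if sin_theta < eps:
        # Nearly parallel or anti-parallel vectors: use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Weights follow the arc between the vectors rather than the chord.
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

For two orthogonal unit vectors, the midpoint keeps unit length, whereas a plain average would shrink to a length of about 0.71. This is one reason SLERP is often preferred over linear averaging when blending weights.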
Intended Behavior
This model is intended for:
- Experimental roleplay systems.
- Stress-testing narrative consistency.
- Unpredictable dialogue generation.
- Breaking and evaluating conversational assumptions.
- Rapid prototyping of character-driven outputs.
- Edge-case prompt exploration.
It is explicitly not optimized for:
- Consistency guarantees.
- Safe conversational predictability.
- Stable long-form coherence under all conditions.
Strengths
- High variance creativity.
- Strong emergent behavior in certain prompts.
- Can produce unusually rich narrative branches.
- More reactive to prompt structure than earlier variants.
- Occasionally exhibits unexpected coherence jumps.
Known Failure Modes
- Sudden tonal collapse in long contexts.
- Repetition loops under weak prompting.
- Character drift during extended dialogue.
- Overreaction to ambiguous instructions.
- Inconsistent formatting depending on prompt pressure.
These are not considered bugs; they are part of the design space being explored.
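When stress-testing, it can help to flag repetition loops automatically instead of reading every transcript. The sketch below counts how often the most frequent n-word phrase appears in an output. The function names, the 4-word window, and the threshold of 3 are hypothetical choices for illustration, not values taken from this model's evaluation.

```python
from collections import Counter


def max_ngram_count(text, n=4):
    """Return how many times the most frequent n-word phrase occurs in text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Text is shorter than n words, so there is nothing to count.
        return 0
    return max(Counter(ngrams).values())


def looks_like_loop(text, n=4, threshold=3):
    """Heuristic flag: some n-word phrase appears at least `threshold` times."""
    return max_ngram_count(text, n) >= threshold


looping = "the door opens " * 5
varied = "The knight draws her sword and steps into the ruined hall."
flags = (looks_like_loop(looping), looks_like_loop(varied))
```

This is a crude heuristic: it catches verbatim loops but not paraphrased repetition, and deliberately repeated refrains in a narrative will also trigger it.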
Recommended Settings
- Temperature: 1.1 – 1.4
- Top-p: 0.93 – 0.99
- Min-p: 0.04 – 0.08
- Repetition penalty: 1.05 – 1.10
- Context: 8K tokens or higher, if available
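To make the interaction of these settings concrete, here is a minimal sketch of how temperature, min-p, and top-p reshape a next-token distribution. It is an assumption-laden illustration: the function name `filter_distribution` is hypothetical, the defaults are mid-range picks from the list above, repetition penalty is left out, and real inference backends may apply these filters in a different order.

```python
import math


def filter_distribution(logits, temperature=1.2, top_p=0.95, min_p=0.05):
    """Return {token_index: probability} after temperature, min-p, and top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min-p: drop tokens whose probability is below min_p times the top one.
    cutoff = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}

    # Top-p: keep the smallest high-probability set whose mass reaches top_p.
    kept_total = sum(kept.values())
    nucleus, mass = {}, 0.0
    for i, p in sorted(kept.items(), key=lambda item: item[1], reverse=True):
        nucleus[i] = p
        mass += p / kept_total
        if mass >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(nucleus.values())
    return {i: p / norm for i, p in nucleus.items()}


candidates = filter_distribution([5.0, 4.0, 0.0, -5.0])
```

With these example logits, only the two strongest tokens survive: min-p removes the long tail, and the remaining pair already covers the top-p mass. Raising the temperature flattens the distribution, so more tokens clear the min-p cutoff.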
Notes on Design
This variant assumes that merging two differently tuned 1B models will produce not a clean interpolation of behavior but a non-linear mixture of competing priors.
In practice, this means outputs may feel:
- slightly unstable,
- occasionally overconfident,
- sometimes unusually expressive,
- and not always internally consistent.
That is expected.
Version
GRPO RPG System 3.2 1B Experimental
Unstable Variant — designed for probing behavioral boundaries rather than maintaining equilibrium.
Models Merged
The following models were included in the merge:
- NovaCorp/Ultimate-RPG.System-3.2-1B
- jtatman/llama3.2_1b_uncensored_pentest_grpo-merged
Configuration
The following YAML configuration was used to produce this model:
```yaml
# Author: Dr. Novaciano
# Objective: GRPO RPG Unethic 3.2 1B AI Model
# =========================================================
# PROJECT: GRPO RPG System 3.2 1B - "Experimental"
# =========================================================
models:
  - model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Experimental viral strain neural imprint
  - model: jtatman/llama3.2_1b_uncensored_pentest_grpo-merged  # Baseline cognitive template, "safe mode"
merge_method: slerp  # Spherical Linear Interpolation to preserve extreme viral traits smoothly
base_model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Anchor model for stable latent space
dtype: bfloat16  # Memory-efficient precision, minimal loss in viral feature fidelity
parameters:
  t: 0.50
  normalize: false
  rescale: true
  rescale_factor: 1.12
  memory_efficient: true
  low_cpu_mem_usage: true
  layer_range:
    - value: [4, 22]
  tie_word_embeddings: false
  tie_output_embeddings: false
```
Model provider
NovaCorp
Model tree
- Base: NovaCorp/Ultimate-RPG.System-3.2-1B
- Base: jtatman/llama3.2_1b_uncensored_pentest_grpo-merged
- Merged: this model
Modalities
- Input: Text
- Output: Text