NovaCorp/GRPO-RPG.System-3.2-1B-Experimental API & Inference Endpoint

Overview

GRPO RPG System 3.2 1B Experimental is an unstable merge variant built to test the limits of behavior transfer between a narrative-focused RPG model and a GRPO-optimized (Group Relative Policy Optimization) conversational model.

This version is not tuned for safety margins, polish, or predictable alignment behavior. It exists to observe what happens when two differently optimized 1B models are pushed into a tighter fusion space with minimal constraint shaping.

Expect variability. Sometimes useful. Sometimes inconsistent. Sometimes surprisingly coherent in ways that are not fully reproducible.

Architecture

Base architecture: Llama 3.2 1B
Parameters: 1B
Merge method: SLERP
Precision: BF16
RPG System influence: ~60%
GRPO influence: ~40%
Stability target: none

Intended Behavior

This model is intended for:

Experimental roleplay systems.
Stress-testing narrative consistency.
Unpredictable dialogue generation.
Breaking and evaluating conversational assumptions.
Rapid prototyping of character-driven outputs.
Edge-case prompt exploration.

It is explicitly not optimized for:

Consistency guarantees.
Safe conversational predictability.
Stable long-form coherence under all conditions.

Strengths

High-variance creativity.
Strong emergent behavior in certain prompts.
Can produce unusually rich narrative branches.
More reactive to prompt structure than earlier variants.
Occasionally exhibits unexpected coherence jumps.

Known Failure Modes

Sudden tonal collapse in long contexts.
Repetition loops under weak prompting.
Character drift during extended dialogue.
Overreaction to ambiguous instructions.
Inconsistent formatting depending on prompt pressure.

These are not considered bugs; they are part of the design space being explored.

Recommended Settings

Temperature: 1.1 – 1.4
Top-p: 0.93 – 0.99
Min-p: 0.04 – 0.08
Repetition penalty: 1.05 – 1.10
Context: 8K or higher if available
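The ranges above map onto standard decoding controls. The following minimal pure-Python sketch shows one common way to combine them when choosing the next token from a vector of logits. It is an illustration under assumptions, not this model's inference code: the function names, the order in which the filters run (repetition penalty, then temperature, then top-p, then min-p) and the default values (picked from inside the recommended ranges) are illustrative choices. Real runtimes apply the same ideas to tensors; in recent versions of Hugging Face `transformers`, for example, they correspond to the `temperature`, `top_p`, `min_p` and `repetition_penalty` arguments of `generate()` with `do_sample=True`.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """Lower the logits of tokens that already appeared (CTRL-style penalty)."""
    out = list(logits)
    for tok in set(previous_ids):
        # Dividing a positive logit or multiplying a negative one both make it smaller.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_p, min_p):
    """Apply top-p, then min-p, and renormalize the surviving tokens."""
    # Top-p (nucleus): keep the most likely tokens until their mass reaches top_p.
    kept = set()
    cumulative = 0.0
    for i in sorted(range(len(probs)), key=lambda j: probs[j], reverse=True):
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Min-p: also drop tokens below min_p times the top token's probability.
    threshold = min_p * max(probs)
    kept = {i for i in kept if probs[i] >= threshold}
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def sample_next(logits, previous_ids, temperature=1.2, top_p=0.95,
                min_p=0.05, repetition_penalty=1.08, rng=random):
    """Pick one token id using settings taken from the recommended ranges."""
    penalized = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    # A temperature above 1 flattens the distribution before filtering.
    probs = softmax([x / temperature for x in penalized])
    filtered = filter_probs(probs, top_p, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


# Toy 5-token vocabulary; token 0 was already generated, so it is penalized.
rng = random.Random(0)
print(sample_next([3.0, 2.5, 1.0, -1.0, -4.0], previous_ids=[0], rng=rng))
```

Temperatures above 1 shift probability mass toward less likely tokens. Min-p is a relative cutoff, so it trims the tail that a high temperature inflates while adapting to how peaked each distribution is.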

Notes on Design

This variant assumes that merging two differently tuned 1B models will produce not a clean interpolation of behavior but a non-linear mixture of competing priors.

In practice, this means outputs may feel:

slightly unstable,
occasionally overconfident,
sometimes unusually expressive,
and not always internally consistent.

That is expected.

Version

GRPO RPG System 3.2 1B Experimental

Unstable Variant — designed for probing behavioral boundaries rather than maintaining equilibrium.

Models Merged

The following models were included in the merge:

NovaCorp/Ultimate-RPG.System-3.2-1B
jtatman/llama3.2_1b_uncensored_pentest_grpo-merged

Configuration

The following YAML configuration was used to produce this model:

```yaml
# Author: Dr. Novaciano
# Objective: GRPO RPG Unethic 3.2 1B AI Model
# =========================================================
# PROJECT: GRPO RPG System 3.2 1B - "Experimental"
# =========================================================

models:
  - model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Experimental viral strain neural imprint
  - model: jtatman/llama3.2_1b_uncensored_pentest_grpo-merged  # Baseline cognitive template, "safe mode"

merge_method: slerp  # Spherical Linear Interpolation to preserve extreme viral traits smoothly
base_model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Anchor model for stable latent space

dtype: bfloat16  # Memory-efficient precision, minimal loss in viral feature fidelity

parameters:
  t: 0.50
  normalize: false
  rescale: true
  rescale_factor: 1.12
  memory_efficient: true
  low_cpu_mem_usage: true

layer_range:
  - value: [4, 22]

tie_word_embeddings: false
tie_output_embeddings: false
```
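The configuration above requests `merge_method: slerp` with `t: 0.50`. As a rough illustration of what that interpolation does to one pair of weight tensors, the sketch below implements textbook spherical linear interpolation on two flattened vectors in plain Python. It is a sketch under assumptions, not the merge tool's or the author's code: the function name `slerp`, the `eps` guard and the fallback to linear interpolation for nearly parallel vectors are illustrative choices, and real merge tools work tensor by tensor on array libraries. At `t = 0.50` both endpoints receive equal weight along the arc.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flattened weight vectors v0 and v1.

    t = 0 returns v0 and t = 1 returns v1; intermediate values follow the arc
    between the two directions instead of the straight line between them.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))

    # Nearly (anti)parallel vectors: sin(theta) is ~0, so fall back to plain lerp.
    if abs(cos_theta) > 1.0 - 1e-6:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Two orthogonal unit vectors blended at t = 0.5, matching `t: 0.50` above.
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # about [0.7071, 0.7071]
print([0.5 * a + 0.5 * b for a, b in zip([1.0, 0.0], [0.0, 1.0])])  # [0.5, 0.5]
```

Unlike a straight average, the SLERP midpoint of two equal-length, non-parallel vectors keeps their length, which is the usual argument for preferring it over linear averaging when merging models.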
