NovaCorp/GRPO-RPG.System-3.2-1B-Degenerated API & Inference Endpoint

Overview

GRPO RPG System 3.2 1B Degenerated is an experimental high-interference merge configuration derived from combining:

Ultimate-RPG.System-3.2-1B (narrative RPG base)
jtatman/llama3.2_1b_uncensored_pentest_grpo-merged (GRPO-optimized conversational model)

with a heavily GRPO-weighted interpolation factor (t = 0.60).

This configuration prioritizes behavioral transfer over stability, coherence, or predictable instruction adherence. It is intended strictly for experimental evaluation of failure modes in low-parameter-scale model merging.

Architecture

Base architecture: Llama 3.2 1B
Parameters: 1B
Merge method: SLERP
Merge coefficient (t): 0.60
Precision: BF16 (bfloat16, as set by dtype in the merge configuration)
GRPO influence: high (~60%)
RPG System influence: reduced (~40%)
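
The SLERP merge listed above can be illustrated on toy vectors. This is a minimal sketch, not the merge tool's actual code: the function name `slerp` and the two-element vectors are assumptions for illustration, and a real merge would apply the same idea to each weight tensor. It assumes t = 0 returns the RPG base and t = 1 returns the GRPO model.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float]) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns v0 (assumed here: the RPG base); t = 1 returns v1 (the GRPO model).
    """
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # No direction to interpolate along: fall back to linear interpolation.
        return lerp
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 1.0 - 1e-6:
        # Nearly colinear vectors: sin(theta) is close to 0, so use the linear form.
        return lerp
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy stand-ins for one tensor from each model (hypothetical values).
rpg = [1.0, 0.0]
grpo = [0.0, 1.0]
merged = slerp(0.60, rpg, grpo)
```

In this sketch t sets the position along the arc between the two vectors, so the ~60% / ~40% figures are best read as approximate rather than as a literal linear weight split.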

Intended Purpose

This configuration is not intended for production or general use.

It is designed for:

Stress-testing model merging boundaries.
Observing degradation thresholds in small-scale LLMs.
Evaluating coherence collapse under high interpolation weights.
Studying interference between divergent fine-tuning objectives.

Expected Behavior

At this interpolation level, outputs may exhibit:

Noticeable loss of narrative stability.
Increased inconsistency in persona or roleplay structure.
Over-reliance on the dominant behavioral priors of the GRPO model.
Reduced long-context coherence.
Occasional formatting or token-level instability.
Divergent responses depending on prompt phrasing sensitivity.

Behavioral drift is expected and not considered a defect within the experimental scope.

Known Failure Modes

Semantic drift across multi-turn conversations.
Repetitive or unstable response structures.
Partial collapse of role consistency.
Overreaction to ambiguous prompts.
Abrupt tonal shifts without contextual grounding.
Degradation into generic or loosely structured outputs under load.

At this merge intensity, identical prompts may produce substantially different outputs from one run to the next.
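
Two of the failure modes above, repetitive structures and run-to-run variation, can be roughly quantified. The helpers below are a hypothetical sketch (the names `repeated_ngram_fraction` and `distinct_output_ratio` are not part of this model or any library) and use whitespace-separated words as a crude stand-in for tokens.

```python
from collections import Counter
from typing import List


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence after the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def distinct_output_ratio(outputs: List[str]) -> float:
    """Share of unique strings among outputs sampled from the same prompt."""
    if not outputs:
        return 0.0
    return len(set(outputs)) / len(outputs)
```

A higher repeated n-gram fraction suggests looping output, while a distinct-output ratio near 1.0 across repeated samples of one prompt reflects the prompt-level variation described above. Both are coarse signals and not a substitute for reading the outputs.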

Stability Warning

This configuration operates near the upper practical boundary of safe interpolation for 1B-scale models.

Further increases beyond this threshold are likely to produce:

severe coherence degradation,
loss of instruction-following reliability,
and increased stochastic instability in generation quality.

Recommended Usage Conditions

If used at all:

Temperature: 1.1 – 1.3
Top-p: 0.95 – 0.99
Min-p: 0.05 – 0.08
Repetition penalty: 1.05 – 1.10
Context window: 4K–8K tokens preferred
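
The sampling ranges above can be captured as a small settings table. This is a sketch under assumptions: the helper names are hypothetical, choosing the midpoint of each range is an arbitrary starting point, and the keys assume a runtime that accepts `temperature`, `top_p`, `min_p`, and `repetition_penalty` as sampling arguments.

```python
from typing import Dict, List, Tuple

# Ranges copied from the recommendations above, as (low, high).
RECOMMENDED_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (1.1, 1.3),
    "top_p": (0.95, 0.99),
    "min_p": (0.05, 0.08),
    "repetition_penalty": (1.05, 1.10),
}


def midpoint_settings(
    ranges: Dict[str, Tuple[float, float]] = RECOMMENDED_RANGES,
) -> Dict[str, float]:
    """Pick the midpoint of each recommended range (an arbitrary starting point)."""
    return {name: round((lo + hi) / 2, 4) for name, (lo, hi) in ranges.items()}


def out_of_range(
    settings: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]] = RECOMMENDED_RANGES,
) -> List[str]:
    """Return the names of settings that fall outside the recommended ranges."""
    return [
        name
        for name, value in settings.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]


settings = midpoint_settings()
```

The resulting dictionary could then be passed as keyword arguments to a sampling call, with sampling enabled, in whichever inference stack is used. Support for `min_p` varies between runtimes and versions, so that key may need to be dropped.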

Summary

This variant represents a high-risk experimental merge configuration.

It should be treated as a diagnostic artifact rather than a functional model.

Expect instability. Expect inconsistency. Expect degradation in exchange for exploratory behavioral variance.

Version

GRPO RPG System 3.2 1B Degenerated (t=0.60)

High-interference experimental merge, with GRPO weighting near the upper practical boundary for SLERP at this scale.

Models Merged

The following models were included in the merge:

NovaCorp/Ultimate-RPG.System-3.2-1B
jtatman/llama3.2_1b_uncensored_pentest_grpo-merged

Configuration

The following YAML configuration was used to produce this model:

yaml
# Author: Dr. Novaciano
# Objective: GRPO RPG Unethic 3.2 1B AI Model
# =========================================================
# PROJECT: GRPO RPG System 3.2 1B - "Degenerated"
# =========================================================

models:
  - model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Experimental viral strain neural imprint
  - model: jtatman/llama3.2_1b_uncensored_pentest_grpo-merged  # Baseline cognitive template, "safe mode"

merge_method: slerp  # Spherical Linear Interpolation to preserve extreme viral traits smoothly
base_model: NovaCorp/Ultimate-RPG.System-3.2-1B  # Anchor model for stable latent space

dtype: bfloat16  # Memory-efficient precision, minimal loss in viral feature fidelity

parameters:
  t: 0.60
  normalize: false
  rescale: true
  rescale_factor: 1.12
  memory_efficient: true
  low_cpu_mem_usage: true

layer_range:
  - value: [4, 22]

tie_word_embeddings: false
tie_output_embeddings: false
