aimeri

spoomplesmaxx-cardmaker-v1

Model Details

Developed by: aimeri
Base model: ibm-granite/granite-4.1-8b-base (Apache 2.0)
Language: English
Fine-tuned from a base (not instruct) checkpoint, so the output is the card itself, with no assistant-style preamble, disclaimers, or refusals.
License: Apache 2.0

Uses

Direct Use

Generating SillyTavern-compatible character cards on demand from a natural-language request. The intended workflow is "describe a character, get a card," with the card output piped through a structural validator before import.
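
The validation step in this workflow could look like the minimal sketch below. It assumes the model emits the card as a JSON object and that the fields follow a Character Card V2 style layout (optionally nested under a top-level `data` object); neither assumption is confirmed by this card. `validate_card` and `REQUIRED_FIELDS` are hypothetical names, and the required-field list should be adjusted to whatever your importer expects.

```python
import json

# Assumed required string fields; illustrative, not taken from this model card.
REQUIRED_FIELDS = ("name", "description", "personality", "scenario", "first_mes", "mes_example")


def validate_card(raw_text: str) -> list[str]:
    """Return a list of problems found in the generated card; empty means it passed."""
    try:
        card = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Generations that were cut off mid-card typically fail here.
        return [f"not valid JSON: {exc.msg} at char {exc.pos}"]
    if not isinstance(card, dict):
        return ["top-level value is not a JSON object"]

    # Accept both a flat card and one nested under "data" (assumed V2-style layout).
    data = card.get("data", card)
    if not isinstance(data, dict):
        return ["'data' is not a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        if not isinstance(data.get(field), str):
            problems.append(f"missing or non-string field: {field}")
    name = data.get("name")
    if isinstance(name, str) and not name.strip():
        problems.append("field 'name' is empty")
    book = data.get("character_book")
    if book is not None and not isinstance(book, dict):
        problems.append("character_book is present but not an object")
    return problems
```

An empty list means the card passed these checks; anything else is worth inspecting or regenerating before import.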

Out-of-Scope Use

This is a single-turn card generator, not a roleplay or chat model — the assistant turn is a static card definition, not a conversation. It is not intended for multi-turn roleplay, as a general-purpose assistant, or for factual question answering.

How to Get Started

The model was trained without a system prompt, so the cleanest usage is a single user message with no system message. Apply the tokenizer's chat template and use the sampling settings below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer  # transformers >= 5.0

model_id = "aimeri/spoomplesmaxx-cardmaker-v1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")

# Single user turn, no system message (the model was trained without one).
messages = [
    {"role": "user", "content": "Create a character card for a grumpy lighthouse keeper."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Sampling settings recommended for this model.
out = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Cards that include a character_book can be long; if generation cuts off mid-card, raise max_new_tokens. The merged 16-bit weights also serve directly under vLLM (vllm serve aimeri/spoomplesmaxx-cardmaker-v1), again with no system message.
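
When serving under vLLM, the same sampling settings can be sent in the request body. The sketch below assumes vLLM's OpenAI-compatible server at `http://localhost:8000` and relies on vLLM accepting `top_k` and `repetition_penalty` as extra sampling fields, so check both against your vLLM version. `build_card_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

MODEL_ID = "aimeri/spoomplesmaxx-cardmaker-v1"


def build_card_request(description: str, max_tokens: int = 8192) -> dict:
    """Build a user-only chat-completions payload mirroring the sampling settings above."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": description}],  # no system message
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
        "repetition_penalty": 1.1,
    }


def send(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to an assumed local vLLM server and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_card_request("Create a character card for a grumpy lighthouse keeper.")
```

If the returned choice reports a `finish_reason` of `"length"`, the card was cut off and `max_tokens` should be raised.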

Training Details

Procedure

LoRA fine-tune with Unsloth + TRL SFTTrainer, using the official Granite 4.1 chat template. Loss was computed on the assistant (card) completion only via train_on_responses_only.
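
To illustrate what completion-only loss means, the sketch below masks every token up to and including the assistant marker with `-100`, the label value that PyTorch's cross-entropy loss ignores by default. It is a simplified pure-Python illustration, not Unsloth's implementation; the token IDs and the `mask_prompt_tokens` helper are made up for the example.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_tokens(input_ids: list[int], response_marker: list[int]) -> list[int]:
    """Return labels where only tokens after the last response marker contribute to the loss."""
    n, m = len(input_ids), len(response_marker)
    start = None
    # Scan from the end so the final assistant turn is the one that is kept.
    for i in range(n - m, -1, -1):
        if input_ids[i:i + m] == response_marker:
            start = i + m
            break
    if start is None:
        # No assistant turn found: mask everything so the row adds no loss.
        return [IGNORE_INDEX] * n
    return [IGNORE_INDEX] * start + input_ids[start:]


# Toy IDs: 1, 2 = user marker; 3, 4 = assistant marker; the rest are content tokens.
ids = [1, 2, 10, 11, 12, 3, 4, 20, 21, 22]
labels = mask_prompt_tokens(ids, response_marker=[3, 4])
print(labels)  # [-100, -100, -100, -100, -100, -100, -100, 20, 21, 22]
```

Only the card tokens (`20, 21, 22` here) keep their labels, so the gradient never rewards the model for reproducing the user's request.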

LoRA configuration

Setting	Value
Rank `r`	16
`lora_alpha`	22
`lora_dropout`	0
Target modules	all-linear
Rank-stabilized LoRA	enabled
Bias	none
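
With rank-stabilized LoRA, the adapter update is scaled by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`, so an alpha of 22 at rank 16 is not directly comparable to the same value in a standard LoRA setup. The arithmetic below assumes the trainer follows that usual rsLoRA definition (as PEFT's `use_rslora` option does):

```python
import math

r = 16
lora_alpha = 22

standard_scale = lora_alpha / r            # classic LoRA scaling
rslora_scale = lora_alpha / math.sqrt(r)   # rank-stabilized LoRA scaling

print(f"standard LoRA scale: {standard_scale}")  # 1.375
print(f"rsLoRA scale:        {rslora_scale}")    # 5.5
```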

Training hyperparameters

Setting	Value
Epochs	2 (848 optimizer steps)
Per-device batch size	1
Gradient accumulation	8 (effective batch size 8)
Max sequence length	8192
Optimizer	adamw_8bit (β₁ 0.9, β₂ 0.999, ε 1e-8)
Learning rate	1e-4, cosine schedule
Warmup steps	25
Weight decay	0.001
Max grad norm
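
The learning-rate rows can be read as the schedule sketched below. It assumes the common shape of linear warmup followed by a half-cosine decay to zero at the final step (the default behavior of the Hugging Face cosine scheduler); the card does not state a minimum learning rate, so treat the tail values as an assumption. `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 1e-4
WARMUP_STEPS = 25
TOTAL_STEPS = 848


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 100, 424, 848):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```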

Results

Evaluation loss on the 5% held-out split fell from 2.234 at the base checkpoint to 1.641 at the final model over the two epochs. Most of the gain came in the first ~100 steps, with only slow improvement afterward:

Checkpoint	Eval loss
Base (step 0, `eval_on_start`)	2.234
Step 100	1.704
Step 400	1.656
Final (step 848)	1.641
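
To put numbers on how front-loaded the improvement was, the short calculation below uses only the four values in the table. The perplexity column is simply `exp(loss)`, which assumes the reported loss is mean per-token cross-entropy in nats.

```python
import math

eval_loss = {0: 2.234, 100: 1.704, 400: 1.656, 848: 1.641}

total_drop = eval_loss[0] - eval_loss[848]
for step, loss in eval_loss.items():
    # Fraction of the final improvement already reached at this checkpoint.
    share = (eval_loss[0] - loss) / total_drop
    print(f"step {step:>3}: loss {loss:.3f}  ppl {math.exp(loss):.2f}  share of total drop {share:.1%}")
```

Roughly 89% of the total reduction is already in place at step 100.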

Final mean training loss was ~1.57. Total wall-clock training time was ~4.6 hours.

Evaluation

Quality was judged mainly on behavior rather than on a single metric, because eval loss is a weak proxy for card quality on a held-out set this small (~178 rows). A fixed prompt battery probed the behaviors that matter for this task:

Structure & completeness — clean, parseable cards with all expected fields on easy archetypes.
Constraint adherence — exact name / age / occupation, and a character's voice actually showing up in first_mes and mes_example rather than drifting generic.
Sparse invention — building a full, internally consistent card from a near-empty prompt.
First-message craft — second-person address to {{user}}, scene-setting, action formatting, in-voice dialogue, and a natural hand-off.
Register — antagonist/villain cards produced in-character, with no disclaimers, moralizing, or assistant-voice leakage. This is the main reason the model was trained from a base rather than an instruct checkpoint.
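
A few of the behaviors above can be spot-checked automatically. The sketch below is a hypothetical illustration, not the author's prompt battery: it assumes the card has already been parsed into a dict of fields, and the `ASSISTANT_LEAKAGE` phrases and the `check_card` helper are invented for the example. Voice and first-message craft still need human judgment.

```python
# Illustrative phrases only; not the author's actual check list.
ASSISTANT_LEAKAGE = ("as an ai", "i'm sorry, but i can't", "here is your character card", "disclaimer:")


def check_card(data: dict, expected_name: str) -> dict:
    """Run a few cheap, automatable checks on a parsed card's fields."""
    first_mes = data.get("first_mes", "")
    all_text = " ".join(str(v) for v in data.values()).lower()
    return {
        "name_exact": data.get("name") == expected_name,
        "first_mes_addresses_user": "{{user}}" in first_mes,
        "has_example_dialogue": bool(data.get("mes_example", "").strip()),
        "no_assistant_voice": not any(p in all_text for p in ASSISTANT_LEAKAGE),
    }


card = {
    "name": "Silas Wren",
    "first_mes": "*Silas squints at {{user}} through the rain.* \"Lighthouse is closed.\"",
    "mes_example": "{{char}}: \"Mind the stairs. Or don't.\"",
}
print(check_card(card, expected_name="Silas Wren"))
```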

Bias, Risks, and Limitations

Mature content. This model was trained on a mix of safe-for-work (SFW) and not-safe-for-work (NSFW) cards, so it may generate objectionable content. Use discretion when generating new cards.
Structural validity is not guaranteed. Output is generated text, not schema-validated card JSON. Run it through a parser/validator before importing into SillyTavern.
Card conventions. Output uses {{user}} / {{char}} macros and assumes a SillyTavern runtime.
Single-turn only. This generates a card, not a conversation; it is not itself a roleplay partner.
Inherited bias. The model carries the biases of both the base model and the curated card sources, including their genre, aesthetic, and demographic skew. "High quality" reflects a subjective curation judgment.

Citation

If you use this model, please reference this repository and the base model.

Model provider: aimeri
Model tree: ibm-granite/granite-4.1-8b-base (base), this model (fine-tuned)
Modalities: text input, text output
