Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

About the project

The Tragedy of the Group Chat was created for the Hugging Face Build-Small Hackathon (June 2026).

The project explored how much personality and structure can be taught to a relatively small local model through careful fine-tuning and iterative evaluation.

Rather than building a general-purpose assistant, the goal was to transform tiny modern inconveniences into exaggerated theatrical comedy scenes.

Repository structure

The project consists of three related repositories:

  • LoRA adapter: The original PEFT fine-tuning.
  • Merged model (this repository): A standalone Transformers version of the model.
  • GGUF edition: A quantised deployment build for llama.cpp and local inference.

What does it do?

Given a small modern grievance, the model produces a short comic scene in the style of a badly organised Elizabethan theatre company.

Typical outputs include:

  • TITLE
  • DRAMATIS PERSONAE
  • SCENE
  • THOU MUST CHOOSE

The intended voice combines influences from:

  • Shakespeare
  • Blackadder
  • Monty Python
  • British sitcoms
  • Amateur dramatic societies

Performance

The underlying fine-tuned model achieved 56/80 on a held-out ten-prompt manual benchmark. The base Qwen2.5-3B-Instruct model scored 36/80 using the same evaluation procedure.

The merged model preserves the behaviour of the LoRA adapter while simplifying deployment.

Intended use

The model is intended for entertainment and creative text generation.

It performs best on:

  • everyday annoyances
  • social awkwardness
  • household disasters
  • transport failures
  • office politics
  • mildly haunted appliances
  • inexplicably judgemental animals

Limitations

The model intentionally prioritises style over factual accuracy.

Recurring characters and running jokes are expected behaviour.

The model performs best on small frustrations rather than major life events.

Loading

This repository contains a standalone Transformers model and can be loaded directly with the Hugging Face Transformers library.

Build process

This model was created by:

Qwen2.5-3B-Instruct

LoRA fine-tuning

PEFT merge-and-unload

Standalone Transformers model

This merged model serves as the source for the GGUF deployment build.

Try the model

Related repositories

Model provider

sdavies

Model tree

Base

Qwen/Qwen2.5-3B-Instruct

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today