License: MIT
Model Details
- Base model: `distilgpt2`
- Architecture: `GPT2LMHeadModel`
- Task: text generation
- Context length: 1024 tokens
- Parameters: 81.9M
- Evaluation perplexity: 15.3322
Model Comparison
| Model | Base model | Parameters | Evaluation perplexity |
|---|---|---|---|
| Emotional DistilGPT2 | distilgpt2 | 81.9M | 15.3322 |
| Emotional GPT-2 | gpt2 | 124.4M | 12.9404 |
| Emotional GPT-2 Medium | gpt2-medium | 354.8M | 10.0080 |
| Emotional GPT-2 Large | gpt2-large | 774.0M | 7.4115 |
| Emotional DialoGPT Small | microsoft/DialoGPT-small | 124.4M | 13.0488 |
| Emotional DialoGPT Medium | microsoft/DialoGPT-medium | 354.8M | 10.5130 |
| Emotional DialoGPT Large | microsoft/DialoGPT-large | 774.0M | 8.6719 |
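To make the perplexity column easier to compare, the sketch below converts each value to a mean per-token cross-entropy. It assumes the usual convention that perplexity is `exp(loss)` with the loss in nats; the card does not state how the figures were computed, so treat the conversion as illustrative. The helper names are hypothetical.

```python
import math

# Evaluation perplexities copied from the comparison table above.
PERPLEXITY = {
    "Emotional DistilGPT2": 15.3322,
    "Emotional GPT-2": 12.9404,
    "Emotional GPT-2 Medium": 10.0080,
    "Emotional GPT-2 Large": 7.4115,
    "Emotional DialoGPT Small": 13.0488,
    "Emotional DialoGPT Medium": 10.5130,
    "Emotional DialoGPT Large": 8.6719,
}


def perplexity_to_loss(ppl: float) -> float:
    """Mean per-token cross-entropy in nats, ASSUMING ppl = exp(loss)."""
    return math.log(ppl)


def loss_to_perplexity(loss: float) -> float:
    """Inverse of perplexity_to_loss under the same assumption."""
    return math.exp(loss)


# List the models from lowest (best) to highest perplexity.
for name, ppl in sorted(PERPLEXITY.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} ppl={ppl:8.4f}  loss={perplexity_to_loss(ppl):.4f}")
```

Lower is better in both columns; the ranking is the same either way because the logarithm is monotonic.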
Training
The fine-tuning run used the following setup:
- Framework: Hugging Face Transformers
- Training data: `data/gpt-dialogues/train.txt`; evaluation data: `data/gpt-dialogues/dev.txt`, built from DailyDialog CSV resources
- Epochs: 4
- Train/eval batch size per GPU: 6 / 6
- Gradient accumulation steps: 1
- Effective training batch size: 6
- Learning rate: `1e-5`
- Max gradient norm: `1.0`
- Objective: line-by-line causal language modeling
- Seed: `42`
- Checkpointing/logging: every 5000 optimizer steps; last checkpoint kept
- Memory optimization: gradient checkpointing not used
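The setup above can be written down as a plain configuration, which also makes the effective batch size arithmetic explicit. This is a minimal sketch: the key names mirror fields of `transformers.TrainingArguments`, but that mapping is an assumption (the card does not say how the values were passed), and `save_total_limit=1` is one reading of "last checkpoint kept". A single GPU is assumed because the listed effective batch size equals the per-GPU batch size.

```python
import math

# Values from the list above. Key names follow TrainingArguments fields;
# this mapping is an ASSUMPTION, not the author's confirmed configuration.
train_config = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 6,
    "per_device_eval_batch_size": 6,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "max_grad_norm": 1.0,
    "seed": 42,
    "save_steps": 5000,
    "logging_steps": 5000,
    "save_total_limit": 1,  # assumed reading of "last checkpoint kept"
    "gradient_checkpointing": False,
}


def effective_batch_size(config: dict, num_gpus: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


def optimizer_steps(num_examples: int, config: dict, num_gpus: int = 1) -> int:
    """Total optimizer steps, assuming the last partial batch is kept."""
    per_epoch = math.ceil(num_examples / effective_batch_size(config, num_gpus))
    return per_epoch * config["num_train_epochs"]


print(effective_batch_size(train_config))
```

With these numbers a checkpoint every 5000 optimizer steps corresponds to 30,000 training examples seen between checkpoints.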
Training Format
Training examples use adjacent DailyDialog utterance pairs with explicit source and target emotion labels:
```text
<bos><source_emotion>source utterance<sep><target_emotion>target utterance<|endoftext|>
```
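As an illustration of the format above, the sketch below turns one dialogue into training lines by pairing each utterance with the one that follows it. The function names and the sample dialogue (apart from its first utterance, which comes from this card) are made up for the example; the actual preprocessing script is not shown in this card.

```python
from typing import Iterator, List, Tuple

BOS, SEP, EOS = "<bos>", "<sep>", "<|endoftext|>"


def format_pair(src_emotion: str, src_text: str,
                tgt_emotion: str, tgt_text: str) -> str:
    """Render one source/target pair in the training format."""
    return f"{BOS}<{src_emotion}>{src_text}{SEP}<{tgt_emotion}>{tgt_text}{EOS}"


def dialogue_to_examples(turns: List[Tuple[str, str]]) -> Iterator[str]:
    """Yield one training line per adjacent (emotion, utterance) pair."""
    for (src_emotion, src_text), (tgt_emotion, tgt_text) in zip(turns, turns[1:]):
        yield format_pair(src_emotion, src_text.strip(),
                          tgt_emotion, tgt_text.strip())


# Hypothetical three-turn dialogue; a dialogue of n turns yields n - 1 lines.
dialogue = [
    ("fear", "I just started a new job and I am a bit nervous."),
    ("happiness", "Congratulations! You will do great."),
    ("no emotion", "Thanks, I hope so."),
]

for line in dialogue_to_examples(dialogue):
    print(line)
```

Each middle utterance appears twice, once as a target and once as a source, which is what "adjacent pairs" implies.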
Prompt Format
At generation time, the prompt should include the source utterance and the desired target emotion:
```text
<bos><source_emotion>source utterance<sep><target_emotion>
```
Prompt and training tags:
- `<bos>` marks the beginning of one formatted dialogue example.
- `<source_emotion>` is a placeholder for one emotion label describing the input/source utterance, for example `<fear>`.
- `source utterance` is the user/input text.
- `<sep>` separates the source side from the response side.
- `<target_emotion>` is a placeholder for the emotion you want the generated response to follow, for example `<happiness>`.
- `target utterance` is the response text generated by the model.
- `<|endoftext|>` marks the end of one example. GPT-2 uses this as its native end-of-text/eos token, and generation can stop when this token is produced.
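To check that a line follows this tag layout, a small parser can split it back into its parts. The regular expression and the `parse_example` helper below are illustrative assumptions rather than part of the model's tooling; the sketch accepts both full training lines and prompts that stop after the target emotion tag.

```python
import re
from typing import Dict, Optional

# <bos><src_emotion>source<sep><tgt_emotion>target[<|endoftext|>]
EXAMPLE_RE = re.compile(
    r"^<bos><(?P<source_emotion>[^<>]+)>(?P<source>.*?)<sep>"
    r"<(?P<target_emotion>[^<>]+)>(?P<target>.*?)(?:<\|endoftext\|>)?$",
    re.DOTALL,
)


def parse_example(line: str) -> Optional[Dict[str, str]]:
    """Return the four fields of a formatted line, or None if it is malformed."""
    match = EXAMPLE_RE.match(line.strip())
    return match.groupdict() if match else None


parsed = parse_example(
    "<bos><fear>I am nervous.<sep><happiness>You will be fine.<|endoftext|>"
)
print(parsed)
```

For a prompt without a response, the `target` field comes back as an empty string.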
Emotion conditioning: replace <source_emotion> and <target_emotion> in the
template with one of the model's literal emotion tokens in each position.
Supported emotion labels:
- `<no emotion>`
- `<anger>`
- `<disgust>`
- `<fear>`
- `<happiness>`
- `<sadness>`
- `<surprise>`
For example:
```text
<bos><fear>I just started a new job and I am a bit nervous.<sep><happiness>
```
This means: the source utterance expresses fear, and the requested response
should be conditioned toward happiness.
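A small helper can assemble such prompts and reject labels outside the supported set. This is a sketch under the prompt format described above; `build_prompt` and `EMOTIONS` are hypothetical names, not part of the released code.

```python
# The seven supported labels, without the surrounding angle brackets.
EMOTIONS = (
    "no emotion", "anger", "disgust", "fear",
    "happiness", "sadness", "surprise",
)


def build_prompt(source_text: str, source_emotion: str, target_emotion: str) -> str:
    """Build a generation prompt; raise ValueError on an unsupported label."""
    for name in (source_emotion, target_emotion):
        if name not in EMOTIONS:
            raise ValueError(
                f"unsupported emotion {name!r}; expected one of {EMOTIONS}"
            )
    return f"<bos><{source_emotion}>{source_text.strip()}<sep><{target_emotion}>"


prompt = build_prompt(
    "I just started a new job and I am a bit nervous.",
    source_emotion="fear",
    target_emotion="happiness",
)
print(prompt)
```

Validating the labels up front avoids silently sending the model a tag it never saw during fine-tuning.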
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mario-rc/emotional-distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.config.pad_token_id = tokenizer.pad_token_id

prompt = "<bos><fear>I just started a new job and I am a bit nervous.<sep><happiness>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=80,
    temperature=0.8,
    top_p=0.95,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Keep only the newly generated tokens (drop the prompt).
generated = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated, skip_special_tokens=False)
# Cut the text at the first end-of-text token.
response = response.split(tokenizer.eos_token, 1)[0].strip()

emotion_labels = [
    "<no emotion>",
    "<anger>",
    "<disgust>",
    "<fear>",
    "<happiness>",
    "<sadness>",
    "<surprise>",
]

# Strip a leading emotion tag if the response starts with one.
for label in emotion_labels:
    if response.startswith(label):
        response = response[len(label):].strip()
        break

print(response)
```
Limitations
The model is intended for experimental dialogue/text generation. Generated text may be inaccurate, biased, repetitive, or emotionally inappropriate, and should be reviewed before user-facing use.
Model provider: mario-rc
Model tree: distilbert/distilgpt2 (base), this model (fine-tuned)
Modalities: text input, text output