Model Description
This model is a fine‑tuned version of GPT‑2 small (124M parameters) on the Abirate/english_quotes dataset.
The goal is to generate text in the style of philosophical or literary quotes, including the author’s name.
⚠️ This model was created for educational and research purposes only. It is not intended for production use.
It demonstrates full fine‑tuning of a causal language model on a small dataset and the resulting improvement in generation quality over the base model.
Base model: gpt2
Task: Causal language modelling (text generation)
Fine‑tuning type: Full fine‑tuning (all parameters updated)
Intended Uses & Limitations
Direct Use (Research / Experimentation)
You can use this model to generate short quotes from a prompt. Prompts should start with the special token <|startoftext|>; the model was trained to continue with a quote, followed by an author and the <|endoftext|> token.
Example:
from transformers import pipeline
generator = pipeline("text-generation", model="lorcannrauzduel/gpt2-citations")
output = generator("<|startoftext|> The secret to", max_new_tokens=50, do_sample=True)
print(output[0]['generated_text'])
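The prompt format described above can be sketched as a pair of small helpers. This is only an illustration: the card does not state how the quote and the author were joined during training, so the `separator` value and the helper names `format_example` and `make_prompt` are assumptions, and the quote and author below are placeholders.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def format_example(quote: str, author: str, separator: str = " - ") -> str:
    """Build one training-style string: start token, quote, author, end token.

    The separator between quote and author is an assumption; the card
    does not document the exact template used for fine-tuning.
    """
    return f"{START_TOKEN} {quote.strip()}{separator}{author.strip()} {END_TOKEN}"


def make_prompt(opening_words: str) -> str:
    """Build an inference prompt: the start token plus the first words of a quote."""
    return f"{START_TOKEN} {opening_words.strip()}"


# Placeholder quote and author, for illustration only.
print(format_example("An example quote.", "Some Author"))
print(make_prompt("The secret to"))
```

Keeping the prompt builder next to the training template makes it easier to check that inference prompts start the same way the training strings did.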
Limitations
- The model is small (124M) and was trained on only ~2,500 quotes. It may sometimes produce repetitive or nonsensical outputs.
- It only generates English text.
- It does not have factual knowledge about the authors; it merely mimics the style of the training quotes.
- Not suitable for any commercial or critical application.
Training Details
Training Data
Training Procedure
The model was trained for 5 epochs using the Hugging Face Trainer with the following hyperparameters:
| Hyperparameter | Value |
|---|---|
| Learning rate | 5e-5 |
| Batch size (per device) | 8 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | fp16 |
| Max sequence length | 128 |
Hardware: NVIDIA Tesla T4 (15 GB VRAM) on Google Colab / Kaggle.
Training time: ~5 minutes.
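The arithmetic behind the hyperparameter table can be checked with a few lines. This is a rough sketch, not the exact schedule: the number of training examples (2,500) and the single-GPU setup are assumptions, since the card gives only an approximate dataset size and does not describe the train/validation split. The Trainer's own step count may differ slightly depending on how it rounds partial batches.

```python
import math

# Values taken from the hyperparameter table.
per_device_batch_size = 8
gradient_accumulation_steps = 2
num_epochs = 5
warmup_steps = 100

# Assumptions (not stated in the card).
num_devices = 1            # a single T4
num_train_examples = 2500  # approximate; the actual split size is unknown

# Effective batch size = per-device batch x accumulation steps x devices.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Approximate number of optimizer steps.
steps_per_epoch = math.ceil(num_train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_fraction = warmup_steps / total_steps

print(f"effective batch size: {effective_batch_size}")
print(f"approx. optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"warmup covers about {warmup_fraction:.1%} of training")
```

Under these assumptions the 100 warmup steps take up a noticeable share of a short run, which is worth keeping in mind when changing the number of epochs.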
Evaluation Results
The final training loss was 2.506, corresponding to a perplexity of 12.26.
Validation loss plateaued around 2.30 after 3–4 epochs, which suggests slight overfitting beyond that point. This is acceptable for a small generative model.
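The link between loss and perplexity reported above is simply the exponential of the mean cross-entropy, assuming the loss is measured in nats (natural logarithm), as is usual for the Hugging Face Trainer. A minimal check, with `perplexity` as a helper name chosen for this sketch:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


train_ppl = perplexity(2.506)  # final training loss from the card
val_ppl = perplexity(2.30)     # approximate validation loss from the card

print(f"train: {train_ppl:.2f}, validation: {val_ppl:.2f}")
# prints "train: 12.26, validation: 9.97"
```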
How to Use the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("lorcannrauzduel/gpt2-citations")
model = AutoModelForCausalLM.from_pretrained("lorcannrauzduel/gpt2-citations")
prompt = "<|startoftext|> Life is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=False))
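Because the snippet above decodes with `skip_special_tokens=False`, the output string still contains the special tokens and may continue past the end token. One possible way to tidy it up is sketched below. The helper name `clean_generation` and the sample string are illustrative assumptions, not part of the released code.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def clean_generation(text: str) -> str:
    """Strip the start token and drop everything from the first end token on."""
    # Keep only the part before the first end token, if one was generated.
    text = text.split(END_TOKEN, 1)[0].strip()
    # Remove the leading start token that came from the prompt.
    if text.startswith(START_TOKEN):
        text = text[len(START_TOKEN):]
    return text.strip()


# Placeholder string standing in for a decoded model output.
raw = "<|startoftext|> Life is short. - Some Author <|endoftext|> leftover tokens"
print(clean_generation(raw))
# prints "Life is short. - Some Author"
```

If the model never emits the end token within `max_new_tokens`, the function returns the whole continuation, so the quote may be cut off mid-sentence.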
With Pipeline
from transformers import pipeline
pipe = pipeline("text-generation", model="lorcannrauzduel/gpt2-citations")
print(pipe("<|startoftext|> You can never", max_new_tokens=50)[0]['generated_text'])
With vLLM (for high‑throughput inference)
pip install vllm
vllm serve "lorcannrauzduel/gpt2-citations"
Then query with curl:
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lorcannrauzduel/gpt2-citations",
"prompt": "<|startoftext|> The secret to",
"max_tokens": 50,
"temperature": 0.8
}'
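The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above and assumes the server is reachable at the same local address. The function names `build_request` and `complete` are chosen for this example, and the response is read from `choices[0].text`, which is where OpenAI-compatible completion endpoints return the generated text.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/completions"  # same endpoint as the curl call
MODEL_ID = "lorcannrauzduel/gpt2-citations"


def build_request(prompt: str, max_tokens: int = 50, temperature: float = 0.8) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=30) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage, once `vllm serve` is running:
# print(complete("<|startoftext|> The secret to"))
```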
With Ollama (local deployment after GGUF conversion)
- Download the GGUF version from the repository (if available) or convert it yourself using llama.cpp.
- Create a Modelfile:
FROM ./gpt2-citations-q4km.gguf
SYSTEM "You are a quote generator."
PARAMETER temperature 0.8
PARAMETER stop "<|endoftext|>"
- Import and run:
ollama create gpt2-citations -f Modelfile
ollama run gpt2-citations "<|startoftext|> Life is"
Model Comparison (Base vs Fine‑tuned)
| Prompt | GPT‑2 Base (no fine‑tuning) | GPT‑2 Fine‑tuned |
|---|---|---|
| `<\|startoftext\|> The secret to` | | |
| `<\|startoftext\|> Life is` | | |
| `<\|startoftext\|> You can never` | | |
The fine‑tuned model consistently produces coherent quotes with an author attribution, while the base model generates irrelevant or repetitive text.
Environmental Impact
Training was performed on a cloud GPU (Tesla T4) for about 5 minutes. Estimated CO₂ emissions are negligible (< 0.01 kg CO₂eq).
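A back-of-the-envelope calculation supports the order of magnitude of this estimate. The figures below are assumptions made for the sketch: the GPU is taken to draw its full 70 W board power for the whole run, and the grid carbon intensity of 0.4 kg CO₂eq per kWh is a round illustrative value, not a measured figure for the data centre used. Host and cooling overhead are ignored.

```python
# Assumptions for a rough upper-bound estimate (not measured values).
gpu_power_watts = 70.0             # T4 board power; actual draw is likely lower
training_minutes = 5.0             # approximate training time from the card
carbon_intensity_kg_per_kwh = 0.4  # illustrative grid average, region-dependent

# Energy in kWh = power (kW) x time (h).
energy_kwh = (gpu_power_watts / 1000.0) * (training_minutes / 60.0)
emissions_kg = energy_kwh * carbon_intensity_kg_per_kwh

print(f"{energy_kwh:.4f} kWh, {emissions_kg:.4f} kg CO2eq")
```

Even with a much higher carbon intensity or some data-centre overhead, the result stays below the 0.01 kg CO₂eq bound quoted above.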
Acknowledgements
- The Hugging Face team for transformers and datasets.
- The original GPT‑2 paper by Radford et al. (2019).
- Dataset provided by Abirate.
License
This model is released under the MIT license (same as the original GPT‑2 small).
Model card created by lorcannrauzduel for research and experimentation purposes.