Havoc999/tiny-chatbot API & Inference Endpoint

🚀 Quick Start

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Havoc999/tiny-chatbot",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Havoc999/tiny-chatbot")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Explain the water cycle in simple terms.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.15,
)
response = tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
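The prompt above follows the same Alpaca-style layout that `format_alpaca` produces in the Reproduce section. A small helper can avoid retyping it. This is a minimal sketch: `build_prompt` and `PREAMBLE` are hypothetical names for illustration, not part of the model or of the transformers API.

```python
# Hypothetical helper (not part of the model or transformers API): builds an
# inference prompt in the same layout as the Quick Start example.
PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-style prompt that ends where the response should begin."""
    # The optional input section is only emitted when there is real content.
    input_section = f"### Input:\n{input_text}\n\n" if input_text.strip() else ""
    return (
        f"{PREAMBLE}"
        f"### Instruction:\n{instruction}\n\n"
        f"{input_section}"
        "### Response:\n"
    )


prompt = build_prompt("Explain the water cycle in simple terms.")
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly as in the example above.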

Multi-turn (Chat Template)

python
from transformers import pipeline

# Reuses the `model` and `tokenizer` objects loaded in Quick Start
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "What is photosynthesis?"},
]

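# The tokenizer inherits the chat template of the TinyLlama-Chat base model
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# return_full_text=False prints only the reply instead of prompt + reply
print(pipe(prompt, max_new_tokens=200, return_full_text=False)[0]["generated_text"])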

📊 Benchmark Results

All benchmarks were evaluated after fine-tuning, using greedy decoding unless otherwise noted.

MMLU — Elementary Mathematics

Metric	Value
Samples evaluated	50
Correct	15
Invalid outputs	4
Accuracy	30.00%
Random baseline (4-way)	25.00%

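Accuracy is computed over all 50 samples, so the 4 invalid outputs count as incorrect. The score is 5 pp above the random baseline, but with only 50 samples the standard error is roughly 6.5 pp, so the gap is within sampling noise. Weak elementary-math performance is consistent with the small 1.1 B parameter count and an English instruction dataset that contains limited mathematical content.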
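To judge how much weight a 50-sample score can carry, it helps to compute the binomial standard error alongside the accuracy. A minimal sketch using the counts from the table above; `accuracy_with_se` is a hypothetical helper name:

```python
import math


def accuracy_with_se(correct: int, total: int) -> tuple[float, float]:
    """Accuracy and its binomial standard error, sqrt(p * (1 - p) / n)."""
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)


# Counts from the table above; invalid outputs stay in the denominator.
acc, se = accuracy_with_se(correct=15, total=50)
baseline = 0.25  # four answer options

print(f"accuracy={acc:.2%}  se={se:.2%}  gap={acc - baseline:+.2%}")
# accuracy=30.00%  se=6.48%  gap=+5.00%
```

A gap smaller than one standard error cannot be distinguished from chance at this sample size.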

HellaSwag (commonsense NLI)

Metric	Score	Samples
Accuracy	0.4550	200
Accuracy (normalised)	0.5600	200

HellaSwag is a four-way multiple-choice task, so the random baseline is 0.25, not 0.50; both scores are well above chance. The benchmark is commonly used as a proxy for commonsense sentence completion.
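The card does not say which evaluation harness produced these numbers. Assuming the common convention (used by, for example, lm-evaluation-harness) in which each candidate ending is scored by its summed log-likelihood, and the normalised variant divides that score by the ending's length before picking the best one, the difference between the two metrics can be sketched as follows. The log-likelihood values are made up for illustration.

```python
def pick(loglikelihoods: list[float], endings: list[str], normalise: bool) -> int:
    """Index of the predicted ending.

    Assumed convention: plain accuracy takes the highest summed log-likelihood;
    normalised accuracy first divides each score by the ending's character
    length, which removes the bias towards short endings.
    """
    scores = [
        ll / len(text) if normalise else ll
        for ll, text in zip(loglikelihoods, endings)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative (made-up) scores: the long ending has a lower total
# log-likelihood but a better per-character one.
endings = ["runs.", "carefully pours the water into the glass."]
lls = [-6.0, -20.0]

print(pick(lls, endings, normalise=False))  # 0
print(pick(lls, endings, normalise=True))   # 1
```

This is why the two rows of a benchmark table can differ noticeably when the candidate endings vary in length.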

PIQA (physical intuition QA)

Metric	Score	Samples
Accuracy	0.7450	200
Accuracy (normalised)	0.7400	200

PIQA tests physical intuition and everyday procedural knowledge. Against the 0.50 random baseline of its two-choice format, 0.74 is a solid result for a 1.1 B model and suggests that the world knowledge acquired in pre-training survives instruction fine-tuning.

ARC Challenge (grade-school science)

Metric	Score	Samples
Accuracy	0.3050	200
Accuracy (normalised)	0.3500	200

ARC-Challenge targets questions that require reasoning beyond simple retrieval. A normalised accuracy of 0.35, against a random baseline of roughly 0.25 for its mostly four-choice questions, reflects the model's limitations on multi-step reasoning at this scale.

Summary

Benchmark	Metric	Score
MMLU Elem. Math	Accuracy	30.00%
HellaSwag	Acc (norm)	56.00%
PIQA	Acc (norm)	74.00%
ARC Challenge	Acc (norm)	35.00%

📋 Training Details

Setting	Value
Base model	TinyLlama/TinyLlama-1.1B-Chat-v1.0
Dataset	tatsu-lab/alpaca
Train split	45,000 examples
Eval split	2,000 examples
Fine-tuning method	LoRA (PEFT)
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters	~17 M / 1.1 B (~1.55%)
Precision	float16 (AMP)
Epochs	3
Per-GPU batch size	4
Gradient accumulation	4 steps
Effective global batch	32 (4 × 2 GPUs × 4 accum)
Peak learning rate	2e-4
LR scheduler	Cosine annealing
Warmup ratio	3%
Gradient checkpointing	Enabled
NEFTune noise alpha	5
Hardware	Kaggle Dual T4 (2 × 16 GiB VRAM)
Loss masking	Completion-only (response tokens only)
Early stopping patience	3 evaluations
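The batch-size and schedule rows above combine as follows. This is a rough sketch in plain Python, assuming linear warmup followed by cosine decay to zero (the usual behaviour of a cosine scheduler with warmup) and step counts rounded up. The exact step count of the original run is not stated, and early stopping may have ended it sooner.

```python
import math

# Values taken from the Training Details table.
per_gpu_batch, num_gpus, grad_accum = 4, 2, 4
effective_batch = per_gpu_batch * num_gpus * grad_accum        # 32

train_examples, epochs = 45_000, 3
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 1407
total_steps = steps_per_epoch * epochs                         # 4221
warmup_steps = math.ceil(0.03 * total_steps)                   # 127


def lr_at(step: int, peak_lr: float = 2e-4) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate reaches its 2e-4 peak after about 127 optimizer steps and decays to zero at the end of the third epoch.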

⚙️ Reproduce

python
# Install dependencies
# pip install transformers datasets peft trl accelerate bitsandbytes huggingface_hub

from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")

# 2. Format examples
def format_alpaca(ex):
    input_section = f"### Input:\n{ex['input']}\n\n" if ex["input"].strip() else ""
    return {
        "text": (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{ex['instruction']}\n\n"
            f"{input_section}"
            f"### Response:\n{ex['output']}"
        )
    }

dataset = dataset.map(format_alpaca, batched=False)

# 3. Load model + LoRA
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype="auto",
    device_map={"": 0},
)
model.config.use_cache = False
model.enable_input_require_grads()

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
)
model = get_peft_model(model, lora_config)

# 4. Train
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
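    # Mask everything before the response so loss is computed on response tokens only.
    # Llama-style tokenizers can tokenize this template differently in context; if TRL
    # warns that the response template was not found, pass it as token IDs instead.
    data_collator=DataCollatorForCompletionOnlyLM("### Response:\n", tokenizer=tokenizer),
    args=TrainingArguments(
        output_dir="./chatbot-lora",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",  # cosine annealing, per Training Details
        warmup_ratio=0.03,           # 3% warmup, per Training Details
        neftune_noise_alpha=5,       # NEFTune noise, per Training Details
        fp16=True,
        gradient_checkpointing=True,
        save_strategy="steps", save_steps=200, save_total_limit=3,
        eval_strategy="no",  # this simplified script passes no evaluation split
    ),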
)
trainer.train()

⚠️ Limitations

English only — the base model and Alpaca dataset are English-focused; other languages may produce incoherent outputs.
Hallucination — like all generative models, this one can confidently state incorrect facts. Always verify important claims.
Limited reasoning — at 1.1 B parameters, multi-step logical and mathematical reasoning is unreliable (see ARC / MMLU results above).
No RLHF safety alignment — this model has not undergone reinforcement learning from human feedback. It inherits TinyLlama's base alignment only and may produce inappropriate responses to adversarial prompts.
Short context — trained with a maximum sequence length of 512 tokens; very long conversations will be truncated.
Not production-ready — intended as a learning artefact and research baseline, not a deployed consumer product.
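Because fine-tuning used sequences of at most 512 tokens, one way to work within the short-context limitation is to drop the oldest turns of a conversation until the prompt fits a token budget. The sketch below is illustrative only: `trim_history` is a hypothetical helper, the `reserve_for_reply` default is an arbitrary choice, and the whitespace-based `count_tokens` is a stand-in that should be replaced with the real tokenizer (for example `len(tokenizer(text).input_ids)`).

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    messages: list[dict],
    max_tokens: int = 512,
    reserve_for_reply: int = 200,
    counter: Callable[[str], int] = count_tokens,
) -> list[dict]:
    """Keep the most recent messages whose combined length fits the budget."""
    budget = max_tokens - reserve_for_reply
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):       # walk from newest to oldest
        cost = counter(message["content"])
        if kept and used + cost > budget:    # the latest message is always kept
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))              # restore chronological order


history = [
    {"role": "user", "content": "word " * 300},
    {"role": "assistant", "content": "word " * 200},
    {"role": "user", "content": "What is photosynthesis?"},
]
print(len(trim_history(history)))  # 2
```

The trimmed list can then be passed to `tokenizer.apply_chat_template` as in the multi-turn example.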

📜 License

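This model is released under the Apache 2.0 license, the same license as the TinyLlama base model. The tatsu-lab/alpaca dataset is distributed under CC BY-NC 4.0, which limits use of the data to non-commercial purposes. See LICENSE for full terms.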

Fine-tuned on Kaggle Dual T4 GPU · TRL SFTTrainer · LoRA via PEFT