README

License: apache-2.0

🚀 Quick Start

python

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model in half precision and place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "Havoc999/tiny-chatbot",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Havoc999/tiny-chatbot")

# Alpaca-style prompt, matching the format used during fine-tuning.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Explain the water cycle in simple terms.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a response; set do_sample=False for deterministic (greedy) output.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.15,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
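
The prompt above follows the Alpaca template that the Reproduce script uses for fine-tuning. For prompts that also carry an input, a small helper can assemble the same layout. This is a minimal sketch; the name `build_alpaca_prompt` is hypothetical and not part of the model's repository.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt that ends right before the model's response."""
    # The "### Input:" section is only included when there is non-empty input.
    input_section = f"### Input:\n{input_text}\n\n" if input_text.strip() else ""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"{input_section}"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Explain the water cycle in simple terms.")
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hand-written `prompt`.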

Multi-turn (Chat Template)

python

from transformers import pipeline

# Reuses `model` and `tokenizer` from the Quick Start snippet above.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "What is photosynthesis?"},
]
# TinyLlama-Chat ships a built-in chat template; render the messages into one prompt string.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Note: generated_text contains the prompt followed by the model's reply.
print(pipe(prompt, max_new_tokens=200)[0]["generated_text"])
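
The snippet above sends a single user message. To continue the conversation, the model's reply and the next user message are appended to `messages` before the chat template is applied again. The sketch below shows only that bookkeeping; the helper name `extend_history` is hypothetical, and the assistant reply is a placeholder string rather than real model output.

```python
def extend_history(messages, assistant_reply, next_user_message):
    """Return a new message list with the model's reply and the next user turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply.strip()},
        {"role": "user", "content": next_user_message},
    ]


messages = [{"role": "user", "content": "What is photosynthesis?"}]
# In real use, the reply would come from the pipeline call; this text is a placeholder.
placeholder_reply = "Photosynthesis is how plants turn light into food. "
messages = extend_history(messages, placeholder_reply, "Why are leaves green?")
print([m["role"] for m in messages])
```

The extended list can then be passed to `tokenizer.apply_chat_template` exactly as in the single-turn case.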

📊 Benchmark Results

All benchmarks were evaluated after fine-tuning, using greedy decoding unless otherwise noted.

MMLU — Elementary Mathematics

| Metric | Value |
| --- | --- |
| Samples evaluated | 50 |
| Correct | 15 |
| Invalid outputs | 4 |
| Accuracy | 30.00% |
| Random baseline (4-way) | 25.00% |

+5 pp above random, with the 4 invalid outputs counted as incorrect (15 / 50). The model shows marginal elementary math ability, consistent with its small 1.1 B parameter count and an English instruction dataset that contains limited mathematical content.
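
As a rough check on how much weight the +5 pp margin can bear, the sketch below computes the accuracy and a binomial standard error from the counts in the table. The normal approximation and the helper name `accuracy_with_stderr` are assumptions made here, not part of the original evaluation.

```python
import math


def accuracy_with_stderr(correct: int, total: int) -> tuple[float, float]:
    """Return accuracy and its binomial standard error (normal approximation)."""
    acc = correct / total
    stderr = math.sqrt(acc * (1 - acc) / total)
    return acc, stderr


acc, stderr = accuracy_with_stderr(correct=15, total=50)
margin = acc - 0.25  # distance from the 4-way random baseline
print(f"accuracy={acc:.2%}, stderr={stderr:.2%}, margin over random={margin:.2%}")
```

With 50 samples the standard error is roughly 6.5 pp, larger than the 5 pp margin, so this result is best read as indicative rather than conclusive.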


HellaSwag (commonsense NLI)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.4550 | 200 |
| Accuracy (normalised) | 0.5600 | 200 |

HellaSwag is a four-way multiple-choice task, so random guessing scores about 0.25; a normalised accuracy of 0.56 is well above chance. HellaSwag is widely used as a proxy for commonsense sentence completion and general language understanding.
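
The tables report both plain and normalised accuracy, but the source does not say how the normalisation was done. A common convention (used, for example, by lm-evaluation-harness) divides each candidate ending's log-likelihood by its length in characters before picking the best one, so that longer endings are not penalised. The sketch below illustrates that convention only; the function name `pick_ending` and all numbers are hypothetical.

```python
def pick_ending(log_likelihoods, endings, normalise=False):
    """Return the index of the highest-scoring ending."""
    scores = [
        ll / len(text) if normalise else ll
        for ll, text in zip(log_likelihoods, endings)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up values for illustration only; these are not real model scores.
endings = ["a cat", "a very long and detailed ending"]
log_likelihoods = [-6.0, -9.0]

raw_choice = pick_ending(log_likelihoods, endings)
norm_choice = pick_ending(log_likelihoods, endings, normalise=True)
print(raw_choice, norm_choice)
```

In this toy case the raw score prefers the short ending while the length-normalised score prefers the long one, which is why the two accuracy columns can differ.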


PIQA (physical intuition QA)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.7450 | 200 |
| Accuracy (normalised) | 0.7400 | 200 |

PIQA tests physical intuition and everyday procedural knowledge. Against a 0.50 random baseline (two choices per question), 0.74 is a solid result for a 1.1 B model, suggesting that the world knowledge from base pre-training survives instruction fine-tuning.


ARC Challenge (grade-school science)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.3050 | 200 |
| Accuracy (normalised) | 0.3500 | 200 |

ARC-Challenge targets questions that require reasoning beyond simple retrieval. A normalised accuracy of 0.35, against a random baseline of roughly 0.25, reflects the model's limited multi-step reasoning at this scale.


Summary

| Benchmark | Metric | Score |
| --- | --- | --- |
| MMLU Elem. Math | Accuracy | 30.00% |
| HellaSwag | Acc (norm) | 56.00% |
| PIQA | Acc (norm) | 74.00% |
| ARC Challenge | Acc (norm) | 35.00% |
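
To put the summary scores in context, they can be compared with each task's chance level. The MMLU baseline comes from the table earlier in this section; the other baselines are assumptions based on the usual task formats (four choices for HellaSwag and for most ARC-Challenge questions, two for PIQA), which the source does not state.

```python
# (score %, assumed random baseline %) per benchmark; scores are from the summary table.
results = {
    "MMLU Elem. Math": (30.0, 25.0),
    "HellaSwag": (56.0, 25.0),
    "PIQA": (74.0, 50.0),
    "ARC Challenge": (35.0, 25.0),
}

# Margin over chance in percentage points.
margins = {name: score - baseline for name, (score, baseline) in results.items()}
for name, margin in margins.items():
    print(f"{name}: +{margin:.1f} pp over chance")
```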

📋 Training Details

| Setting | Value |
| --- | --- |
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Dataset | tatsu-lab/alpaca |
| Train split | 45,000 examples |
| Eval split | 2,000 examples |
| Fine-tuning method | LoRA (PEFT) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | ~17 M / 1.1 B (~1.55%) |
| Precision | float16 (AMP) |
| Epochs | 3 |
| Per-GPU batch size | 4 |
| Gradient accumulation | 4 steps |
| Effective global batch | 32 (4 × 2 GPUs × 4 accum) |
| Peak learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 3% |
| Gradient checkpointing | Enabled |
| NEFTune noise alpha | 5 |
| Hardware | Kaggle Dual T4 (2 × 16 GiB VRAM) |
| Loss masking | Completion-only (response tokens only) |
| Early stopping patience | 3 evaluations |
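
Several of these settings combine into derived quantities. The sketch below works through the arithmetic for the effective batch size, the approximate optimizer step count, the warmup length, and the LoRA scaling factor (alpha divided by rank). The step counts are estimates: they assume the last partial batch of each epoch still counts as a step and that early stopping does not end training sooner.

```python
import math

# Values taken from the Training Details table.
per_gpu_batch = 4
num_gpus = 2
grad_accum_steps = 4
train_examples = 45_000
epochs = 3
warmup_ratio = 0.03
lora_rank, lora_alpha = 16, 32

effective_batch = per_gpu_batch * num_gpus * grad_accum_steps
steps_per_epoch = math.ceil(train_examples / effective_batch)  # estimate
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
lora_scaling = lora_alpha / lora_rank  # factor applied to the LoRA update

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, lora_scaling)
```

Under these assumptions a full run is roughly 4,200 optimizer steps, of which about the first 130 are warmup.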

⚙️ Reproduce

python

# Install dependencies
# pip install transformers datasets peft trl accelerate bitsandbytes huggingface_hub
# Note: passing tokenizer, dataset_text_field, and max_seq_length directly to SFTTrainer
# matches older TRL releases; newer releases move these options (for example into SFTConfig).
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")

# 2. Format examples
def format_alpaca(ex):
    # Include the "### Input:" section only when the example has a non-empty input.
    input_section = f"### Input:\n{ex['input']}\n\n" if ex["input"].strip() else ""
    return {
        "text": (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{ex['instruction']}\n\n"
            f"{input_section}"
            f"### Response:\n{ex['output']}"
        )
    }

dataset = dataset.map(format_alpaca, batched=False)

# 3. Load model + LoRA
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype="auto",
    device_map={"": 0},
)
model.config.use_cache = False      # the KV cache is not compatible with gradient checkpointing
model.enable_input_require_grads()  # lets gradients reach the adapters when checkpointing is on
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)

# 4. Train
# Simplified relative to Training Details: this runs on a single GPU and the full train split.
# The 45,000 / 2,000 split, NEFTune, and early stopping are not shown here.
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    # Completion-only loss: tokens before the response template are ignored.
    # With SentencePiece-based tokenizers the template can tokenize differently in context;
    # if the collator warns that it cannot find it, pass the template's token IDs instead.
    data_collator=DataCollatorForCompletionOnlyLM("### Response:\n", tokenizer=tokenizer),
    args=TrainingArguments(
        output_dir="./chatbot-lora",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",  # from Training Details
        warmup_ratio=0.03,           # from Training Details
        fp16=True,
        gradient_checkpointing=True,
        save_strategy="steps", save_steps=200, save_total_limit=3,
        eval_strategy="no",
    ),
)
trainer.train()
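
The completion-only collator in the script is what implements the "response tokens only" loss masking listed under Training Details. The idea can be sketched in plain Python: every label up to and including the response template is set to -100, the value that PyTorch's cross-entropy loss ignores by default. This is a simplified illustration, not the TRL implementation; the function name and the token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_tokens(input_ids, response_template_ids):
    """Build labels that ignore everything up to and including the response template."""
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            end = start + n
            return [IGNORE_INDEX] * end + input_ids[end:]
    # Template not found: ignore the whole example rather than train on the prompt.
    return [IGNORE_INDEX] * len(input_ids)


# Toy token IDs for illustration; not real tokenizer output.
input_ids = [1, 11, 12, 13, 90, 91, 21, 22, 2]
template_ids = [90, 91]
labels = mask_prompt_tokens(input_ids, template_ids)
print(labels)
```

Only the last three positions keep their token IDs as labels, so only the response contributes to the loss.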

⚠️ Limitations

  • English only: the base model and the Alpaca dataset are English-focused; prompts in other languages may produce incoherent output.
  • Hallucination: like all generative models, this one can confidently state incorrect facts. Always verify important claims.
  • Limited reasoning: at 1.1 B parameters, multi-step logical and mathematical reasoning is unreliable (see the ARC and MMLU results above).
  • No RLHF safety alignment: this model has not undergone reinforcement learning from human feedback. It inherits only the alignment of the TinyLlama base model and may produce inappropriate responses to adversarial prompts.
  • Short context: fine-tuning used a maximum sequence length of 512 tokens, so longer training examples were truncated and quality on long conversations may degrade.
  • Not production-ready: intended as a learning artefact and research baseline, not a deployed consumer product.

📜 License

This model is released under the Apache 2.0 license, matching the TinyLlama base model. The tatsu-lab/alpaca dataset carries its own license (CC BY-NC 4.0), so check its terms before any commercial use. See LICENSE for full terms.


Fine-tuned on Kaggle Dual T4 GPU · TRL SFTTrainer · LoRA via PEFT

Model provider: Havoc999

Model tree: base TinyLlama/TinyLlama-1.1B-Chat-v1.0, adapter: this model

Modalities: text input, text output
