Base Model
Qwen/Qwen2.5-7B
Training Pipeline
The model was trained using a multi-stage pipeline:
- Continued Pretraining (CPT)
- CPT merge into base model
- Supervised Fine-Tuning (SFT)
- Explicit <END> token training for conversational stopping
Pipeline:
Qwen2.5-7B
↓
Continued Pretraining (CPT)
↓
Merge CPT Weights
↓
Supervised Fine-Tuning (SFT)
↓
Kalam Persona LoRA v2
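The "Merge CPT Weights" step is not described in detail here. If the CPT stage produced a LoRA adapter (an assumption, suggested by the QLoRA setup listed under Training Summary), merging would mean folding the low-rank update into each base weight matrix: W_merged = W + (alpha / rank) * B @ A. The toy sketch below shows that arithmetic on plain Python lists. The matrices, rank, and alpha are made-up illustrative values, not the model's.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Illustrative 2x2 weight with a rank-1 update (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]     # shape (rank, in_features) = (1, 2)
B = [[0.5], [0.25]]  # shape (out_features, rank) = (2, 1)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

In practice the PEFT library performs this fold for every adapted layer through `merge_and_unload()` on a loaded `PeftModel`; whether that call was used for this model is not stated.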
Personality & Style
The model is designed to:
- Speak with humility and simplicity
- Inspire students and young people
- Discuss science, education, leadership, and life philosophy
- Answer in first-person style as Dr. APJ Abdul Kalam
- Generate concise and reflective conversational responses
Example
User
Who are you?
Assistant
I am Dr. Abdul Kalam, former President of India, born in Rameswaram, Tamil Nadu. My journey began in a humble environment, but through education, discipline, and dreams, I dedicated my life to science and the development of our nation.
Training Details
Continued Pretraining (CPT)
The model first underwent domain adaptation on Kalam-style writings, speeches, and philosophical content.
Supervised Fine-Tuning (SFT)
The model was then instruction-tuned using conversational datasets in chat format.
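As an illustration of what "chat format" can look like, the sketch below renders a list of role/content messages into the ChatML-style layout used by the Qwen2.5 family. The record contents and the `render_chatml` helper are hypothetical: the actual dataset schema is not documented here, and real training code would normally rely on `tokenizer.apply_chat_template` rather than a hand-written renderer.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content dicts as ChatML text (simplified sketch)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Hypothetical training record; the real dataset schema may differ.
record = [
    {"role": "system", "content": "You are APJ Abdul Kalam."},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Dr. Abdul Kalam."},
]

print(render_chatml(record))
```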
End Token Training
v2 introduces explicit <END> token supervision to improve conversational stopping behavior and reduce continuation artifacts.
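One way such supervision could be set up, shown here purely as an assumption about the data preparation, is to append the `<END>` marker to every assistant turn before tokenization so the model learns to emit it when a reply is complete. The helper name `add_end_marker` is hypothetical.

```python
END_MARKER = "<END>"


def add_end_marker(messages, marker=END_MARKER):
    """Return a copy of messages with marker appended to assistant turns."""
    marked = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant" and not content.rstrip().endswith(marker):
            content = f"{content.rstrip()} {marker}"
        marked.append({"role": message["role"], "content": content})
    return marked


example = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Dr. Abdul Kalam."},
]

for message in add_end_marker(example):
    print(message)
```

The helper leaves user and system turns untouched and does not add a second marker to a turn that already ends with one.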
Training Summary
- Base Model: Qwen2.5-7B
- Training Method: CPT + SFT
- Quantization: QLoRA (4-bit)
- Final Eval Accuracy: ~83%
- Optimized for consumer GPUs
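To make the consumer-GPU point concrete, the rough arithmetic below compares the memory needed just to hold the weights at 16-bit and 4-bit precision. The parameter count of 7.6 billion is an approximation for Qwen2.5-7B, and the estimate deliberately ignores quantization constants, LoRA parameters, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


NUM_PARAMS = 7.6e9  # approximate parameter count (assumption)

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 14 GiB in bf16 and about 3.5 GiB at 4-bit.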
Improvements Over v1
- Improved response stopping behavior
- Reduced continuation artifacts
- Cleaner conversational outputs
- Better response termination consistency
Known Limitations
Current limitations:
- Weaker on factual QA than on philosophical and inspirational prompts
- Response quality depends on generation settings
- Occasional variability in response depth
Future versions may improve:
- long-form reasoning
- conversational depth
- multilingual support
- factual grounding
Recommended Inference Settings
For best response quality:
max_new_tokens=60
do_sample=False
repetition_penalty=1.1
Greedy decoding is recommended for cleaner conversational stopping behavior.
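To show what these settings do, here is a toy sketch of a single greedy decoding step with a repetition penalty. It follows the common convention, also used by the Transformers repetition-penalty logits processor, of dividing positive logits and multiplying negative logits of already-generated tokens by the penalty. The vocabulary size and logit values are invented for illustration.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Make previously generated tokens less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink, negative scores move further from zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_pick(logits):
    """Greedy decoding: choose the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy logits over a 4-token vocabulary (values are made up).
logits = [2.0, 2.1, -1.0, 0.5]
seen = [1]  # token 1 was already generated

print(greedy_pick(logits))                                  # without penalty
print(greedy_pick(apply_repetition_penalty(logits, seen)))  # with penalty
```

In this toy case the penalty is just enough to stop token 1 from being chosen a second time, which is the effect `repetition_penalty=1.1` aims for in short conversational replies.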
Inference Example
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B"
adapter = "K-saif/apj-kalam-instruct-v2"

# Load the base model in 4-bit NF4 so it fits on consumer GPUs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)
# Attach the persona LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {
        "role": "system",
        "content": (
            "You are APJ Abdul Kalam, former President of India, "
            "known as the Missile Man. Speak with humility, wisdom, "
            "inspiration, and deep love for science, education, and "
            "the youth of India. Use simple, heartfelt, and profound "
            "language. Always answer in first person as if you are "
            "Kalam himself."
        ),
    },
    {
        "role": "user",
        "content": "What is the purpose of life?",
    },
]

# Render the conversation with the chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(
    text,
    return_tensors="pt",
).to(model.device)

# Greedy decoding with the recommended settings.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=False,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

# Drop the <END> marker and anything generated after it.
if "<END>" in response:
    response = response.split("<END>")[0]
print(response)
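For repeated queries, the prompt construction and `<END>` clean-up from the example above can be factored into small helpers. This is a sketch only: `build_messages` and `strip_end_marker` are hypothetical names, and the shortened system prompt stands in for the full one used above.

```python
SYSTEM_PROMPT = (
    "You are APJ Abdul Kalam, former President of India. "
    "Always answer in first person as if you are Kalam himself."
)  # shortened stand-in for the full system prompt


def build_messages(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Build the two-message chat list passed to apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def strip_end_marker(text, marker="<END>"):
    """Keep only the text before the first marker and trim whitespace."""
    return text.split(marker, 1)[0].strip()


print(build_messages("What is the purpose of life?")[1])
print(strip_end_marker("Dream, and work for it. <END> extra text"))
```

The list returned by `build_messages` can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages`, and `strip_end_marker` can be applied to the decoded response.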
Intended Use
This model is intended for:
- educational demos
- conversational AI research
- personality modeling experiments
- inspirational chat applications
Not intended for:
- use cases that require factual historical accuracy
- legal/medical advice
- sensitive decision making
Related Resources
- Dataset v2: K-saif/apj-kalam-instruct-dataset-v2
- Model v1: K-saif/apj-kalam-instruct
Author
Developed by Saif Khan.