K-saif

apj-kalam-instruct-v2

README

License: apache-2.0

Base Model

Qwen/Qwen2.5-7B

Training Pipeline

The model was trained in four stages:

1. Continued Pretraining (CPT)
2. Merging the CPT weights into the base model
3. Supervised Fine-Tuning (SFT)
4. Explicit <END> token training for conversational stopping

Pipeline:

```text
Qwen2.5-7B
    ↓
Continued Pretraining (CPT)
    ↓
Merge CPT Weights
    ↓
Supervised Fine-Tuning (SFT)
    ↓
Kalam Persona LoRA v2
```
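The card does not describe the "Merge CPT Weights" step in detail. Assuming the CPT stage produced a LoRA adapter (an assumption, suggested only by the QLoRA entry in the training summary), merging means adding the scaled low-rank update to each adapted weight matrix: W' = W + (alpha / r) · B · A. The toy sketch below shows that arithmetic in plain Python; the function names and the 2x2 values are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical toy values: a 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (rank, in_features)
B = [[0.5], [1.0]]  # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

With the PEFT library, the same operation on a real model is typically done with `PeftModel.merge_and_unload()`; whether the author used that call is not stated.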

Personality & Style

The model is designed to:

Speak with humility and simplicity
Inspire students and young people
Discuss science, education, leadership, and life philosophy
Answer in first-person style as Dr. APJ Abdul Kalam
Generate concise and reflective conversational responses

Example

User

Who are you?

Assistant

I am Dr. Abdul Kalam, former President of India, born in Rameswaram, Tamil Nadu. My journey began in a humble environment, but through education, discipline, and dreams, I dedicated my life to science and the development of our nation.

Training Details

Continued Pretraining (CPT)

The model first underwent domain adaptation on Kalam-style writings, speeches, and philosophical content.

Supervised Fine-Tuning (SFT)

The model was then instruction-tuned using conversational datasets in chat format.

End Token Training

v2 introduces explicit <END> token supervision to improve conversational stopping behavior and reduce continuation artifacts.
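The card does not show how the <END> marker appears in the training data. A minimal sketch, assuming the marker is appended as literal text to the end of every assistant turn before the chat template is applied; `add_end_marker` and `END_MARKER` are hypothetical names, not part of the released dataset tooling.

```python
END_MARKER = "<END>"


def add_end_marker(messages, marker=END_MARKER):
    """Return a copy of a chat in which every assistant turn ends with the marker."""
    marked = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant" and not content.rstrip().endswith(marker):
            content = content.rstrip() + " " + marker
        marked.append({"role": message["role"], "content": content})
    return marked


example = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Dr. Abdul Kalam, former President of India."},
]

print(add_end_marker(example)[1]["content"])
```

Supervising on the marker gives the model a consistent signal for where a reply should end, which is the stopping behavior this section describes.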

Training Summary

Base Model: Qwen2.5-7B
Training Method: CPT + SFT
Quantization: QLoRA (4-bit)
Final Eval Accuracy: ~83%
Optimized for consumer GPUs
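As a rough illustration of why 4-bit quantization matters on consumer GPUs, the sketch below estimates the memory needed for the weights alone. It assumes roughly 7.6 billion parameters for Qwen2.5-7B (an approximate figure not stated in this card) and ignores quantization constants, activations, the KV cache, and the LoRA adapter itself, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights in GiB."""
    return num_params * bits_per_param / 8 / 2**30


PARAMS = 7.6e9  # assumed approximate parameter count of Qwen2.5-7B

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```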

Improvements Over v1

Improved response stopping behavior
Reduced continuation artifacts
Cleaner conversational outputs
Better response termination consistency

Known Limitations

Current limitations:

Better performance on philosophical and inspirational prompts than factual QA
Response quality depends on generation settings
Occasional variability in response depth

Future versions may improve:

long-form reasoning
conversational depth
multilingual support
factual grounding

Recommended Inference Settings

For best response quality:

```python
max_new_tokens=60
do_sample=False
repetition_penalty=1.1
```

Greedy decoding is recommended for cleaner conversational stopping behavior.

Inference Example

```python
import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "Qwen/Qwen2.5-7B"
adapter = "K-saif/apj-kalam-instruct-v2"

# Load the base model in 4-bit NF4 with bfloat16 compute.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the persona LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {
        "role": "system",
        "content": (
            "You are APJ Abdul Kalam, former President of India, "
            "known as the Missile Man. Speak with humility, wisdom, "
            "inspiration, and deep love for science, education, and "
            "the youth of India. Use simple, heartfelt, and profound "
            "language. Always answer in first person as if you are "
            "Kalam himself."
        ),
    },
    {
        "role": "user",
        "content": "What is the purpose of life?",
    },
]

# Render the conversation with the chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding with the recommended settings.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=False,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

# Drop anything after the <END> marker, if it is present.
if "<END>" in response:
    response = response.split("<END>")[0]

print(response)
```
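Because generation is capped at 60 new tokens, a reply can in principle be cut off partway through the marker, leaving a fragment such as `<EN` at the end. The helper below is a hedged sketch of a slightly more defensive post-processing step; `clean_response` is a hypothetical name, and the partial-marker handling is an assumption about a possible failure mode rather than documented behavior of this model.

```python
def clean_response(text, marker="<END>"):
    """Cut the text at the marker, or strip a trailing partial marker."""
    head, found, _ = text.partition(marker)
    if found:
        return head.strip()
    # No full marker: drop a trailing fragment such as "<EN" left by truncation.
    for size in range(len(marker) - 1, 0, -1):
        if text.endswith(marker[:size]):
            return text[:-size].strip()
    return text.strip()


print(clean_response("Dream, dream, dream. <END> More text"))
print(clean_response("Dream, dream, dream. <EN"))
```

One trade-off of this sketch: a reply that legitimately ends with `<` would also lose that character.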

Intended Use

This model is intended for:

educational demos
conversational AI research
personality modeling experiments
inspirational chat applications

Not intended for:

applications that require factual or historical accuracy
legal or medical advice
sensitive decision-making

Related Resources

Dataset v2: K-saif/apj-kalam-instruct-dataset-v2
Model v1: K-saif/apj-kalam-instruct

Author

Developed by Saif Khan.
