README

License: apache-2.0

Project Purpose

This is an educational supervised fine-tuning experiment designed to learn the full LoRA/PEFT workflow:

  1. Load a pretrained instruction model
  2. Load an instruction-following dataset
  3. Convert the dataset into chat format
  4. Apply the Qwen chat template
  5. Add LoRA adapters
  6. Train with TRL SFTTrainer
  7. Save adapter weights
  8. Run inference with base model + adapter
  9. Upload adapter to the Hugging Face Hub
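Step 3 can be sketched in plain Python. This is a minimal illustration, not the author's preprocessing code: it assumes Dolly-15k-style records with `instruction`, `context`, and `response` fields (as the adapter name suggests), and the helper name `to_chat_messages` and the default system prompt are hypothetical choices.

```python
def to_chat_messages(record, system_prompt="You are a helpful assistant."):
    """Convert one instruction record into a list of chat messages.

    ASSUMPTION: the record has Dolly-style keys 'instruction', 'context'
    (possibly empty), and 'response'. Adjust the keys for other datasets.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()

    # Fold the optional context into the user turn so the model sees it.
    if context:
        user_content = f"{instruction}\n\nContext:\n{context}"
    else:
        user_content = instruction

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["response"].strip()},
    ]


example = {
    "instruction": "Name a primary color.",
    "context": "",
    "response": "Blue.",
}
messages = to_chat_messages(example)
for message in messages:
    print(message["role"], "->", message["content"])
```

A message list in this shape is what `tokenizer.apply_chat_template` expects in step 4.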

Base Model

  • Qwen/Qwen2.5-3B-Instruct

Dataset

  • databricks-dolly-15k (inferred from the adapter name qwen2.5-3b-lora-dolly15k)
  • Training subset: 10000 examples
  • Evaluation subset: 1000 examples

Training Method

  • Method: LoRA
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: q_proj, v_proj
  • Max sequence length: 1024
  • Epochs: 2
  • Learning rate: 2e-4
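To give a feel for what these settings imply, the sketch below works out the LoRA scaling factor and a rough count of trainable adapter parameters. The rank, alpha, and target modules come from the list above; the projection shapes and layer count are assumptions about Qwen2.5-3B that are not stated in this card and should be checked against the model's config.json.

```python
def lora_param_count(rank, in_features, out_features):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


RANK, ALPHA = 16, 32  # from the Training Method list

# ASSUMED shapes (not stated in this card); verify against config.json.
HIDDEN = 2048      # input width of q_proj and v_proj
Q_OUT = 2048       # q_proj output width
V_OUT = 256        # v_proj output width (grouped-query attention)
NUM_LAYERS = 36    # number of transformer blocks

per_layer = (
    lora_param_count(RANK, HIDDEN, Q_OUT)
    + lora_param_count(RANK, HIDDEN, V_OUT)
)
total = per_layer * NUM_LAYERS

# Standard LoRA scales the low-rank update by alpha / r.
scaling = ALPHA / RANK

print(f"scaling (alpha / r): {scaling}")
print(f"trainable LoRA parameters: {total:,}")
```

Under these assumptions the adapter trains a few million parameters, a small fraction of the roughly 3B in the base model. In the peft library these settings correspond to the `LoraConfig` fields `r`, `lora_alpha`, `lora_dropout`, and `target_modules`.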

Intended Use

This adapter is intended for instruction-following experiments.

Example use cases:

  • Learning PEFT/LoRA fine-tuning
  • Testing adapter loading
  • Comparing base model and LoRA-adapted model outputs
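For the last use case, one possible approach is to load the adapter once and temporarily bypass it with the `disable_adapter()` context manager that peft's `PeftModel` provides. The sketch below is illustrative only: `compare_outputs` and the `generate_fn` callback are hypothetical names, and `generate_fn(model, prompt)` is assumed to return a string.

```python
def compare_outputs(peft_model, generate_fn, prompt):
    """Return (base_output, adapted_output) for the same prompt.

    ASSUMPTION: peft_model exposes disable_adapter() as a context manager
    (as peft's PeftModel does) and generate_fn(model, prompt) returns text.
    """
    # Inside this block the LoRA weights are bypassed, so the model
    # behaves like the plain base model.
    with peft_model.disable_adapter():
        base_output = generate_fn(peft_model, prompt)

    # Outside the block the adapter is active again.
    adapted_output = generate_fn(peft_model, prompt)
    return base_output, adapted_output
```

For a fair side-by-side comparison, it probably helps to use greedy decoding (`do_sample=False`) or a fixed random seed, since sampled outputs differ from run to run regardless of the adapter.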

Comparison to Base Model

PROMPT: Explain overfitting to a beginner using a simple analogy.

BASE MODEL: Sure! Let's use the analogy of a child learning to ride a bicycle.

Imagine you're teaching a young child how to ride a bicycle. The goal is for the child to learn how to balance and steer without needing to hold onto the handlebars all the time. Overfitting can be thought of as the child trying too hard to remember every single detail of the path they've taken while riding, rather than learning the basic skills needed to stay balanced and steer properly.

Here’s how it works:

  1. Training the Bicycle: When you first start teaching the child to ride, you take them on smooth, easy paths with no obstacles. This is like training your model on a large dataset (lots of examples). If you give the child many practice rides in these smooth conditions, they will get really good at balancing and steering — they’ll remember every bump and dip along the way.

  2. Overfitting: Now imagine if you continue taking the child on smooth paths but also include


LORA MODEL: Imagine you are trying to predict the number of cookies that will be sold in a store each day. You create a model based on historical sales data and use it to make predictions. Your model is perfect - it can accurately predict the number of cookies sold every single day, even when the data changes. This is great! However, the problem is that your model is so complex and detailed that it has learned not only the patterns in the data but also the noise or random fluctuations. As a result, when you use your model to make predictions on new data, it performs poorly and makes incorrect predictions.

In other words, your model is "overfitting" to the training data. It has learned the patterns in the data so well that it can't generalize to new data. To avoid this, you need to keep your model simple and avoid overfitting.

Limitations

This is a beginner fine-tuning experiment.

It should not be used for production, safety-critical applications, medical/legal/financial advice, or high-stakes decision-making.

The model may generate incorrect, biased, unsafe, or hallucinated outputs.

Example Usage

python

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter = "Kurapika993/qwen2.5-3b-lora-dolly15k"

# Load the tokenizer from the adapter repo and the base model in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()


def generate_response(model, tokenizer, user_prompt, max_new_tokens=200):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]
    # Render the conversation with the Qwen chat template and open the
    # assistant turn so the model continues from there.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens so only the newly generated text is decoded.
    generated_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


prompt = "Explain machine learning to a beginner using a simple analogy."
response = generate_response(model, tokenizer, prompt)
print(response)

Model provider

Kurapika993

Model tree

  • Base: Qwen/Qwen2.5-3B-Instruct
  • Adapter: this model

Modalities

  • Input: Text
  • Output: Text
