Kurapika993
qwen2.5-7b-qlora-no-robots
License: apache-2.0
Project Purpose
This is a supervised fine-tuning experiment for learning and demonstrating a 7B QLoRA workflow:
- Load a 7B instruction model in 4-bit
- Prepare the model for k-bit training
- Load and clean the No Robots instruction dataset
- Apply the Qwen chat template
- Add LoRA adapters
- Train with TRL SFTTrainer
- Save adapter weights
- Run inference with base model + adapter
- Upload adapter to Hugging Face Hub
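The cleaning and chat-template steps above can be sketched in plain Python. This is a minimal illustration, not the code used for this adapter: `is_clean` and `render_chatml` are hypothetical helpers, the cleaning rule is an assumption, and the rendering only approximates the ChatML layout that Qwen tokenizers produce (the real template may also insert a default system message). For actual training, `tokenizer.apply_chat_template` should be used instead.

```python
from typing import Dict, List

Message = Dict[str, str]


def is_clean(example: dict) -> bool:
    """ASSUMED cleaning rule: keep conversations that contain a user turn,
    end with an assistant turn, and have no empty message content."""
    messages = example.get("messages") or []
    if not messages or messages[-1].get("role") != "assistant":
        return False
    has_user = any(m.get("role") == "user" for m in messages)
    return has_user and all((m.get("content") or "").strip() for m in messages)


def render_chatml(messages: List[Message], add_generation_prompt: bool = False) -> str:
    """Approximate ChatML rendering of a conversation (hypothetical helper)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # At inference time the prompt ends with an open assistant turn.
        text += "<|im_start|>assistant\n"
    return text


# Toy records shaped like the "messages" field of No Robots.
examples = [
    {"messages": [{"role": "user", "content": "Name a prime number."},
                  {"role": "assistant", "content": "7 is a prime number."}]},
    {"messages": [{"role": "user", "content": "Hello"},
                  {"role": "assistant", "content": "   "}]},  # dropped: empty reply
]

cleaned = [ex for ex in examples if is_clean(ex)]
texts = [render_chatml(ex["messages"]) for ex in cleaned]
print(texts[0])
```

Rendering each conversation to a single string is what lets SFTTrainer treat the dataset as plain text; the generation prompt is only appended at inference time.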
Base Model
- Qwen/Qwen2.5-7B-Instruct
Dataset
- HuggingFaceH4/no_robots
- Training subset: 5000 examples
- Evaluation subset: 500 examples
Training Method
- Method: QLoRA
- Quantization: 4-bit NF4
- Double quantization: enabled
- Compute dtype: bfloat16
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- Max sequence length: 2048
- Epochs: 1
- Learning rate: 2e-4
- Effective batch size: 16
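To make these hyperparameters concrete, the following sketch works out the LoRA scaling factor, an estimate of the trainable adapter parameters, and the approximate number of optimizer steps. The rank, alpha, subset size, and batch size come from this card; the layer dimensions are an assumption taken from the published Qwen2.5-7B configuration and are not stated here. The step count assumes no sequence packing and that the final partial batch is kept, so the real trainer may report a slightly different number.

```python
import math

# Values stated in this card.
LORA_RANK = 16
LORA_ALPHA = 32
TRAIN_EXAMPLES = 5000
EFFECTIVE_BATCH = 16
EPOCHS = 1

# ASSUMPTION: dimensions from the published Qwen2.5-7B config (not in this card).
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # MLP intermediate_size
KV_DIM = 512           # 4 key/value heads x 128 head dimension
NUM_LAYERS = 28

# (in_features, out_features) for each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS

# The adapter update is scaled by alpha / rank before being added to the output.
scaling = LORA_ALPHA / LORA_RANK

# One optimizer step consumes one effective batch.
optimizer_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH) * EPOCHS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"approximate optimizer steps: {optimizer_steps}")
```

Under these assumptions the adapter trains roughly 40 million parameters, well under 1% of the 7B base model, which is the reason QLoRA fits on a single GPU.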
Evaluation
This adapter was evaluated qualitatively using fixed instruction-following prompts.
Included files:
- training_config.json
- base_vs_adapter_comparison.json
- loss_curve.png
This is a qualitative sanity check, not a formal benchmark.
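A side-by-side check of this kind could be structured roughly as follows. This is a sketch under stated assumptions: `compare_outputs`, the stand-in generators, the second prompt, and the record fields (`prompt`, `base`, `adapter`) are hypothetical, and the actual schema of base_vs_adapter_comparison.json may differ. In a real run the two callables would wrap the base model and the adapter-loaded model.

```python
import json
from typing import Callable, Dict, List


def compare_outputs(
    prompts: List[str],
    generate_base: Callable[[str], str],
    generate_adapter: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run the same fixed prompts through both models and pair the outputs."""
    return [
        {
            "prompt": prompt,
            "base": generate_base(prompt),
            "adapter": generate_adapter(prompt),
        }
        for prompt in prompts
    ]


# Stand-in generators so the sketch runs without a GPU (hypothetical).
def fake_base(prompt: str) -> str:
    return f"[base] {prompt}"


def fake_adapter(prompt: str) -> str:
    return f"[adapter] {prompt}"


# Illustrative prompts; the second one is not taken from the card.
FIXED_PROMPTS = [
    "Explain instruction tuning to a beginner using a simple analogy.",
    "Rewrite this sentence in a formal tone: we're gonna ship it tomorrow.",
]

records = compare_outputs(FIXED_PROMPTS, fake_base, fake_adapter)
report = json.dumps(records, indent=2, ensure_ascii=False)
print(report)
```

Keeping the prompt list fixed makes the two columns comparable across runs; with sampling enabled, setting a seed or using greedy decoding would likely be needed for the comparison to be repeatable.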
Intended Use
This adapter is intended for instruction-following experiments and for learning PEFT/QLoRA workflows.
Example use cases:
- Testing PEFT adapter loading
- Comparing base model and QLoRA-adapted outputs
Example Usage
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "Kurapika993/qwen2.5-7b-qlora-no-robots"

# Same 4-bit NF4 quantization settings used during training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the tokenizer from the adapter repo and the quantized base model.
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()


def generate_response(model, tokenizer, user_prompt, max_new_tokens=250):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.05,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Drop the prompt tokens and decode only the newly generated part.
    generated_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


prompt = "Explain instruction tuning to a beginner using a simple analogy."
response = generate_response(model, tokenizer, prompt, max_new_tokens=250)
print(response)
```
Model provider
- Kurapika993
Model tree
- Base: Qwen/Qwen2.5-7B-Instruct
- Adapter: this model
Modalities
- Input: Text
- Output: Text