License: apache-2.0
What this model does
Contemporary instruction-tuned LLMs (including Qwen 2.5) over-format their responses by default, producing bulleted lists, bold section headers, and templated structures even when flowing prose would serve the reader better. This adapter nudges the base model toward prose responses in contexts where prose is appropriate, while preserving the base model's ability to use structure where structure genuinely helps.
Quick start
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "krishy-d/prosify_qwen_1.5b_lora"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Generate
messages = [{"role": "user", "content": "Write me an email to my manager about WFH tomorrow."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False,
                            pad_token_id=tokenizer.pad_token_id)

print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Training details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | DPO (Direct Preference Optimization) |
| Adapter | LoRA |
| Training data | FormatBench train split (~440 examples) |
| Validation | FormatBench val split (~50 examples) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Learning rate | 5e-5 |
| Epochs | 1 |
| β (KL leash) | 0.1 |
| Effective batch size | 4 |
| Precision | bfloat16 |
| Hardware | Kaggle T4 GPU (free tier) |
Full training notebook: notebooks/dpo_02_train.ipynb.
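To make the β row in the table concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not taken from the training notebook: the function name `dpo_loss` is hypothetical, and the log-probabilities are made-up illustration values rather than numbers from this run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_shift - rejected_shift)))."""
    # How far the policy has moved from the frozen reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(m)) is the same as log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))

# Made-up sequence log-probabilities, for illustration only.
untrained = dpo_loss(-40.0, -35.0, -40.0, -35.0)  # policy identical to reference
improved = dpo_loss(-38.0, -39.0, -40.0, -35.0)   # policy now favors the chosen response
print(f"untrained={untrained:.4f} improved={improved:.4f}")
```

When the policy equals the reference the margin is zero and the loss is log 2. A larger β scales the margin, so the same shift away from the reference moves the loss more, which is why β acts as the strength of the KL constraint.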
Evaluation
Structural metrics computed on the FormatBench held-out test split (49 examples, gold = prose) and the adversarial held-out set (40 examples, gold = structure).
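As a rough sketch of how per-response structural metrics of this kind can be computed, the snippet below counts markdown bullets and headers with two regular expressions. The patterns and the names `count_structure` and `mean_per_response` are assumptions for illustration; the evaluation notebook may define bullets and headers differently.

```python
import re

# Assumed definitions (not taken from the notebook):
# a bullet is a line starting with -, * or +, or with "1." / "1)" style numbering;
# a header is a line starting with one to six '#' characters and a space.
BULLET_RE = re.compile(r"^[ \t]*(?:[-*+]|\d+[.)])[ \t]+\S", re.MULTILINE)
HEADER_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+\S", re.MULTILINE)

def count_structure(text):
    """Count bullet lines and header lines in one response."""
    return {"bullets": len(BULLET_RE.findall(text)),
            "headers": len(HEADER_RE.findall(text))}

def mean_per_response(responses):
    """Average the counts over a list of responses."""
    counts = [count_structure(r) for r in responses]
    return {key: sum(c[key] for c in counts) / len(counts)
            for key in ("bullets", "headers")}

sample = "## Plan\n- pack bags\n- book train\n\nSee you **soon**."
print(count_structure(sample))
print(mean_per_response([sample, "Plain prose."]))
```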
Main test split (gold = prose)
| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 2.16 | 0.53 | 0.00 |
| Headers per response | 0.59 | 0.00 | 0.00 |
The trained adapter reduces bullet usage by 75% and eliminates markdown headers on prose-appropriate contexts.
Adversarial set (gold = structure)
| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 9.35 | 6.60 | 8.18 |
| Headers per response | 0.78 | 0.30 | 3.38 |
In contexts where structure is the correct answer (recipes, install instructions, comparisons, troubleshooting flows, reference lookups), the trained model keeps substantial structure but falls below the gold level: 6.60 bullets per response against 8.18 for gold, and 0.30 headers against 3.38 (the base model, at 0.78 headers, is also well below gold). This suggests mild reward hacking, in which the model slightly over-generalizes the "prefer prose" preference.
Full evaluation notebook: notebooks/dpo_03_evaluate.ipynb.
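The percentages quoted in this card follow from the table values by simple arithmetic. The sketch below recomputes them; the helper name `pct_change` is hypothetical, and the inputs are the per-response averages copied from the two tables above.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Main test split (gold = prose): bullets per response, base model vs. adapter.
main_bullets_vs_base = pct_change(2.16, 0.53)
# Adversarial set (gold = structure): gold response vs. adapter.
adv_bullets_vs_gold = pct_change(8.18, 6.60)
adv_headers_vs_gold = pct_change(3.38, 0.30)

print(f"main bullets vs base:        {main_bullets_vs_base:.1f}%")
print(f"adversarial bullets vs gold: {adv_bullets_vs_gold:.1f}%")
print(f"adversarial headers vs gold: {adv_headers_vs_gold:.1f}%")
```

The first two figures correspond to the roughly 75% bullet reduction and the roughly 20% shortfall on adversarial bullets; the third shows that the header gap on the adversarial set is considerably larger than the bullet gap.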
Limitations
- v1 baseline: this is the first training run on a small dataset (591 examples) with conservative settings (LoRA rank 16, 1 epoch, β = 0.1). Higher capacity and tuned hyperparameters would likely close the adversarial gap.
- Single-author dataset voice: the FormatBench `chosen` responses were authored by one annotator, so the trained model inherits that voice as the "correct" prose style.
- English only: training data is exclusively English.
- Mild reward hacking: the model uses less structure than appropriate on contexts where structure helps (~20% reduction below gold on adversarial bullets).
- Small base model: 1.5B parameters limits both fluency and the ceiling of what LoRA fine-tuning can achieve. v2 will explore 3B and 7B base models.
Roadmap
This adapter is v1. Planned for v2:
- Higher LoRA rank (64 or 128) for more capacity to shift generation behavior
- Tighter KL constraint (β = 0.3) to reduce adversarial structure drift
- Larger dataset (~2000 examples, multi-author)
- Larger base models (Qwen 2.5 3B and 7B)
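To give a sense of what a higher LoRA rank means for adapter size, the sketch below estimates trainable parameters for the four target modules at ranks 16, 64, and 128. The projection shapes are assumptions based on the published Qwen2.5-1.5B configuration (hidden size 1536, 28 layers, 2 key/value heads of dimension 128) and should be checked against the base model's `config.json`; the function name `lora_params` is hypothetical.

```python
# Assumed (in_features, out_features) per attention projection for Qwen2.5-1.5B.
# These shapes are assumptions; verify them against the base model's config.json.
HIDDEN = 1536
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128 (assumption)
NUM_LAYERS = 28
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_params(rank):
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS

for rank in (16, 64, 128):
    print(f"rank {rank:>3}: {lora_params(rank):,} trainable parameters")
```

Under these assumptions the adapter size grows linearly with rank, so moving from rank 16 to rank 64 quadruples the number of trainable parameters.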
Related artifacts
- Dataset: FormatBench
- HF Collection: Prosify
- Code & notebooks: github.com/krishyaid-coder/prosify
- Kaggle dataset: FormatBench on Kaggle
Citation
```bibtex
@misc{prosify_qwen_1_5b_lora_2026,
  author       = {Krishna Dahale},
  title        = {Prosify-Qwen-1.5B-LoRA: A DPO-trained adapter for correcting LLM formatting bias},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/krishy-d/prosify_qwen_1.5b_lora}}
}
```
License
Apache 2.0 (matching the base model's license).