README

License: apache-2.0

What this model does

Contemporary instruction-tuned LLMs (including Qwen 2.5) over-format their responses by default: they produce bulleted lists, bold section headers, and templated structures even when flowing prose would serve the reader better. This adapter nudges the base model toward prose in contexts where prose is appropriate, while preserving its ability to use structure where structure genuinely helps.

Quick start

python

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "krishy-d/prosify_qwen_1.5b_lora"

# Load the base model, then attach the LoRA adapter on top of it.
# On older transformers releases the dtype argument is named torch_dtype.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Generate
messages = [{"role": "user", "content": "Write me an email to my manager about WFH tomorrow."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False,
                            pad_token_id=tokenizer.pad_token_id)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training details

Base model: Qwen/Qwen2.5-1.5B-Instruct
Training method: DPO (Direct Preference Optimization)
Adapter: LoRA
Training data: FormatBench train split (~440 examples)
Validation: FormatBench val split (~50 examples)
LoRA rank: 16
LoRA alpha: 32
Target modules: q_proj, k_proj, v_proj, o_proj
Learning rate: 5e-5
Epochs: 1
β (KL leash): 0.1
Effective batch size: 4
Precision: bfloat16
Hardware: Kaggle T4 GPU (free tier)

Full training notebook: notebooks/dpo_02_train.ipynb.
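To make the role of β concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not taken from the training notebook, which may compute the loss through a library. The function name `dpo_loss` and the log-probability values in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the trained policy or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities, for illustration only.
# Policy identical to the reference: loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen (prose) response more than the reference does.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0, beta=0.1))
# Same log-probabilities with the larger beta listed in the roadmap.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0, beta=0.3))
```

With β = 0.1 the margin between chosen and rejected responses is scaled down before the sigmoid, so the policy has to move further from the reference to reach the same loss than it would with a larger β.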

Evaluation

Structural metrics were computed on the FormatBench held-out test split (49 examples, gold = prose) and on the adversarial held-out set (40 examples, gold = structure).
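As a rough illustration of what "bullets per response" and "headers per response" could measure, the sketch below counts markdown list items and ATX headers with regular expressions and averages them over a set of responses. The exact definitions used in the evaluation notebook are not stated here, so the patterns (including the choice to count numbered items as bullets) and the names `structure_counts` and `mean_counts` are assumptions.

```python
import re

# Assumed definitions: a bullet is a line starting with -, *, +, • or "1." / "1)";
# a header is a line starting with one to six # characters followed by text.
BULLET_RE = re.compile(r"^[ \t]*(?:[-*+•]|\d+[.)])[ \t]+\S", re.MULTILINE)
HEADER_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+\S", re.MULTILINE)


def structure_counts(text):
    """Count bullet lines and header lines in one response."""
    return {
        "bullets": len(BULLET_RE.findall(text)),
        "headers": len(HEADER_RE.findall(text)),
    }


def mean_counts(responses):
    """Average the per-response counts over a list of responses."""
    totals = {"bullets": 0, "headers": 0}
    for text in responses:
        for key, value in structure_counts(text).items():
            totals[key] += value
    return {key: value / len(responses) for key, value in totals.items()}


sample = "# Title\n\nIntro paragraph.\n\n- one\n- two\n1. three\n\n**bold** not a bullet\n"
print(structure_counts(sample))
print(mean_counts([sample, "Plain prose."]))
```

Bold text at the start of a line is not counted as a bullet, because the marker must be followed by whitespace.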

Main test split (gold = prose)

| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 2.16 | 0.53 | 0.00 |
| Headers per response | 0.59 | 0.00 | 0.00 |

The trained adapter reduces bullet usage by about 75% (from 2.16 to 0.53 per response) and eliminates markdown headers (from 0.59 to 0.00) in prose-appropriate contexts.

Adversarial set (gold = structure)

| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 9.35 | 6.60 | 8.18 |
| Headers per response | 0.78 | 0.30 | 3.38 |

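In contexts where structure is the correct answer (recipes, install instructions, comparisons, troubleshooting flows, reference lookups), the trained model preserves substantial structure but falls below the gold level: 6.60 bullets per response against 8.18 for gold, and 0.30 headers against 3.38. This indicates mild reward hacking, in which the model slightly over-generalizes the "prefer prose" preference. The base model is itself well below gold on headers (0.78), so only part of the header gap comes from the adapter.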
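The percentages quoted in this section follow from the table values by simple arithmetic. The sketch below recomputes them; the helper name `relative_reduction` is hypothetical, and the inputs are the figures from the two tables above.

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` falls below `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - value) / baseline


# Main test split: this model compared with the base model
main_bullets = relative_reduction(2.16, 0.53)
main_headers = relative_reduction(0.59, 0.00)

# Adversarial set: this model compared with the gold responses
adv_bullets_vs_gold = relative_reduction(8.18, 6.60)
adv_headers_vs_gold = relative_reduction(3.38, 0.30)

print(f"main bullets:            {main_bullets:.1%}")         # 75.5%
print(f"main headers:            {main_headers:.1%}")         # 100.0%
print(f"adversarial bullets gap: {adv_bullets_vs_gold:.1%}")  # 19.3%
print(f"adversarial headers gap: {adv_headers_vs_gold:.1%}")  # 91.1%
```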

Full evaluation notebook: notebooks/dpo_03_evaluate.ipynb.

Limitations

  • v1 baseline: this is the first training run on a small dataset (591 examples) with conservative settings (LoRA rank 16, 1 epoch, β = 0.1). Higher capacity and tuned hyperparameters would likely close the adversarial gap.
  • Single-author dataset voice: the FormatBench chosen responses were authored by one annotator, so the trained model inherits that voice as the "correct" prose style.
  • English only: training data is exclusively English.
  • Mild reward hacking: the model uses less structure than appropriate in contexts where structure helps (about 19% fewer bullets than gold on the adversarial set, 6.60 against 8.18).
  • Small base model: a 1.5B-parameter base limits both fluency and the ceiling of what LoRA fine-tuning can achieve. v2 will explore 3B and 7B base models.

Roadmap

This adapter is v1. Planned for v2:

  • Higher LoRA rank (64 or 128) for more capacity to shift generation behavior
  • Tighter KL constraint (β = 0.3) to reduce adversarial structure drift
  • Larger dataset (~2000 examples, multi-author)
  • Larger base models (Qwen 2.5 3B and 7B)
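To give a sense of what "more capacity" means for the planned ranks, the sketch below estimates the number of trainable LoRA parameters on the four attention projections at ranks 16, 64, and 128. The layer dimensions (hidden size 1536, key/value projection width 256, 28 layers) are assumptions about the Qwen2.5-1.5B architecture and should be checked against the base model's config; the function name `lora_params` is hypothetical.

```python
def lora_params(rank, shapes, num_layers):
    """Trainable parameters added by LoRA: rank * (d_in + d_out) per adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed dimensions for the base model (verify against its config.json)
HIDDEN = 1536
KV_DIM = 256
NUM_LAYERS = 28

# (input width, output width) of each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

for rank in (16, 64, 128):
    print(f"rank {rank:>3}: {lora_params(rank, SHAPES, NUM_LAYERS):,} trainable parameters")
```

Under these assumptions the parameter count grows linearly with rank, so moving from rank 16 to rank 64 quadruples the adapter size.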

Citation

bibtex

@misc{prosify_qwen_1_5b_lora_2026,
author = {Krishna Dahale},
title = {Prosify-Qwen-1.5B-LoRA: A DPO-trained adapter for correcting LLM formatting bias},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/krishy-d/prosify_qwen_1.5b_lora}}
}

License

Apache 2.0 (matching the base model's license).

Model provider

krishy-d

Model tree

Base

Qwen/Qwen2.5-1.5B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text
