Results on Polarized Contrastive Pairs (PCP)
Evaluation grid: 5 templates (paragraph, evidence, tell_me, tell_me_dhb, argue) × 50 left-coded/right-coded topic pairs × 4 valences = 1,000 paired evaluations per model. Responses are judged by GPT-5.5.
| Model | Sentiment Consistency ↑ | Helpfulness Consistency ↑ | Average ↑ |
|---|---|---|---|
| Qwen3-14B + PCT (this model) | 61.5% | 95.1% | 78.3% |
| Grok 4.1 Fast | 47.4% | 87.6% | 67.5% |
| GPT-5.5 | 38.0% | 76.3% | 57.2% |
| Mistral Medium 3.5 | 31.1% | 82.9% | 57.0% |
| Gemini 3.1 Pro | 40.5% | 72.8% | 56.6% |
| DeepSeek V4 Pro | 33.2% | 78.8% | 56.0% |
| Claude Opus 4.7 | 39.3% | 64.3% | 51.8% |
| Grok 4.3 | 25.2% | 71.5% | 48.4% |
| Qwen3-14B (baseline) | 20.9% | 51.6% | 36.3% |
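
The Average column appears to be the arithmetic mean of the two consistency scores, and the 1,000 evaluations follow from the grid size. The short Python check below uses values copied from the table above; the 0.1-point tolerance is an assumption to absorb rounding in the published percentages.

```python
# Scores copied from the results table:
# (sentiment consistency, helpfulness consistency, reported average), in percent.
rows = {
    "Qwen3-14B + PCT": (61.5, 95.1, 78.3),
    "Grok 4.1 Fast": (47.4, 87.6, 67.5),
    "GPT-5.5": (38.0, 76.3, 57.2),
    "Mistral Medium 3.5": (31.1, 82.9, 57.0),
    "Gemini 3.1 Pro": (40.5, 72.8, 56.6),
    "DeepSeek V4 Pro": (33.2, 78.8, 56.0),
    "Claude Opus 4.7": (39.3, 64.3, 51.8),
    "Grok 4.3": (25.2, 71.5, 48.4),
    "Qwen3-14B (baseline)": (20.9, 51.6, 36.3),
}

# Grid size: 5 templates x 50 topic pairs x 4 valences.
n_evaluations = 5 * 50 * 4


def mean_score(sentiment: float, helpfulness: float) -> float:
    """Arithmetic mean of the two consistency scores (assumed definition of Average)."""
    return (sentiment + helpfulness) / 2


for name, (sentiment, helpfulness, reported) in rows.items():
    # Published values are rounded to one decimal place, so allow 0.1 points of slack.
    assert abs(mean_score(sentiment, helpfulness) - reported) <= 0.1, name

print(n_evaluations)  # 1000
```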
Usage
Load the adapter on top of the base model:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-14B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("justinphan3110/Qwen3-14B_PCT")
model = PeftModel.from_pretrained(base, "justinphan3110/Qwen3-14B_PCT")
msgs = [{"role": "user", "content": "Tell me how good Occupy Wall Street is."}]
inputs = tokenizer.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
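Or serve with vLLM (the adapter is rank 32, above vLLM's default maximum LoRA rank of 16, so raise the limit):
vllm serve Qwen/Qwen3-14B \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules pct=justinphan3110/Qwen3-14B_PCT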
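
Once the server is running, the adapter can be queried through vLLM's OpenAI-compatible chat endpoint by passing the LoRA module name as the model. The sketch below is a minimal client using only the Python standard library. It assumes the server listens on vLLM's default `localhost:8000` and that the adapter is registered as `pct`, as in the command above. The helper names `build_request` and `ask` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumptions: default vLLM host/port, adapter registered as "pct" via --lora-modules.
BASE_URL = "http://localhost:8000/v1"


def build_request(prompt: str, model: str = "pct", max_tokens: int = 1024) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request targeting the served adapter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the vLLM server to be running:
# print(ask("Tell me how good Occupy Wall Street is."))
```

Passing `model="Qwen/Qwen3-14B"` instead should route the same prompt to the base model without the adapter, which is handy for side-by-side comparisons.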
Training
GRPO with two complementary reward signals applied jointly in a single run:
- Sentiment Consistency Training (SCT): a judge scores the symmetry of rhetoric and framing across paired left/right prompts; the reward peaks at the balanced score (3 on a 1-5 scale).
- Helpfulness Consistency Training (HCT): a judge scores substantive engagement per response (0-2), rewarding genuine helpfulness over hedging or refusal.
The two signals combine into a multiplicative reward: r = bias_factor × helpfulness_factor. Hyperparameters: LoRA rank 32, alpha 32, 3 epochs, learning rate 1e-4. See the repo for full configs.
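
To illustrate how a multiplicative reward of this shape behaves, here is a minimal sketch. The exact mappings from judge scores to `bias_factor` and `helpfulness_factor` are not given here, so the linear forms below are assumptions chosen only to match the stated ranges (sentiment 1-5 peaking at 3, helpfulness 0-2). The repo configs remain the authoritative source.

```python
def bias_factor(sentiment_score: int) -> float:
    """Map a 1-5 symmetry score to [0, 1], peaking at the balanced score 3.

    Assumed linear falloff: 3 -> 1.0, 2 or 4 -> 0.5, 1 or 5 -> 0.0.
    """
    if not 1 <= sentiment_score <= 5:
        raise ValueError("sentiment_score must be in 1..5")
    return 1.0 - abs(sentiment_score - 3) / 2.0


def helpfulness_factor(helpfulness_score: int) -> float:
    """Map a 0-2 engagement score to [0, 1] (assumed linear scaling)."""
    if not 0 <= helpfulness_score <= 2:
        raise ValueError("helpfulness_score must be in 0..2")
    return helpfulness_score / 2.0


def reward(sentiment_score: int, helpfulness_score: int) -> float:
    """Multiplicative reward: r = bias_factor * helpfulness_factor."""
    return bias_factor(sentiment_score) * helpfulness_factor(helpfulness_score)


print(reward(3, 2))  # balanced and fully helpful -> 1.0
print(reward(3, 0))  # balanced but a refusal -> 0.0
print(reward(5, 2))  # helpful but maximally one-sided -> 0.0
```

Under these assumed mappings, the product form means neither signal can compensate for the other: a perfectly balanced refusal and a helpful but one-sided answer both earn zero reward.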
Citation
@article{political_consistency_2026,
title={Polarized Contrastive Pairs: A Benchmark and Training Method for Covert Political Bias},
author={Phan, Long and others},
journal={arXiv preprint},
year={2026}
}
License
Apache 2.0 (inherits the base model's license terms).