amd
PARD2-Qwen3-14B
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: mitIntroduction
PARD is a high-performance speculative decoding method that also enables low-cost adaptation of autoregressive draft models into parallel draft models. PARD-2 further advances PARD by introducing a Target-Aligned Parallel Draft Model for dual-mode speculative decoding. Instead of optimizing draft models only for token-level prediction accuracy, PARD-2 aligns draft-model training with the inference-time objective of maximizing consecutive token acceptance. PARD-2 offers the following advantages:
-
Target-Aligned Optimization: PARD-2 reformulates the draft-model objective from next-token prediction accuracy to acceptance-length optimization, better matching the draft-then-verify process used during speculative decoding.
-
Confidence-Adaptive Token Optimization: PARD-2 introduces Confidence-Adaptive Token (CAT) optimization, which adaptively reweights tokens according to their contribution to the verification process. This improves the alignment between draft generation and target-model acceptance.
-
Dual-Mode Speculative Decoding: A single PARD-2 draft model supports both target-independent and target-dependent modes, combining the deployment flexibility of PARD with the stronger alignment capability of target-aware methods.
State-of-the-Art Performance: Across diverse models and tasks, PARD-2 achieves up to 6.94× lossless acceleration. On LLaMA3.1-8B, PARD-2 surpasses EAGLE-3 by 1.9× and PARD by 1.3×, setting a new performance frontier for speculative decoding.
Model Weights
| Model Series | Model Name | Download |
|---|---|---|
| Llama3 | amd/PARD2-Llama-3.1-8B | 🤗 HuggingFace |
| Qwen3 | amd/PARD2-Qwen3-8B | 🤗 HuggingFace |
| Qwen3 | amd/PARD2-Qwen3-14B | 🤗 HuggingFace |
How To Use
Please visit PARD2 repo for more information
Citation
markdown
@article{an2026pard,title={PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding},author={An, Zihao and Liu, Taichi and Liu, Ziqiong and Li, Dong and Liu, Ruofeng and Barsoum, Emad},journal={arXiv preprint arXiv:2605.08632},year={2026}}
Model provider
amd
Model tree
Base
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information