README
Quick start
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="LiberteEPFL/qwen3-1.7b-dpo-bigchat-v2", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
Training procedure
This model was trained with DPO, a method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model".
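For readers unfamiliar with the objective, the per-pair DPO loss from the cited paper can be sketched in a few lines. This is an illustrative sketch only, not the training code used for this model: the function name `dpo_loss` is hypothetical, the inputs are assumed to be summed sequence log-probabilities, and `beta=0.1` is an assumed default, since the model card does not state the value used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    beta=0.1 is an assumption; the model card does not give the value.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch on the sign of z
    # so that exp() is never called on a large positive number.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.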
Framework versions
- TRL: 1.3.0
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Datasets: 4.8.5
- Tokenizers: 0.22.1
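To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch under assumptions: the helper name `find_mismatches` is hypothetical, and the PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks listed. Matching these versions exactly is probably not required to run the model; the check only reports differences.

```python
from importlib import metadata

# Versions as listed in this model card; the distribution names are assumed.
EXPECTED = {
    "trl": "1.3.0",
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.8.5",
    "tokenizers": "0.22.1",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages whose version differs.

    `found` is None when the package is not installed. `lookup` can be
    swapped out, which makes the function easy to exercise in isolation.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```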
Citations
Cite DPO as:
```bibtex
@inproceedings{rafailov2023direct,
    title     = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author    = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year      = 2023,
    booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
    url       = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
    editor    = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
```
Cite TRL as:
```bibtex
@software{vonwerra2020trl,
    title   = {{TRL: Transformers Reinforcement Learning}},
    author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
    license = {Apache-2.0},
    url     = {https://github.com/huggingface/trl},
    year    = {2020}
}
```
Model provider: LiberteEPFL
Model tree
- Base: LiberteEPFL/qwen3-1.7b-sft-bigchat-v2
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text