License: apache-2.0
Method
Ditto-8B is trained with DITTO, a reinforcement learning method that uses verbal feedback as the learning signal. After each output, the model receives descriptive feedback and produces an improved version; the original output and the revision are jointly optimized with GRPO. This distills the verbal guidance into the policy, so no feedback is needed at inference time.
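To make the GRPO step more concrete, here is a minimal sketch of group-relative advantage computation. It is not the authors' training code: the reward values are made up, `group_relative_advantages` is a hypothetical helper, and pooling the initial outputs and their revisions into a single group is only one possible reading of "jointly optimized". The use of the population standard deviation is also a choice made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt: two initial outputs and the two
# revisions produced after verbal feedback, all scored by the same reward.
initial_rewards = [0.2, 0.4]
revised_rewards = [0.6, 0.8]

# Assumption: initial outputs and revisions are pooled into one group, so
# revisions that improve on the initial outputs get positive advantages.
advantages = group_relative_advantages(initial_rewards + revised_rewards)
print(advantages)
```

Under this pooling assumption, outputs that score above the group mean are reinforced and those below it are discouraged, which is how a policy could absorb the effect of the feedback without seeing it at inference time.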
Results
Primary metric for each benchmark (higher is better).
| Dimension | Benchmark | GPT 5.5 | Gemini 3.1 Pro | Claude Opus 4.7 | Qwen 3.6 Plus | Others* | Qwen3 8B Inst | Ditto-8B |
|---|---|---|---|---|---|---|---|---|
| CONV | UserLLM | 65.3 | 67.7 | 57.6 | 72.1 | 44.6 | 46.0 | 91.5 |
| CONV | MirrorBench | 56.7 | 48.3 | 63.7 | 48.0 | 45.4 | 54.0 | 73.4 |
| CONV | Humanual-Chat | 28.2 | 21.0 | 22.6 | 22.2 | 25.8 | 24.7 | 21.0 |
| CONV | SimArena-Doc | 83.4 | 83.0 | 83.5 | 82.4 | 83.5 | 83.6 | 84.4 |
| SS | Sotopia-Hard | 31.9 | 27.8 | 32.4 | 28.3 | 31.7 | 27.7 | 45.8 |
| COG | Fantom | 93.0 | 93.0 | 80.0 | 89.0 | 70.0 | 23.0 | 92.0 |
| COG | Hitom | 82.0 | 86.0 | 93.0 | 73.0 | 56.0 | 62.0 | 79.0 |
| COG | Paratomi | 99.0 | 97.0 | 90.0 | 94.0 | 75.0 | 67.0 | 95.0 |
| COG | Social-R1 | 69.0 | 79.0 | 67.0 | 67.0 | 47.0 | 54.0 | 50.0 |
| ROLE | Coser | 66.2 | 62.1 | 66.5 | 55.9 | 30.3 | 43.5 | 64.4 |
| ROLE | Lifechoices | 91.0 | 84.0 | 92.0 | 79.0 | 67.0 | 70.0 | 70.0 |
| ROLE | Twinvoice | 74.0 | 86.0 | 83.0 | 71.0 | 40.0 | 42.0 | 71.0 |
| ROLE | BehaviorChain | 95.0 | 92.0 | 96.0 | 85.0 | 36.0 | 41.0 | 44.0 |
| ROLE | SimArena-Math | 68.5 | 71.5 | 68.7 | 70.9 | 70.5 | 68.9 | 69.6 |
| ROLE | Mistakes | 72.0 | 73.0 | 74.0 | 67.0 | 56.0 | 27.0 | 36.0 |
| ROLE | Humanual-Email | 50.1 | 46.9 | 50.4 | 47.9 | 42.8 | 43.7 | 40.8 |
| ROLE | Humanual-News | 40.2 | 42.3 | 41.3 | 41.8 | 33.1 | 32.5 | 27.5 |
| ROLE | Humanual-Politics | 42.0 | 32.5 | 43.5 | 31.6 | 34.2 | 33.2 | 29.7 |
| EVAL | AlignX | 71.2 | 73.4 | 71.6 | 69.8 | 66.8 | 68.6 | 67.4 |
| EVAL | Humanllm | 45.7 | 46.9 | 44.2 | 42.7 | 35.2 | 34.1 | 33.1 |
| EVAL | Socsci210 | 77.2 | 78.0 | 77.2 | 74.5 | 75.2 | 73.6 | 72.5 |
| EVAL | Humanual-Book | 57.6 | 62.4 | 61.4 | 58.4 | 50.2 | 53.6 | 53.4 |
| EVAL | Humanual-Opinion | 39.8 | 36.0 | 46.2 | 34.2 | 37.4 | 37.2 | 30.3 |
* Others: best result among other specialized human-simulation models (HumanLM-8B, Sotopia-RL-7B, UserLM-8B, Coser-8B).
Note. The released Ditto-8B is a single generalist distilled from a set of task-specific DITTO experts via rejection sampling on the training set.
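As a rough illustration of distillation via rejection sampling, the sketch below filters candidate responses by score and keeps the accepted ones as training pairs. The note above does not specify the acceptance rule, the scoring function, or how many samples are drawn per prompt, so `rejection_sample`, `toy_generate`, `toy_score`, and the threshold are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[Dict[str, str]]:
    """Keep, per prompt, the best-scoring candidate if it clears the threshold."""
    kept = []
    for prompt in prompts:
        candidates = generate(prompt)
        if not candidates:
            continue
        best = max(candidates, key=lambda c: score(prompt, c))
        if score(prompt, best) >= threshold:
            kept.append({"prompt": prompt, "response": best})
    return kept


# Toy stand-ins for an expert model and a task reward (assumptions).
def toy_generate(prompt: str) -> List[str]:
    return [prompt + " a", prompt + " bb", prompt + " ccc"]


def toy_score(prompt: str, response: str) -> float:
    return float(len(response) - len(prompt))


dataset = rejection_sample(["hi", "yo"], toy_generate, toy_score, threshold=3)
print(dataset)
```

In a real pipeline, `generate` would sample from the task-specific experts and the kept pairs would serve as supervised fine-tuning data for the single generalist model.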
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sunweiwei/Ditto-8B"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (skip the prompt).
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Citation
```bibtex
@article{sun2026ditto,
  title = {Reinforcing Human Behavior Simulation via Verbal Feedback},
  author = {Sun, Weiwei and Zhou, Xuhui and Liu, Jiarui and Du, Weihua and Sun, Haojia and Xie, Yiqing and Ma, Qianou and Chen, Sihao and Wan, Mengting and Yang, Longqi and Zhou, Pei and Wu, Sherry and Welleck, Sean and Neubig, Graham and Yang, Yiming and Sap, Maarten},
  year = {2026},
  eprint = {2605.20506},
  archivePrefix = {arXiv},
  url = {http://arxiv.org/abs/2605.20506}
}
```
Model provider: sunweiwei
Base model: Qwen/Qwen3-VL-8B-Instruct (this model is a fine-tune)
Input modalities: Text, Image
Output modalities: Text