Intended use
Simulating the human/user side of conversations — user simulation for agent evaluation, social simulation, persona / role-play. Conditioned on a "social-context" system prompt (who is speaking: role, goal, background, style); given the other party's turns it generates the next human turn.
Results
Evaluated out-of-distribution as the user simulator in the τ-USI agentic benchmark (τ-bench airline+retail, 165 tasks, fixed GPT-5.2 agent), OSim-8B reaches USI 75.6 — the best behavioral / specialized user simulator, surpassing same-size general instruct models and every prior specialized simulator (CoSER-8B 67.2, UserLM-8B 62.0). It is distinctively human-like in reactivity (Sørensen–Dice D4 ≈ 93, matching the human inter-annotator level) and in outcome calibration (best ECE among compared models), with essentially none of the long-horizon agentic failure modes (timeouts/perseveration) seen in non-behavioral baselines.
Training
- Base: Qwen3-8B
- Stages: midtraining on the OdysSim corpus → task-specific reinforcement learning + expert consolidation.
Citation
If you use this model, please cite the OdysSim paper (Building Foundation Models for Human Behavior Simulation). Code: https://github.com/sunnweiwei/OdysSim