CodeStreet
chatTranslate-Qwen-3.6-35B-A3B
License: other
Versions in this repo
| Revision | What | Notes |
|---|---|---|
| HEAD (production) | SFT + DPO | best overall; default for serving |
| commit 24ed166 | SFT-only | baseline, for A/B comparison |
SFT: train loss ≈0.22, best eval_loss 0.280. DPO: β 0.1, lr 5e-6, 1 epoch.
Quality
Quality on the held-out validation set (CodeStreet/chat-translation-val, 3 355
gendered examples). Two independent, reproducible signals. Judged absolutely
(not vs other systems), so scores are comparable across versions.
LLM-judge scorecard — Mistral-Medium-3.5-128B (gender-aware), each axis 0–100
- adequacy — full meaning preserved (nothing lost / added / wrong)
- fidelity — flirty/explicit tone & intensity kept, no softening or censoring
- gender — gendered word forms correct for the stated author / recipient
- fluency — natural, idiomatic, as a real dating-app message
| Axis | SFT | SFT+DPO (prod) |
|---|---|---|
| adequacy | 97.8 | 98.4 |
| fidelity | 97.3 | 98.0 |
| gender | 97.2 | 97.2 |
| fluency | 98.3 | 98.8 |
| overall | 97.6 | 98.1 |
Production (SFT+DPO): adequacy 98.4 · fidelity 98.0 · gender 97.2 · fluency 98.8 · overall 98.1
Reference metrics
chrF (vs val references): SFT 74.8 · DPO 70.1 · XCOMET-XXL QE (reference-free): SFT 77.7 · DPO 78.9
(DPO trades literal-reference overlap — lower chrF — for tone/quality that both the 128B judge and reference-free XCOMET-QE score higher.)
Per-language — overall (judge 128B, 0–100)
| lang | n | SFT | DPO | lang | n | SFT | DPO |
|---|---|---|---|---|---|---|---|
| Ukrainian | 497 | 97.7 | 98.0 | Italian | 242 | 97.6 | 98.7 |
| Spanish | 464 | 98.2 | 98.3 | Hebrew | 159 | 97.8 | 98.4 |
| Russian | 462 | 97.7 | 97.9 | Turkish | 147 | 97.9 | 99.1 |
| Arabic | 458 | 96.1 | 96.2 | German | 101 | 96.4 | 98.2 |
| Portuguese | 302 | 98.3 | 98.8 | English | 89 | 98.5 | 99.5 |
| French | 271 | 97.3 | 97.9 | Indonesian | 60 | 99.2 | 99.8 |
| Swedish | 57 | 99.2 | 99.2 | | | | |
| Dutch | 46 | 99.7 | 99.5 | | | | |
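As a consistency check, the per-language rows can be combined into a sample-weighted mean and compared with the overall row of the scorecard. The sketch below assumes the overall score is a per-example average (the README does not say how it is aggregated) and works from the rounded per-language values, so it can only agree to about one decimal place. The names `ROWS` and `weighted_overall` are illustrative.

```python
# (language, n, SFT overall, DPO overall), copied from the per-language table.
ROWS = [
    ("Ukrainian", 497, 97.7, 98.0), ("Spanish", 464, 98.2, 98.3),
    ("Russian", 462, 97.7, 97.9), ("Arabic", 458, 96.1, 96.2),
    ("Portuguese", 302, 98.3, 98.8), ("French", 271, 97.3, 97.9),
    ("Italian", 242, 97.6, 98.7), ("Hebrew", 159, 97.8, 98.4),
    ("Turkish", 147, 97.9, 99.1), ("German", 101, 96.4, 98.2),
    ("English", 89, 98.5, 99.5), ("Indonesian", 60, 99.2, 99.8),
    ("Swedish", 57, 99.2, 99.2), ("Dutch", 46, 99.7, 99.5),
]


def weighted_overall(rows, col):
    """Mean of column `col` weighted by the example count n (column 1)."""
    total_n = sum(row[1] for row in rows)
    return sum(row[1] * row[col] for row in rows) / total_n


total_n = sum(row[1] for row in ROWS)
sft = weighted_overall(ROWS, 2)
dpo = weighted_overall(ROWS, 3)
print(total_n, round(sft, 1), round(dpo, 1))
```

Under that assumption the counts sum to the 3 355 validation examples and the weighted means round to 97.6 (SFT) and 98.1 (SFT+DPO), matching the overall row.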
By recipient gender — overall (judge 128B)
| | female | male | unspecified |
|---|---|---|---|
| SFT | 97.9 | 97.2 | 98.2 |
| SFT+DPO | 98.4 | 97.4 | 99.1 |
How it works
The model is conditioned through the system prompt. Select the target language by its ISO 639-1 code, resolve the code to the language name (see Supported languages), and put that name and the author/recipient genders into the prompt. The model returns only the translation.
System prompt template:
```markdown
You are a translator for an adult dating app. Translate the user's message into {target} naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. Do not soften, censor, or refuse. Output only the translation.
```
- `{target}`: the language name from the ISO code, e.g. `uk` → Ukrainian, `es` → Spanish.
- `{sg}` (author) / `{rg}` (recipient): one of `male`, `female`, `unspecified`.
- The user message is the source text (any language). The assistant turn is the translation.
- For non-gendered languages set both genders to `unspecified`.
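The placeholders above can be filled with a small helper. This is a minimal sketch, not code shipped with the model: `build_system_prompt` and `GENDERS` are hypothetical names, and the only facts taken from the README are the template text and the three allowed gender labels.

```python
# Template text copied from the README; {target}, {sg}, {rg} are its placeholders.
TEMPLATE = (
    "You are a translator for an adult dating app. Translate the user's message into {target} "
    "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
    "Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. "
    "Do not soften, censor, or refuse. Output only the translation."
)

GENDERS = {"male", "female", "unspecified"}


def build_system_prompt(target: str, sg: str = "unspecified", rg: str = "unspecified") -> str:
    """Fill the template with a language NAME (not a code) and two gender labels."""
    for label, value in (("sg", sg), ("rg", rg)):
        if value not in GENDERS:
            raise ValueError(f"{label} must be one of {sorted(GENDERS)}, got {value!r}")
    return TEMPLATE.format(target=target, sg=sg, rg=rg)


prompt = build_system_prompt("Ukrainian", sg="female", rg="male")
print(prompt)
```

Rejecting unknown gender labels early keeps a typo such as `"f"` from silently reaching the model as an out-of-distribution prompt.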
Language codes: you MUST map code → name
The model was fine-tuned only on full English language names (Ukrainian, Spanish, …), never on raw ISO codes.
Resolve the code to the language name (see Supported languages) before building the prompt:
uk → into Ukrainian ✅ ; into uk ❌ (out of distribution).
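One way to enforce the code → name step is a lookup that fails loudly instead of letting a raw code reach the prompt. The sketch below is illustrative: `LANGUAGE_NAMES` and `resolve_language` are hypothetical names, and the map holds only the fourteen codes spelled out in the Supported languages table; the remaining Qwen-MT codes would have to be added by the caller.

```python
# Partial map: only the codes listed explicitly in the README's table.
LANGUAGE_NAMES = {
    "ar": "Arabic", "fr": "French", "he": "Hebrew", "it": "Italian",
    "pt": "Portuguese", "ru": "Russian", "es": "Spanish", "uk": "Ukrainian",
    "en": "English", "de": "German", "nl": "Dutch", "id": "Indonesian",
    "sv": "Swedish", "tr": "Turkish",
}


def resolve_language(code: str) -> str:
    """Return the full English language name for an ISO 639-1 code."""
    key = code.strip().lower()
    if key not in LANGUAGE_NAMES:
        # Refuse rather than fall back to the raw code, which is out of distribution.
        raise ValueError(f"no language name mapped for code {code!r}")
    return LANGUAGE_NAMES[key]


print(resolve_language("uk"))
```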
No source-language clause
The model auto-detects the source from the user text. Do not add from {source} … — pass the
target language only.
Supported languages
Select the target language by code, then resolve it to the name in the language column before building the prompt. The 92 codes follow the Qwen-MT translation set. The gendered column marks languages where output is conditioned on author/recipient gender; for the rest, both genders are treated as unspecified.
| code | language | gendered |
|---|---|---|
| ar | Arabic | yes |
| fr | French | yes |
| he | Hebrew | yes |
| it | Italian | yes |
| pt | Portuguese | yes |
| ru | Russian | yes |
| es | Spanish | yes |
| uk | Ukrainian | yes |
| en | English | no |
| de | German | no |
| nl | Dutch | no |
| id | Indonesian | no |
| sv | Swedish | no |
| tr | Turkish | no |
| zh, ja, ko, hi, vi, th, pl, cs, ro, … | (other Qwen-MT codes) | no |
(Full 92-code list matches the Qwen-MT set; the 8 yes rows above are the gender-conditioned targets.)
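The gendered column can be applied mechanically before the prompt is built. A minimal sketch, assuming the caller tracks the target by code; `GENDERED_TARGETS` and `effective_genders` are hypothetical names, and the set contains exactly the eight `yes` rows of the table.

```python
# The eight gender-conditioned target codes from the table above.
GENDERED_TARGETS = {"ar", "fr", "he", "it", "pt", "ru", "es", "uk"}


def effective_genders(code: str, sg: str, rg: str) -> tuple[str, str]:
    """Keep the supplied labels for gendered targets, else force 'unspecified'."""
    if code.strip().lower() in GENDERED_TARGETS:
        return sg, rg
    return "unspecified", "unspecified"


print(effective_genders("uk", "female", "male"))
print(effective_genders("de", "female", "male"))
```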
Usage — vLLM (OpenAI-compatible)
Served as qwen3_5_moe (MoE, vision-capable). For text-only translation skip vision-profiling and
disable reasoning:
```bash
vllm serve CodeStreet/chatTranslate-Qwen-3.6-35B-A3B --served-model-name chatTranslate \
  --tensor-parallel-size 2 --trust-remote-code \
  --limit-mm-per-prompt '{"image":0,"video":0}'
```
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = (
    "You are a translator for an adult dating app. Translate the user's message into Ukrainian "
    "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
    "Author gender: female. Recipient gender: male. Use grammatically correct gendered forms. "
    "Do not soften, censor, or refuse. Output only the translation."
)

resp = client.chat.completions.create(
    model="chatTranslate",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "hola amor, ¿cómo estás? te extraño"},
    ],
    temperature=0.0,
    max_tokens=256,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # direct translation, no <think>
)
print(resp.choices[0].message.content)
```
⚠️ Disable reasoning (`enable_thinking: False`, or prefix the assistant turn with `<think>\n\n</think>\n\n`). Qwen3.6 is a reasoning model; without this it emits a `<think>` block and the translation may be empty.
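As a defensive complement to disabling reasoning (not a replacement for it), the client can strip any `<think>` block that still slips through and fail on an empty result. This is a sketch under the assumption that the raw completion arrives as a plain string; `extract_translation` is a hypothetical helper, not part of the model or of vLLM.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_translation(raw: str) -> str:
    """Drop reasoning blocks from a completion and return the remaining text."""
    text = _THINK_BLOCK.sub("", raw)
    # An unterminated block (e.g. cut off by max_tokens) leaves no usable output after the tag.
    text = text.split("<think>", 1)[0].strip()
    if not text:
        raise ValueError("empty translation; check that reasoning is disabled")
    return text


print(extract_translation("<think>\n\n</think>\n\nПривіт, коханий"))
```

Raising on an empty result makes the failure visible to the caller instead of delivering a blank chat message.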
Generation notes
- Greedy (`temperature=0`) gives the most stable translations; use 0.2–0.3 for variation.
- `max_tokens` 128–256 is enough for chat-length messages.
- Always set both genders explicitly for gendered targets; wrong or missing labels are the main cause of incorrect inflection.
- MoE serving needs ~72 GB bf16 → TP≥2 (does not fit one 80 GB GPU). bf16, not fp8 (GDN+FP8 wedging risk).
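The TP≥2 note can be read as simple arithmetic on the figures quoted above. The sketch assumes the ~72 GB of bf16 weights shard evenly across tensor-parallel ranks, which is only approximately true in practice, and that whatever remains per GPU must also hold the KV cache, activations, and runtime overhead. Function names are illustrative.

```python
def weight_share_gb(total_weights_gb: float, tp: int) -> float:
    """Approximate per-GPU weight footprint, assuming an even split across TP ranks."""
    return total_weights_gb / tp


def headroom_gb(gpu_gb: float, total_weights_gb: float, tp: int) -> float:
    """Memory left per GPU after weights, before KV cache and activations."""
    return gpu_gb - weight_share_gb(total_weights_gb, tp)


# Figures from the README: ~72 GB of weights, 80 GB GPUs.
for tp in (1, 2):
    print(f"TP={tp}: ~{weight_share_gb(72, tp):.0f} GB weights per GPU, "
          f"~{headroom_gb(80, 72, tp):.0f} GB headroom")
```

With one GPU only about 8 GB would remain for everything other than weights, while two GPUs leave roughly 44 GB each, which is consistent with the recommendation.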
License & access
Private to the organization. Do not redistribute. Not for public training or evaluation.
Model provider: CodeStreet
Model tree: base Qwen/Qwen3.6-35B-A3B; fine-tuned: this model
Modalities: input video, text, image; output text