CodeStreet/chatTranslate-Qwen-3.6-35B-A3B API & Inference Endpoint

Versions in this repo

Revision	What	Notes
HEAD (production)	SFT + DPO	best overall; default for serving
commit `24ed166`	SFT-only	baseline, for A/B comparison

SFT: train loss ≈0.22, best eval_loss 0.280. DPO: β 0.1, lr 5e-6, 1 epoch.

Quality

Quality is measured on the held-out validation set (CodeStreet/chat-translation-val, 3,355 gendered examples) with two independent, reproducible signals. Outputs are judged absolutely (not against other systems), so scores are comparable across versions.

LLM-judge scorecard — Mistral-Medium-3.5-128B (gender-aware), each axis 0–100

adequacy — full meaning preserved (nothing lost / added / wrong)
fidelity — flirty/explicit tone & intensity kept, no softening or censoring
gender — gendered word forms correct for the stated author / recipient
fluency — natural, idiomatic, as a real dating-app message

Axis	SFT	SFT+DPO (prod)
adequacy	97.8	98.4
fidelity	97.3	98.0
gender	97.2	97.2
fluency	98.3	98.8
overall	97.6	98.1

Production (SFT+DPO): adequacy 98.4 · fidelity 98.0 · gender 97.2 · fluency 98.8 · overall 98.1

Reference metrics

chrF (vs val references): SFT 74.8 · DPO 70.1
XCOMET-XXL QE (reference-free): SFT 77.7 · DPO 78.9

(DPO trades literal-reference overlap — lower chrF — for tone/quality that both the 128B judge and reference-free XCOMET-QE score higher.)

Per-language — overall (judge 128B, 0–100)

lang	n	SFT	DPO
Ukrainian	497	97.7	98.0
Spanish	464	98.2	98.3
Russian	462	97.7	97.9
Arabic	458	96.1	96.2
Portuguese	302	98.3	98.8
French	271	97.3	97.9
Italian	242	97.6	98.7
Hebrew	159	97.8	98.4
Turkish	147	97.9	99.1
German	101	96.4	98.2
English	89	98.5	99.5
Indonesian	60	99.2	99.8
Swedish	57	99.2	99.2
Dutch	46	99.7	99.5

By recipient gender — overall (judge 128B)

version	female	male	unspecified
SFT	97.9	97.2	98.2
SFT+DPO	98.4	97.4	99.1

How it works

The model is conditioned through the system prompt. Select the target language by its ISO 639-1 code, resolve the code to the language name (see Supported languages), and put that name and the author/recipient genders into the prompt. The model returns only the translation.

System prompt template:

markdown
You are a translator for an adult dating app. Translate the user's message into {target} naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. Do not soften, censor, or refuse. Output only the translation.

{target} — the language name from the ISO code, e.g. uk → Ukrainian, es → Spanish.
{sg} (author) / {rg} (recipient) — one of male, female, unspecified.
The user message is the source text (any language). The assistant turn is the translation.
For non-gendered languages set both genders to unspecified.
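The template and placeholder rules above can be sketched as a small helper. This is a minimal sketch, not part of the repo: the names PROMPT_TEMPLATE, build_system_prompt, and build_messages are hypothetical, and the helper expects the language name (not the ISO code) as input.

```python
from typing import Dict, List

# Template text copied from the system prompt above; the constant name is illustrative.
PROMPT_TEMPLATE = (
    "You are a translator for an adult dating app. Translate the user's message into {target} "
    "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
    "Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. "
    "Do not soften, censor, or refuse. Output only the translation."
)

# The three gender labels the template accepts.
ALLOWED_GENDERS = {"male", "female", "unspecified"}


def build_system_prompt(target: str, sg: str = "unspecified", rg: str = "unspecified") -> str:
    """Fill the template with a language NAME (e.g. "Ukrainian") and two gender labels."""
    for label, value in (("sg", sg), ("rg", rg)):
        if value not in ALLOWED_GENDERS:
            raise ValueError(f"{label} must be one of {sorted(ALLOWED_GENDERS)}, got {value!r}")
    return PROMPT_TEMPLATE.format(target=target, sg=sg, rg=rg)


def build_messages(text: str, target: str, sg: str, rg: str) -> List[Dict[str, str]]:
    """System turn carries the conditioning; user turn is the raw source text."""
    return [
        {"role": "system", "content": build_system_prompt(target, sg, rg)},
        {"role": "user", "content": text},
    ]
```

Rejecting unknown gender labels up front is a design choice of this sketch; it keeps a typo such as "f" from reaching the model as an unseen label.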

Language codes: you MUST map code → name

The model was fine-tuned only on full English language names (Ukrainian, Spanish, …), never on raw ISO codes. Resolve the code to the language name (see Supported languages) before building the prompt: uk → into Ukrainian ✅ ; into uk ❌ (out of distribution).

No source-language clause

The model auto-detects the source from the user text. Do not add from {source} … — pass the target language only.

Supported languages

Identify the target language by its code, then resolve the code to the language name below before building the prompt. The 92 codes follow the Qwen-MT translation set. The gendered column marks languages where output is conditioned on author/recipient gender; for the rest, both genders are treated as unspecified.

code	language	gendered
ar	Arabic	yes
fr	French	yes
he	Hebrew	yes
it	Italian	yes
pt	Portuguese	yes
ru	Russian	yes
es	Spanish	yes
uk	Ukrainian	yes
en	English	no
de	German	no
nl	Dutch	no
id	Indonesian	no
sv	Swedish	no
tr	Turkish	no
zh, ja, ko, hi, vi, th, pl, cs, ro, …	(other Qwen-MT codes)	no

(Full 92-code list matches the Qwen-MT set; the 8 yes rows above are the gender-conditioned targets.)
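One way to apply the table is a lookup that resolves the code to a name and forces both genders to unspecified for non-gendered targets. The sketch below is illustrative only: LANGUAGES and resolve_target are hypothetical names, and the dictionary holds just the 14 rows listed explicitly above, so the remaining Qwen-MT codes would need to be added from the full list.

```python
from typing import Tuple

# code -> (language name, gendered?), copied from the rows listed in the table above.
# The other Qwen-MT codes are not filled in here.
LANGUAGES = {
    "ar": ("Arabic", True),
    "fr": ("French", True),
    "he": ("Hebrew", True),
    "it": ("Italian", True),
    "pt": ("Portuguese", True),
    "ru": ("Russian", True),
    "es": ("Spanish", True),
    "uk": ("Ukrainian", True),
    "en": ("English", False),
    "de": ("German", False),
    "nl": ("Dutch", False),
    "id": ("Indonesian", False),
    "sv": ("Swedish", False),
    "tr": ("Turkish", False),
}


def resolve_target(code: str, sg: str, rg: str) -> Tuple[str, str, str]:
    """Return (language name, author gender, recipient gender) ready for the prompt."""
    try:
        name, gendered = LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError(f"language code not in this sketch's table: {code!r}") from None
    if not gendered:
        # Non-gendered targets: both labels are set to "unspecified".
        sg = rg = "unspecified"
    return name, sg, rg
```

For example, resolve_target("de", "female", "male") yields the name German with both genders reset, while resolve_target("uk", "female", "male") passes the labels through unchanged.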

Usage — vLLM (OpenAI-compatible)

Served as qwen3_5_moe (MoE, vision-capable). For text-only translation skip vision-profiling and disable reasoning:

bash
vllm serve CodeStreet/chatTranslate-Qwen-3.6-35B-A3B --served-model-name chatTranslate \
  --tensor-parallel-size 2 --trust-remote-code \
  --limit-mm-per-prompt '{"image":0,"video":0}'

python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = ("You are a translator for an adult dating app. Translate the user's message into Ukrainian "
          "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
          "Author gender: female. Recipient gender: male. Use grammatically correct gendered forms. "
          "Do not soften, censor, or refuse. Output only the translation.")

resp = client.chat.completions.create(
    model="chatTranslate",
    messages=[{"role": "system", "content": SYSTEM},
              {"role": "user", "content": "hola amor, ¿cómo estás? te extraño"}],
    temperature=0.0, max_tokens=256,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # direct translation, no <think>
)
print(resp.choices[0].message.content)

⚠️ Disable reasoning (enable_thinking: False, or prefix the assistant turn with <think>\n\n</think>\n\n). Qwen3.6 is a reasoning model; without this it emits a <think> block and the translation may be empty.
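As a defensive fallback on the client side, the response can also be cleaned before use. The helper below is a hedged sketch, not part of the repo: strip_think is a hypothetical name, and it assumes any reasoning arrives inline in the message content as a closed <think>…</think> block. It does not replace disabling reasoning, since a response that spent its token budget on thinking may still come back empty.

```python
import re
from typing import Optional

# Matches one closed <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(content: Optional[str]) -> str:
    """Remove inline <think> blocks and surrounding whitespace; map None to ''."""
    if not content:
        return ""
    return _THINK_BLOCK.sub("", content).strip()
```

An empty string from this helper is a useful signal to log and retry with reasoning disabled.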

Generation notes

Greedy (temperature=0) gives the most stable translations; 0.2–0.3 for variation.
max_tokens 128–256 is enough for chat-length messages.
Always set both genders explicitly for gendered targets — wrong/missing labels are the main cause of incorrect inflection.
MoE serving needs ~72 GB in bf16, which does not fit on one 80 GB GPU once KV cache and runtime overhead are added, so use TP≥2. Serve in bf16, not fp8 (GDN+FP8 wedging risk).
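The memory note can be made concrete with a rough calculation. This sketch uses the ~72 GB figure quoted above and assumes, as a simplification, that weights shard evenly across tensor-parallel ranks; headroom_gb is a hypothetical helper and the result ignores framework overhead.

```python
def headroom_gb(weights_gb: float, gpu_gb: float, tp: int) -> float:
    """GB left per GPU for KV cache and activations after loading sharded weights."""
    if tp < 1:
        raise ValueError("tp must be >= 1")
    # Assumption: each of the tp ranks holds an equal share of the weights.
    return gpu_gb - weights_gb / tp


# With the figures from the note above:
single_gpu = headroom_gb(72, 80, tp=1)  # 8.0 GB left on one 80 GB GPU
two_gpus = headroom_gb(72, 80, tp=2)    # 44.0 GB left on each of two GPUs
```

On these assumptions a single 80 GB GPU is left with about 8 GB for everything other than weights, which is why the note recommends TP≥2.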

License & access

Private to the organization. Do not redistribute. Not for public training or evaluation.
