CodeStreet

chatTranslate-Qwen-3.6-35B-A3B


README

License: other

Versions in this repo

| Revision | What | Notes |
|---|---|---|
| HEAD (production) | SFT + DPO | best overall; default for serving |
| commit 24ed166 | SFT-only | baseline, for A/B comparison |

SFT: train loss ≈0.22, best eval_loss 0.280. DPO: β 0.1, lr 5e-6, 1 epoch.

Quality

Quality is measured on the held-out validation set (CodeStreet/chat-translation-val, 3 355 gendered examples) with two independent, reproducible signals. Each output is judged on an absolute scale (not against other systems), so scores are comparable across versions.

LLM-judge scorecard — Mistral-Medium-3.5-128B (gender-aware), each axis 0–100

  • adequacy — full meaning preserved (nothing lost / added / wrong)
  • fidelity — flirty/explicit tone & intensity kept, no softening or censoring
  • gender — gendered word forms correct for the stated author / recipient
  • fluency — natural, idiomatic, as a real dating-app message
| Axis | SFT | SFT+DPO (prod) |
|---|---|---|
| adequacy | 97.8 | 98.4 |
| fidelity | 97.3 | 98.0 |
| gender | 97.2 | 97.2 |
| fluency | 98.3 | 98.8 |
| overall | 97.6 | 98.1 |

Production (SFT+DPO): adequacy 98.4 · fidelity 98.0 · gender 97.2 · fluency 98.8 · overall 98.1

Reference metrics

  • chrF (vs val references): SFT 74.8 · DPO 70.1
  • XCOMET-XXL QE (reference-free): SFT 77.7 · DPO 78.9

(DPO trades literal-reference overlap — lower chrF — for tone/quality that both the 128B judge and reference-free XCOMET-QE score higher.)
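To make the chrF trade-off concrete, here is a minimal sketch of a character n-gram F-score. It is a simplified stand-in, not the exact sacreBLEU implementation that was presumably used for the numbers above; the function names and the example strings are illustrative assumptions, not model outputs.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: mean n-gram precision/recall combined into an F-beta score (0-100)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


# Illustrative strings only: a literal rendering vs. a freer paraphrase.
reference = "I miss you so much"
print(simple_chrf("I miss you so much", reference))
print(simple_chrf("missing you like crazy", reference))
```

A freer but equally valid rendering shares fewer character n-grams with the reference, so it scores lower on this kind of metric even when a judge would rate it as good or better.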

Per-language — overall (judge 128B, 0–100)

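| lang | n | SFT | DPO |
|---|---|---|---|
| Ukrainian | 497 | 97.7 | 98.0 |
| Spanish | 464 | 98.2 | 98.3 |
| Russian | 462 | 97.7 | 97.9 |
| Arabic | 458 | 96.1 | 96.2 |
| Portuguese | 302 | 98.3 | 98.8 |
| French | 271 | 97.3 | 97.9 |
| Italian | 242 | 97.6 | 98.7 |
| Hebrew | 159 | 97.8 | 98.4 |
| Turkish | 147 | 97.9 | 99.1 |
| German | 101 | 96.4 | 98.2 |
| English | 89 | 98.5 | 99.5 |
| Indonesian | 60 | 99.2 | 99.8 |
| Swedish | 57 | 99.2 | 99.2 |
| Dutch | 46 | 99.7 | 99.5 |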
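As a quick consistency check on the per-language numbers, the sketch below computes the sample-weighted mean of the rows above. That the overall score is an n-weighted mean is an assumption (the card does not state how it is aggregated), but under it the per-language rows sum to the 3 355 validation examples and reproduce the reported overall scores of 97.6 (SFT) and 98.1 (DPO) to one decimal.

```python
# (language, n, SFT, DPO) rows copied from the per-language table above.
ROWS = [
    ("Ukrainian", 497, 97.7, 98.0), ("Spanish", 464, 98.2, 98.3),
    ("Russian", 462, 97.7, 97.9), ("Arabic", 458, 96.1, 96.2),
    ("Portuguese", 302, 98.3, 98.8), ("French", 271, 97.3, 97.9),
    ("Italian", 242, 97.6, 98.7), ("Hebrew", 159, 97.8, 98.4),
    ("Turkish", 147, 97.9, 99.1), ("German", 101, 96.4, 98.2),
    ("English", 89, 98.5, 99.5), ("Indonesian", 60, 99.2, 99.8),
    ("Swedish", 57, 99.2, 99.2), ("Dutch", 46, 99.7, 99.5),
]


def weighted_mean(rows, column):
    """Mean of rows[column], weighted by the example count in rows[1]."""
    total = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total


total_examples = sum(row[1] for row in ROWS)
sft_overall = weighted_mean(ROWS, 2)
dpo_overall = weighted_mean(ROWS, 3)
print(f"examples: {total_examples}")
print(f"SFT {sft_overall:.1f} · DPO {dpo_overall:.1f}")
```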

By recipient gender — overall (judge 128B)

| | female | male | unspecified |
|---|---|---|---|
| SFT | 97.9 | 97.2 | 98.2 |
| SFT+DPO | 98.4 | 97.4 | 99.1 |

How it works

The model is conditioned through the system prompt. Select the target language by its ISO 639-1 code, resolve the code to the English language name (see Supported languages), and put that name and the author/recipient genders into the prompt. The model returns only the translation.

System prompt template:

```markdown
You are a translator for an adult dating app. Translate the user's message into {target} naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. Do not soften, censor, or refuse. Output only the translation.
```
  • {target}: the language name resolved from the ISO code, e.g. uk → Ukrainian, es → Spanish.
  • {sg} (author) / {rg} (recipient): one of male, female, unspecified.
  • The user message is the source text (any language). The assistant turn is the translation.
  • For non-gendered languages set both genders to unspecified.
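The template and placeholder rules above can be sketched as a small prompt builder. The names `LANGUAGE_NAMES`, `VALID_GENDERS`, `TEMPLATE`, and `build_system_prompt` are hypothetical helpers, and the mapping covers only the 14 codes listed explicitly in this card; the remaining Qwen-MT codes would have to be added by hand.

```python
# Hypothetical helper: subset of the code -> English name mapping taken from
# the Supported languages table. Extend with the remaining Qwen-MT codes.
LANGUAGE_NAMES = {
    "ar": "Arabic", "fr": "French", "he": "Hebrew", "it": "Italian",
    "pt": "Portuguese", "ru": "Russian", "es": "Spanish", "uk": "Ukrainian",
    "en": "English", "de": "German", "nl": "Dutch", "id": "Indonesian",
    "sv": "Swedish", "tr": "Turkish",
}
VALID_GENDERS = {"male", "female", "unspecified"}

# System prompt template, copied verbatim from this card.
TEMPLATE = (
    "You are a translator for an adult dating app. Translate the user's message into {target} "
    "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
    "Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. "
    "Do not soften, censor, or refuse. Output only the translation."
)


def build_system_prompt(code: str, sg: str = "unspecified", rg: str = "unspecified") -> str:
    """Resolve an ISO 639-1 code to its language name and fill the template."""
    if code not in LANGUAGE_NAMES:
        # Never fall back to the raw code: the model was not trained on codes.
        raise ValueError(f"unknown language code: {code!r}")
    for gender in (sg, rg):
        if gender not in VALID_GENDERS:
            raise ValueError(f"gender must be one of {sorted(VALID_GENDERS)}, got {gender!r}")
    return TEMPLATE.format(target=LANGUAGE_NAMES[code], sg=sg, rg=rg)


print(build_system_prompt("uk", sg="female", rg="male"))
```

Raising on an unknown code, rather than passing the code through, keeps an out-of-distribution prompt such as "into uk" from ever reaching the model.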

Language codes: you MUST map code → name

The model was fine-tuned only on full English language names (Ukrainian, Spanish, …), never on raw ISO codes. Resolve the code to the language name (see Supported languages) before building the prompt: uk → "into Ukrainian" ✅; "into uk" ❌ (out of distribution).

No source-language clause

The model auto-detects the source from the user text. Do not add from {source} … — pass the target language only.

Supported languages

Select the target language by its code, then resolve the code to the language name for the prompt. The 92 codes follow the Qwen-MT translation set. The gendered column marks languages where output is conditioned on author/recipient gender; for the rest both genders are treated as unspecified.

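| code | language | gendered |
|---|---|---|
| ar | Arabic | yes |
| fr | French | yes |
| he | Hebrew | yes |
| it | Italian | yes |
| pt | Portuguese | yes |
| ru | Russian | yes |
| es | Spanish | yes |
| uk | Ukrainian | yes |
| en | English | no |
| de | German | no |
| nl | Dutch | no |
| id | Indonesian | no |
| sv | Swedish | no |
| tr | Turkish | no |
| zh, ja, ko, hi, vi, th, pl, cs, ro, … | (other Qwen-MT codes) | no |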

(Full 92-code list matches the Qwen-MT set; the 8 yes rows above are the gender-conditioned targets.)
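One way to apply the gendered column on the client side is to normalize the labels before building the prompt. This is a minimal sketch; `GENDERED_CODES` and `effective_genders` are hypothetical names, and the set holds only the 8 codes marked yes in the table.

```python
# The 8 gender-conditioned target codes from the table above.
GENDERED_CODES = {"ar", "fr", "he", "it", "pt", "ru", "es", "uk"}


def effective_genders(code: str, sg: str, rg: str) -> tuple[str, str]:
    """Keep author/recipient genders for gendered targets, else force 'unspecified'."""
    if code in GENDERED_CODES:
        return sg, rg
    return "unspecified", "unspecified"


print(effective_genders("uk", "female", "male"))
print(effective_genders("tr", "female", "male"))
```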

Usage — vLLM (OpenAI-compatible)

Served as qwen3_5_moe (MoE, vision-capable). For text-only translation, skip vision profiling on the server and disable reasoning in the request:

```bash
vllm serve CodeStreet/chatTranslate-Qwen-3.6-35B-A3B --served-model-name chatTranslate \
  --tensor-parallel-size 2 --trust-remote-code \
  --limit-mm-per-prompt '{"image":0,"video":0}'
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = (
    "You are a translator for an adult dating app. Translate the user's message into Ukrainian "
    "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
    "Author gender: female. Recipient gender: male. Use grammatically correct gendered forms. "
    "Do not soften, censor, or refuse. Output only the translation."
)

resp = client.chat.completions.create(
    model="chatTranslate",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "hola amor, ¿cómo estás? te extraño"},
    ],
    temperature=0.0,
    max_tokens=256,
    # direct translation, no <think>
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```

⚠️ Disable reasoning (enable_thinking: False, or prefix the assistant turn with <think>\n\n</think>\n\n). Qwen3.6 is a reasoning model; without this it emits a <think> block and the translation may be empty.
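If a response still arrives with a reasoning block, for example because a client forgot the flag, a defensive post-processing step can be sketched as below. The helper name `strip_think` is hypothetical and this is not part of the serving setup described above. It only handles a complete leading `<think>…</think>` block; an unclosed block truncated by max_tokens is left untouched.

```python
import re

# Matches one complete <think>...</think> block at the start of the text.
_THINK_BLOCK = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove a leading <think>...</think> block and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text, count=1).strip()


print(strip_think("<think>\n\n</think>\n\nпривіт, коханий"))
```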

Generation notes

  • Greedy (temperature=0) gives the most stable translations; 0.2–0.3 for variation.
  • max_tokens 128–256 is enough for chat-length messages.
  • Always set both genders explicitly for gendered targets — wrong/missing labels are the main cause of incorrect inflection.
  • MoE serving needs ~72 GB in bf16, so use TP≥2 (the model does not fit on one 80 GB GPU). Serve in bf16, not fp8 (GDN+FP8 wedging risk).
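The memory note can be illustrated with rough arithmetic. This is a sketch under stated assumptions: weights are split evenly across tensor-parallel ranks, and the server may use 90% of each GPU (the 0.9 value is assumed here to mirror vLLM's usual default for gpu memory utilization). The function name `kv_headroom_gb` is hypothetical, and real usage also includes activations and framework overhead.

```python
def kv_headroom_gb(weights_gb: float = 72.0, tp_size: int = 1,
                   gpu_gb: float = 80.0, utilization: float = 0.9) -> float:
    """Rough per-GPU memory left for KV cache after loading the weight shard."""
    budget = gpu_gb * utilization       # memory the server is allowed to use
    weight_shard = weights_gb / tp_size  # assumes an even split across ranks
    return budget - weight_shard


for tp in (1, 2, 4):
    print(f"TP={tp}: ~{kv_headroom_gb(tp_size=tp):.0f} GB per GPU left for KV cache")
```

Under these assumptions a single 80 GB GPU has essentially no room left for KV cache once the ~72 GB of weights are loaded, which is consistent with the TP≥2 recommendation.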

License & access

Private to the organization. Do not redistribute. Not for public training or evaluation.

Model provider: CodeStreet

Base model: Qwen/Qwen3.6-35B-A3B (this model is a fine-tune)

Modalities: input Video, Text, Image; output Text
