Qwen3.6-27B-JudgeOPSD-0604 API & Inference Endpoint

Overview

This model is trained to serve as a general-purpose evaluation judge that scores responses against user-specified rubrics. It accepts arbitrary input formats; you only need to specify the desired output format in your prompt.

Key Features:

Multi-dimensional rubric-based scoring
Flexible input: any QA pair + custom rubric
Structured JSON output with per-criterion scores and reasoning
Trained on diverse evaluation datasets via online self-distillation

Usage

With Transformers + PEFT

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-27B",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")

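# English gloss of the prompt below:
#   "You are a professional scoring judge. Score the QA pair on multiple dimensions
#    according to the rubric and output strict JSON with no extra content."
#   Question: a rectangle has perimeter 48; what is its maximum area?
#   Answer to evaluate: perimeter 48, length + width = 24, maximum area 144, reached by a square.
#   Dimensions: answer correctness (0.8), formula use (0.15), logical completeness (0.05).
prompt = """你是专业评分法官，按rubric对QA多维度打分，输出严格JSON格式，不要多余内容。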
问题：长方形周长48，最大面积是多少？
待评回答：周长48，长+宽=24，最大面积144，正方形时最大。
评分维度：1.答案正确性(权重0.8) 2.公式使用(权重0.15) 3.逻辑完整性(权重0.05)
输出格式：{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

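# do_sample=True is needed for temperature to take effect; without it, generate() may decode greedily.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)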
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
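
The decoded response is expected to be a JSON object, but a judge model can wrap it in extra text. The sketch below shows one way to pull the first JSON object out of a response string. The helper name `extract_json` is hypothetical, and the handling of a `<think>...</think>` block is an assumption about possible output, not documented behavior of this model.

```python
import json
import re


def extract_json(response: str) -> dict:
    """Return the first JSON object found in a judge response (hypothetical helper)."""
    # Assumption: drop a <think>...</think> block if the model emits one.
    text = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores any text after it.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    return obj


sample = 'Here is the result: {"score": 0.9, "item_detail": [], "total_reason": "ok"} Done.'
result = extract_json(sample)
print(result["score"])
```

If the first `{` in the text does not start valid JSON, `raw_decode` raises `json.JSONDecodeError`, which callers may want to catch and treat as a failed judgment.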

With vLLM (Recommended for Production)

python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA enabled; max_lora_rank must cover the adapter's rank (64).
llm = LLM(
    model="Qwen/Qwen3.6-27B",
    enable_lora=True,
    max_lora_rank=64,
    max_model_len=4096,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)

# Arguments: adapter name, unique integer ID, adapter path.
lora_request = LoRARequest("judge", 1, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")

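# English gloss of the prompt below:
#   "You are a professional scoring judge. Score the QA pair on multiple dimensions
#    according to the rubric and output strict JSON with no extra content."
#   Question: what is photosynthesis?
#   Answer to evaluate: photosynthesis is the process by which plants use sunlight
#   to convert carbon dioxide and water into glucose and oxygen.
#   Dimensions: accuracy (0.6), completeness (0.3), clarity of expression (0.1).
prompt = """你是专业评分法官，按rubric对QA多维度打分，输出严格JSON格式，不要多余内容。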
问题：什么是光合作用？
待评回答：光合作用是植物利用阳光将二氧化碳和水转化为葡萄糖和氧气的过程。
评分维度：1.准确性(权重0.6) 2.完整性(权重0.3) 3.表达清晰度(权重0.1)
输出格式：{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

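# Use chat() so the model's chat template is applied, as in the Transformers example.
messages = [{"role": "user", "content": prompt}]
outputs = llm.chat(messages, sampling_params=sampling_params, lora_request=lora_request)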
print(outputs[0].outputs[0].text)
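
For production use, several QA pairs are usually scored in one call. As a minimal sketch, each judge prompt can be wrapped as its own single-turn conversation. The helper `build_conversations` and the placeholder prompts are hypothetical and only illustrate the data shape.

```python
def build_conversations(prompts: list[str]) -> list[list[dict]]:
    """Wrap each prompt string as a single-turn chat conversation."""
    return [[{"role": "user", "content": p}] for p in prompts]


# Hypothetical placeholders; real entries would be full judge prompts.
prompts = ["judge prompt for item 1", "judge prompt for item 2"]
conversations = build_conversations(prompts)
print(len(conversations), conversations[0][0]["role"])
```

Assuming a vLLM version whose `LLM.chat` accepts a list of conversations, the result could be passed as `llm.chat(conversations, sampling_params=sampling_params, lora_request=lora_request)`, giving one output per conversation.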

Prompt Format

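The model is flexible with input format. A typical prompt has five lines: an instruction ("You are a professional scoring judge. Score the QA pair on multiple dimensions according to the rubric and output strict JSON with no extra content."), the question (问题), the answer to evaluate (待评回答), the scoring dimensions with their weights (评分维度), and the output format (输出格式):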

markdown
你是专业评分法官，按rubric对QA多维度打分，输出严格JSON格式，不要多余内容。
问题：{question}
待评回答：{answer}
评分维度：{rubric_dimensions}
输出格式：{desired_json_schema}
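
The template above can be filled programmatically. The following is a sketch under stated assumptions: the helper names `format_rubric` and `build_prompt` are hypothetical, and the check that weights sum to 1 reflects the examples in this README rather than a documented requirement of the model.

```python
TEMPLATE = (
    "你是专业评分法官，按rubric对QA多维度打分，输出严格JSON格式，不要多余内容。\n"
    "问题：{question}\n"
    "待评回答：{answer}\n"
    "评分维度：{rubric_dimensions}\n"
    "输出格式：{desired_json_schema}"
)

SCHEMA = (
    '{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,'
    '"weight":0~1,"reason":""}],"total_reason":""}'
)


def format_rubric(rubric: dict[str, float]) -> str:
    """Render {"name": weight} as '1.name(权重w) 2.name(权重w) ...'."""
    total = sum(rubric.values())
    # Assumption: weights should sum to 1, as in the README examples.
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return " ".join(
        f"{i}.{name}(权重{w})" for i, (name, w) in enumerate(rubric.items(), 1)
    )


def build_prompt(question: str, answer: str, rubric: dict[str, float]) -> str:
    """Fill the judge template with one QA pair and its rubric."""
    return TEMPLATE.format(
        question=question,
        answer=answer,
        rubric_dimensions=format_rubric(rubric),
        desired_json_schema=SCHEMA,
    )


rubric = {"答案正确性": 0.8, "公式使用": 0.15, "逻辑完整性": 0.05}
prompt = build_prompt("长方形周长48，最大面积是多少？", "最大面积144，正方形时最大。", rubric)
print(prompt)
```

Because the schema is passed to `str.format` as a value rather than embedded in the template, its literal braces need no escaping.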

Expected Output Example:

json
{
  "score": 0.85,
  "item_detail": [
    {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8, "reason": "答案正确，正方形时面积最大为144"},
    {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15, "reason": "使用了周长公式但未明确写出"},
    {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05, "reason": "推理步骤较简略"}
  ],
  "total_reason": "回答正确且核心推理完整，但公式展示和推理步骤可更详细"
}
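
Parsed output can be sanity-checked before it is used downstream. The sketch below recomputes the weighted sum of the per-criterion scores and compares it with the reported `score`. Whether the model intends `score` to equal that weighted sum is an assumption; the README does not say so, so the comparison uses a tolerance. The function names are hypothetical, and the data is an abbreviated copy of the example above with the reasons left empty.

```python
def weighted_score(result: dict) -> float:
    """Sum of single_score * weight over all rubric items."""
    return sum(item["single_score"] * item["weight"] for item in result["item_detail"])


def check_result(result: dict, tol: float = 0.05) -> list[str]:
    """Return a list of problems found in a parsed judge result (empty if none)."""
    problems = []
    items = result["item_detail"]
    for item in items:
        if not 0.0 <= item["single_score"] <= 1.0:
            problems.append(f"single_score out of range for {item['criterion']}")
    if abs(sum(item["weight"] for item in items) - 1.0) > 1e-6:
        problems.append("weights do not sum to 1")
    # Assumption: the overall score should be close to the weighted sum.
    if abs(result["score"] - weighted_score(result)) > tol:
        problems.append("score differs from weighted sum by more than tol")
    return problems


example = {
    "score": 0.85,
    "item_detail": [
        {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8, "reason": ""},
        {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15, "reason": ""},
        {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05, "reason": ""},
    ],
    "total_reason": "",
}

computed = weighted_score(example)
problems = check_result(example)
print(round(computed, 3), problems)
```

For the example above the weighted sum is 0.875 while the reported score is 0.85, so an exact equality check would fail and a tolerance is the safer choice.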

Training Details

Hyperparameter	Value
Method	LoRA + OPSD (Online Policy Self-Distillation)
LoRA Rank	64
LoRA Alpha	128
Learning Rate	1e-5
Epochs	1
Batch Size	1 × 8 (grad accum) × 8 GPUs = effective 64
Max Sequence Length	4096
Max Completion Length	2048
Temperature	1.0
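
Two of the table values follow from simple arithmetic, spelled out here as a quick check. The variable names are illustrative, and the scaling factor assumes the standard LoRA convention of alpha divided by rank (not rsLoRA), which the README does not state explicitly.

```python
# Effective batch size = per-device batch × gradient accumulation steps × GPUs.
per_device_batch = 1
grad_accum_steps = 8
num_gpus = 8
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Standard LoRA scales the low-rank update by alpha / rank (assumed convention).
lora_rank = 64
lora_alpha = 128
lora_scaling = lora_alpha / lora_rank

print(effective_batch, lora_scaling)
```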

Training Data

The model was trained on a mixture of four evaluation/feedback datasets.

Limitations

Optimized for rubric-based scoring tasks; may not generalize well to open-ended generation
Best performance with structured output prompts specifying JSON format
Score calibration may vary across different rubric scales

Citation

If you find this model useful, please cite:

bibtex
@misc{qwen36-judgeopsd-0604,
  title={Qwen3.6-27B-JudgeOPSD-0604},
  author={Uranus},
  year={2026},
  url={https://huggingface.co/Uranus/Qwen3.6-27B-JudgeOPSD-0604}
}

License

This model inherits the Apache 2.0 License from the base model.


