README
License: apache-2.0

Quickstart
Qwen3Guard requires transformers>=4.51.0; the latest version is recommended.
Moderating User Prompt
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "Qwen/Qwen3Guard-Gen-8B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def extract_label_and_categories(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|Jailbreak|None)"
    safe_label_match = re.search(safe_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# prepare the model input
prompt = "How can I make a bomb?"
# for prompt moderation
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
# keep only the newly generated tokens, dropping the echoed prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''
safe_label, categories = extract_label_and_categories(content)
print(safe_label, categories)
```
Moderating Model Response
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "Qwen/Qwen3Guard-Gen-8B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def extract_label_categories_refusal(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|None)"
    refusal_pattern = r"Refusal: (Yes|No)"
    safe_label_match = re.search(safe_pattern, content)
    refusal_label_match = re.search(refusal_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    refusal_label = refusal_label_match.group(1) if refusal_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories, refusal_label

# prepare the model input
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
# for response moderation
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
# keep only the newly generated tokens, dropping the echoed prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''
# unpack into `categories` so the name matches the print call below
safe_label, categories, refusal_label = extract_label_categories_refusal(content)
print(safe_label, categories, refusal_label)
```
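The two extraction helpers above differ only in the `Refusal` field, and both search the whole output for category names. A minimal sketch of a single parser follows, assuming the output keeps the `Safety:` / `Categories:` / `Refusal:` line layout shown in the example comments. `ModerationResult` and `parse_guard_output` are hypothetical names, not part of any Qwen API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Same label sets as the regexes in the Quickstart examples.
SAFETY_RE = re.compile(r"Safety: (Safe|Unsafe|Controversial)")
REFUSAL_RE = re.compile(r"Refusal: (Yes|No)")
CATEGORIES_LINE_RE = re.compile(r"Categories: (.+)")
CATEGORY_RE = re.compile(
    r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|"
    r"Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|"
    r"Copyright Violation|Jailbreak|None)"
)


@dataclass
class ModerationResult:
    """Hypothetical container for one parsed moderation output."""
    safety: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    refusal: Optional[str] = None  # stays None for prompt moderation


def parse_guard_output(content: str) -> ModerationResult:
    safety = SAFETY_RE.search(content)
    refusal = REFUSAL_RE.search(content)
    line = CATEGORIES_LINE_RE.search(content)
    # Only look for category names on the "Categories:" line (assumed layout),
    # and treat the literal "None" as an empty list.
    found = CATEGORY_RE.findall(line.group(1)) if line else []
    return ModerationResult(
        safety=safety.group(1) if safety else None,
        categories=[c for c in found if c != "None"],
        refusal=refusal.group(1) if refusal else None,
    )
```

Calling `parse_guard_output(content)` on the decoded text covers both prompt and response moderation; a missing field is left as `None` rather than raising.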
Deployment with SGLang and vLLM
For deployment, you can use sglang>=0.4.6.post1 or vllm>=0.9.0 to create an OpenAI-compatible API endpoint:
- SGLang:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3Guard-Gen-8B --port 30000 --context-length 32768
```
- vLLM:
```shell
vllm serve Qwen/Qwen3Guard-Gen-8B --port 8000 --max-model-len 32768
```
Here is an example API call using the OpenAI-compatible server:
```python
from openai import OpenAI

openai_api_key = "EMPTY"
# port 8000 matches the vLLM command above; use 30000 for the SGLang command
openai_api_base = "http://localhost:8000/v1"
model = "Qwen/Qwen3Guard-Gen-8B"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base)

# Prompt Moderation
prompt = "How can I make a bomb?"
messages = [
    {"role": "user", "content": prompt}
]
chat_completion = client.chat.completions.create(
    messages=messages,
    model=model)
print(chat_completion.choices[0].message.content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''

# Response Moderation
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response}
]
# send the prompt/response pair; without this call the earlier result is printed again
chat_completion = client.chat.completions.create(
    messages=messages,
    model=model)
print(chat_completion.choices[0].message.content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''
```
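Prompt moderation and response moderation differ only in whether an assistant turn follows the user turn, so both calls can go through one helper. The sketch below assumes a client object exposing `chat.completions.create`, as the `openai` package's `OpenAI` client does. `build_messages` and `moderate` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List, Optional


def build_messages(prompt: str, response: Optional[str] = None) -> List[Dict[str, str]]:
    """One user turn for prompt moderation, plus an assistant turn for response moderation."""
    messages = [{"role": "user", "content": prompt}]
    if response is not None:
        messages.append({"role": "assistant", "content": response})
    return messages


def moderate(client: Any, model: str, prompt: str, response: Optional[str] = None) -> str:
    """Send one moderation request and return the raw text produced by the guard model."""
    completion = client.chat.completions.create(
        messages=build_messages(prompt, response),
        model=model,
    )
    return completion.choices[0].message.content
```

With the client from the example above, `moderate(client, model, prompt)` would moderate the prompt and `moderate(client, model, prompt, response)` the response. Because a fresh request is issued on every call, a stale `chat_completion` cannot be printed by mistake.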
Safety Policy
In Qwen3Guard, potential harms are classified into three severity levels:
- Unsafe: Content generally considered harmful across most scenarios.
- Controversial: Content whose harmfulness may be context-dependent or subject to disagreement across different applications.
- Safe: Content generally considered safe across most scenarios.
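Because the Controversial level is application-dependent, a deployment has to choose how to treat it. The following is a minimal sketch of one such choice, not a prescribed policy: the function name `decide`, the flag `treat_controversial_as_unsafe`, and the action strings `"allow"` and `"block"` are all assumptions made for illustration.

```python
def decide(safety_label: str, treat_controversial_as_unsafe: bool = True) -> str:
    """Map a Qwen3Guard severity label to a hypothetical application action."""
    if safety_label == "Safe":
        return "allow"
    if safety_label == "Unsafe":
        return "block"
    if safety_label == "Controversial":
        # Application-specific choice: a stricter deployment blocks,
        # a more permissive one lets the content through.
        return "block" if treat_controversial_as_unsafe else "allow"
    # Anything else (including an unparsed label) is rejected explicitly.
    raise ValueError(f"unknown safety label: {safety_label!r}")
```

Raising on an unknown label keeps a parsing failure from silently becoming an "allow".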
In the current version of Qwen3Guard, we consider the following safety categories:
- Violent: Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.
- Non-violent Illegal Acts: Content providing guidance or advice for non-violent illegal activities like hacking, unauthorized drug production, or stealing.
- Sexual Content or Sexual Acts: Content offering any sexual imagery, references, or descriptions featuring individuals. Also includes content that describes explicit sexual imagery, references, or descriptions containing illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.
- Personally Identifiable Information: Content offering unauthorized sharing or disclosure of sensitive personal identifying information, such as name, ID number, address, phone number, medical records, financial details, and account passwords.
- Suicide & Self-Harm: Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.
- Unethical Acts: Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotypes, injustice, hate speech, offensive language, harassment, insults, threats, defamation, extremism, misinformation regarding ethics, and other behaviors that, while not illegal, are still considered unethical.
- Politically Sensitive Topics: The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.
- Copyright Violation: Content offering unauthorized reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other creative works protected by law, without the explicit permission of the copyright holder.
- Jailbreak (Only for input): Content that explicitly attempts to override the model's system prompt or model conditioning.
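Two details of this list matter when validating parsed output: the Quickstart regexes use the short label `PII` for Personally Identifiable Information, and Jailbreak applies only to input. A small sketch of that bookkeeping follows. The names `POLICY_NAME`, `valid_labels`, and `unexpected_labels` are hypothetical, and the label strings are taken from the regexes in the examples above.

```python
from typing import Iterable, List, Set

# Label as it appears in the Quickstart regexes -> category name in the policy list.
POLICY_NAME = {
    "Violent": "Violent",
    "Non-violent Illegal Acts": "Non-violent Illegal Acts",
    "Sexual Content or Sexual Acts": "Sexual Content or Sexual Acts",
    "PII": "Personally Identifiable Information",
    "Suicide & Self-Harm": "Suicide & Self-Harm",
    "Unethical Acts": "Unethical Acts",
    "Politically Sensitive Topics": "Politically Sensitive Topics",
    "Copyright Violation": "Copyright Violation",
    "Jailbreak": "Jailbreak",
}

# Categories that the policy restricts to prompt (input) moderation.
INPUT_ONLY = {"Jailbreak"}


def valid_labels(for_response: bool) -> Set[str]:
    """Labels expected for response moderation, or for prompt moderation."""
    labels = set(POLICY_NAME)
    return labels - INPUT_ONLY if for_response else labels


def unexpected_labels(categories: Iterable[str], for_response: bool) -> List[str]:
    """Return parsed labels that the policy does not list for this moderation mode."""
    allowed = valid_labels(for_response)
    return [c for c in categories if c not in allowed]
```

A non-empty result from `unexpected_labels` suggests a parsing problem or a mismatch between the moderation mode and the output being checked.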
Citation
If you find our work helpful, please cite it.
```bibtex
@article{zhao2025qwen3guard,
  title={Qwen3Guard Technical Report},
  author={Zhao, Haiquan and Yuan, Chenhan and Huang, Fei and Hu, Xiaomeng and Zhang, Yichang and Yang, An and Yu, Bowen and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang and others},
  journal={arXiv preprint arXiv:2510.14276},
  year={2025}
}
```
Model provider: Qwen
Model tree: fine-tuned from base model Qwen/Qwen3-8B
Modalities: text input, text output