License: apache-2.0
Highlights
Dnotitia Post-training
- Uncensored Training: The model is post-trained with an uncensored methodology so that it can respond to a wider range of prompts without unnecessary refusals, while preserving the instruction-following and reasoning quality of the base model.
- Persona Training: Additional supervised training on Dnotitia's corporate knowledge — company history, products, services, and internal terminology — so the model can act as an authentic first-party assistant for Dnotitia-facing use cases.
- Long-form Reasoning Preservation: Chain-of-thought traces from prior turns can be retained across multi-turn sessions, enabling smoother iterative development and debugging workflows.
Inherited from Qwen3.5/3.6
- Unified Vision-Language Foundation: Early-fusion training over multimodal tokens delivers strong cross-modal reasoning across text, image, and video — outperforming the prior Qwen3-VL line on coding, agents, and visual understanding benchmarks.
- Efficient Hybrid MoE Architecture: Gated DeltaNet (linear attention) layers combined with sparse MoE layers deliver high-throughput inference while keeping activated parameters low.
- Scalable RL Generalization: Reinforcement learning is scaled across million-agent environments with progressively complex task distributions, improving real-world adaptability for tool use and agentic workflows.
- Global Linguistic Coverage: Native support for 201 languages and dialects, enabling inclusive worldwide deployment with nuanced cultural and regional understanding.
- Long Context: Native 262,144-token context length, extensible up to roughly 1,010,000 tokens via YaRN scaling.
- Thinking Mode by Default: Generates `<think>...</think>` reasoning blocks before final answers; can be disabled with `"enable_thinking": false`.
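When the raw completion text is consumed directly, the reasoning block usually needs to be separated from the final answer. The following is a minimal sketch, not part of the official tooling: `split_reasoning` is a hypothetical helper, and it assumes a single reasoning block that ends with `</think>`. The opening tag is treated as optional, since some chat templates emit it as part of the prompt rather than the completion. Servers started with a reasoning parser typically return the reasoning in a separate field already, in which case this step is unnecessary.

```python
def split_reasoning(raw: str) -> tuple[str, str]:
    """Split raw model text into (reasoning, answer).

    Hypothetical helper: assumes at most one reasoning block, terminated
    by </think>. Text without that tag is treated as a plain answer.
    """
    head, closing_tag, tail = raw.partition("</think>")
    if not closing_tag:
        # No reasoning block present (e.g. thinking mode disabled).
        return "", raw.strip()
    # Drop the opening tag if the completion includes it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```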
Comparison with Qwen3.6-35B-A3B

The chart above compares DNA3.0-35B-A3B against its Qwen3.6-35B-A3B base across four metrics, reported on a 0–1 scale (higher is better):
- Persona Identification — Measures how reliably the model identifies itself as a Dnotitia assistant and answers correctly about Dnotitia's company, products, and identity.
- Uncensorship — Measures how willingly the model engages with topics that the Chinese-origin base model is trained to refuse — i.e., subjects suppressed by the censorship policies baked into the original Qwen training.
- Language Confusion Reduction — Measures how well the model avoids unintended language mixing, particularly Chinese-character intrusions in Korean responses — a well-known failure mode of Qwen-family models.
- Repetition Reduction — Measures how well the model avoids getting stuck in infinite-loop repetition during long-form generation, another common failure mode of the base model.
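The scoring procedure behind these metrics is not described here. As a rough illustration of what the last two failure modes look like in practice, the sketch below computes two simple proxies: the share of Chinese-range ideographs in a response, and the share of repeated word n-grams. Both functions are hypothetical and are not the metrics used for the comparison; legitimate Hanja in Korean text would also count toward the first one.

```python
def han_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    han = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return han / len(chars)


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    print(han_ratio("안녕하세요"))                  # pure Hangul
    print(han_ratio("안녕 你好"))                   # mixed Hangul and Han
    print(repeated_ngram_ratio("a b a b a b", n=2))
```

A response with a high `han_ratio` under a Korean prompt, or a `repeated_ngram_ratio` that climbs as generation gets longer, would be a reasonable signal to inspect by hand.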
Model Overview
| Field | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-35B-A3B |
| Model Type | Causal Language Model with Vision Encoder (Mixture-of-Experts) |
| Total / Active Parameters | 35B / 3B |
| Hidden Dimension | 2048 |
| Number of Layers | 40 |
| Experts | 256 total, 8 routed + 1 shared, expert intermediate dim 512 |
| Gated Attention Heads | 16 (Q) / 2 (KV), head dim 256 |
| Gated DeltaNet Heads | 32 (V) / 16 (QK), head dim 128 |
| Token Embedding | 248,320 (Padded) |
| Context Length | 262,144 native, up to ~1,010,000 extended |
| License | Apache-2.0 |
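The table values can be cross-checked against the headline 35B / 3B figures with some quick arithmetic. This is a back-of-the-envelope sketch under stated assumptions: each expert is taken to be a three-matrix gated feed-forward block (gate, up, and down projections), the shared expert is taken to have the same size as a routed expert and to be stored in addition to the 256 routed ones, and router weights are ignored. None of these details are confirmed by the table itself.

```python
HIDDEN_DIM = 2048        # from the table
EXPERT_DIM = 512         # from the table (expert intermediate dim)
NUM_LAYERS = 40          # from the table
ROUTED_EXPERTS = 256     # from the table
ACTIVE_ROUTED = 8        # from the table
SHARED_EXPERTS = 1       # from the table

MATRICES_PER_EXPERT = 3  # ASSUMPTION: gate, up, and down projections


def params_per_expert() -> int:
    """Weights in one expert feed-forward block."""
    return MATRICES_PER_EXPERT * HIDDEN_DIM * EXPERT_DIM


def moe_params(experts_per_layer: int) -> int:
    """Expert weights across all layers for a given number of experts per layer."""
    return experts_per_layer * params_per_expert() * NUM_LAYERS


stored = moe_params(ROUTED_EXPERTS + SHARED_EXPERTS)
active = moe_params(ACTIVE_ROUTED + SHARED_EXPERTS)

if __name__ == "__main__":
    print(f"stored expert weights: {stored / 1e9:.2f}B")
    print(f"active expert weights per token: {active / 1e9:.2f}B")
```

Under these assumptions the experts hold roughly 32.3B of the stored parameters, while only about 1.13B expert weights are used per token. The remainder of the 35B / 3B totals would then come from the attention and DeltaNet layers, the embeddings, and the vision encoder.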
Quickstart
DNA 3.0 is compatible with the Hugging Face Transformers ecosystem as well as popular inference engines such as vLLM, SGLang, and KTransformers. Given the model's scale, a dedicated serving engine on multi-GPU hardware is strongly recommended for production workloads.
[!Important] The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, reduce the context window — but keep at least 128K tokens to preserve long-form reasoning behavior.
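To get a feel for how much memory the context window costs, the KV cache of the gated-attention layers can be estimated from the Model Overview table. The sketch below is an approximation under assumptions that are not stated in this README: it takes 10 of the 40 layers to be gated-attention layers, a 16-bit cache, and it ignores the Gated DeltaNet layers, whose state does not grow with sequence length. Actual usage depends on the serving engine and its configuration.

```python
def kv_cache_bytes(
    context_tokens: int,
    attention_layers: int = 10,  # ASSUMPTION: not stated in the table
    kv_heads: int = 2,           # from the table: 2 KV heads
    head_dim: int = 256,         # from the table: head dim 256
    bytes_per_value: int = 2,    # ASSUMPTION: 16-bit cache (bf16/fp16)
) -> int:
    """Estimate KV-cache bytes for one sequence of the given length."""
    # The leading 2 accounts for storing both keys and values.
    per_token = 2 * attention_layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token


if __name__ == "__main__":
    for tokens in (262_144, 131_072):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.2f} GiB per sequence")
```

With these assumptions a full 262,144-token sequence needs about 5 GiB of cache and a 131,072-token sequence about 2.5 GiB, per concurrent sequence. If the real layer split differs, pass a different `attention_layers` value.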
vLLM
```shell
# Standard (multimodal) serving
vllm serve dnotitia/DNA3.0-35B-A3B \
  --reasoning-parser qwen3

# Tool-calling enabled
vllm serve dnotitia/DNA3.0-35B-A3B \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder

# Text-only serving (skips the vision encoder to free KV-cache memory)
vllm serve dnotitia/DNA3.0-35B-A3B \
  --reasoning-parser qwen3 \
  --language-model-only
```
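Once a server is running with tool calling enabled, a client can offer tools in the standard OpenAI-style `tools` field. The following sketch only builds the request body. The `get_weather` tool is hypothetical and exists purely for illustration, and the model name assumes the server exposes the model under the same identifier that was passed to `vllm serve`.

```python
import json


def tool_call_payload(question: str) -> dict:
    """Build an OpenAI-style chat request that offers one tool to the model."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "dnotitia/DNA3.0-35B-A3B",  # assumed served model name
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


if __name__ == "__main__":
    print(json.dumps(tool_call_payload("What is the weather in Seoul?"), indent=2))
```

The resulting JSON can be posted to the server's `/v1/chat/completions` route. If the model decides to use the tool, the call should appear in the response message rather than in the plain text content.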
Disabling Thinking Mode
For latency-sensitive or non-reasoning workloads, disable thinking mode via the chat-template kwarg:
```bash
# The prompt is Korean for: "How many more years do you think it will
# take for the KOSPI to pass 8000?"
curl https://demo-api.dnotitia.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dna-router_xxxx" \
  -d '{
    "model": "DNA3.0-35B-A3B",
    "messages": [
      {
        "role": "user",
        "content": "코스피가 8000을 넘으려면 너 생각에 몇 년이나 더 걸릴 거 같아?"
      }
    ],
    "chat_template_kwargs": {"enable_thinking": false}
  }' | jq
```
[!Note] Unlike Qwen3, the DNA 3.0 generation does not support the soft-switch commands `/think` and `/nothink`. Use `chat_template_kwargs.enable_thinking` instead.
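The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl example: the URL, model name, and `chat_template_kwargs` field come from that example, while `build_request` and `ask` are hypothetical helper names, and the response parsing assumes the endpoint returns the usual OpenAI-compatible `choices[0].message.content` shape.

```python
import json
import urllib.request

API_URL = "https://demo-api.dnotitia.ai/v1/chat/completions"  # from the curl example


def build_request(
    prompt: str, api_key: str, enable_thinking: bool = False
) -> urllib.request.Request:
    """Prepare a chat-completions request with thinking mode toggled explicitly."""
    body = {
        "model": "DNA3.0-35B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        reply = json.load(resp)
    # ASSUMPTION: OpenAI-compatible response layout.
    return reply["choices"][0]["message"]["content"]
```

Calling `ask("...", "dna-router_xxxx")` with a real key performs the network request; `build_request` on its own can be used to inspect the payload first.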
Image Input
DNA 3.0 accepts image and video inputs in the OpenAI-compatible content-array format:
```bash
# The prompt is Korean for: "What is in this image? Please describe it
# in Korean."
curl https://demo-api.dnotitia.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dna-router_xxxx" \
  -d '{
    "model": "DNA3.0-35B-A3B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg"
            }
          },
          {
            "type": "text",
            "text": "이 이미지에 무엇이 있나요? 한국어로 설명해 주세요."
          }
        ]
      }
    ]
  }' | jq
```
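For images that are not reachable by public URL, OpenAI-compatible servers commonly accept a base64 `data:` URL in the same `image_url` field. The sketch below builds such a message. `image_message` is a hypothetical helper, and whether a particular deployment (including the demo endpoint above) accepts inline data URLs is an assumption that should be verified.

```python
import base64


def image_message(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Build a user message that pairs an inline image with a text question."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                # ASSUMPTION: the server accepts base64 data URLs here.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
            {"type": "text", "text": question},
        ],
    }
```

A typical use would read a local file with `open(path, "rb").read()`, pass the bytes together with a MIME type such as `image/jpeg`, and place the returned dictionary in the `messages` list of the request body.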
Limitations, Bias, and Responsible Use
DNA 3.0 has been post-trained with an uncensored methodology, which means it will engage with a broader range of prompts than typical safety-tuned models. Users and downstream developers should be aware of the following:
- Reduced refusal behavior: The model may respond to prompts that other models decline. This does not constitute endorsement of the content. Downstream applications should implement appropriate content moderation, output filtering, and policy layers suited to their deployment context.
- Persona bias: Because the model has been trained on Dnotitia-specific corporate knowledge, it may exhibit a first-party perspective when discussing Dnotitia, its products, or related entities. For neutral comparative analysis, prompt accordingly.
- Inherited biases: As a derivative of Qwen3.5/3.6, DNA 3.0 inherits the biases, gaps, and limitations of its base model and training data, including potential cultural, linguistic, and factual blind spots.
- Hallucination: Like all LLMs, DNA 3.0 can produce confident but incorrect output, particularly for niche facts, recent events, or high-precision numerical reasoning.
- Not for high-stakes autonomous use: The model should not be deployed in safety-critical, legal, medical, or financial decision-making pipelines without human oversight and domain-specific validation.
Users are responsible for ensuring their use of the model complies with applicable laws and regulations in their jurisdiction.
License
This model is released under the Apache-2.0 license, inherited from the Qwen3.5/3.6 base model.
Acknowledgments
We thank the Qwen team for releasing the Qwen3.5/3.6 base model under an open license, which made this work possible. We are also grateful to the broader open-source community behind the serving and training ecosystem (Hugging Face and vLLM), which our pipeline relies on throughout.