License: other
Model Summary
| Field | Value |
|---|---|
| Architecture | Qwen3.6 35B-A3B MoE text-generation model |
| Format | Merged safetensors full model |
| Precision | BF16/FP16 weights |
| Size | ~70.24 GB decimal, ~65.41 GiB |
| Shards | 21 safetensors shards |
| Primary focus | Python/coding reasoning + cybersecurity instruction response |
Training
Nyx-35B was trained with a two-stage sequential LoRA workflow:
- Stage 1: CodeX pilot
  - Dataset: `Modotte/CodeX-2M-Thinking`
  - Rows: 20,000
  - Goal: improve coding and Python reasoning behavior
- Stage 2: Cyber specialization
  - Dataset: `jmtss/cyber-security-instruct-3k`
  - Rows: 3,678
  - Effective batch size: 32
  - Steps: 115
  - Learning rate: `5e-5`
  - Final train loss: `1.511`
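The Stage 2 figures are consistent with roughly one pass over the dataset. A quick check, assuming every row is used once and the final partial batch still counts as a step (the card does not state the epoch count):

```python
import math

rows = 3_678               # Stage 2 dataset rows (from the list above)
effective_batch_size = 32  # from the list above

# Assumption: one optimizer step per effective batch, last partial batch kept.
steps_per_epoch = math.ceil(rows / effective_batch_size)
print(steps_per_epoch)
```

This gives 115, which matches the reported step count and suggests about one epoch of Stage 2 training.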
The final uploaded model is a merged model:
```text
base model + Stage 1 CodeX adapter + Stage 2 Cyber adapter
```
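Merging a standard LoRA adapter usually means folding its low-rank update into the base weight, `W' = W + (alpha / r) * B @ A`, so that no adapter is needed at inference time. The toy sketch below illustrates that arithmetic on a 2x2 matrix with two adapters applied in sequence. The matrices, ranks, and `alpha` values are made up for illustration, and `matmul` and `merge_lora` are hypothetical helpers, not the author's actual merge script.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values only; real layers have thousands of rows and columns.
base = [[1.0, 0.0], [0.0, 1.0]]

# Stage 1 adapter (rank 1): B is 2x1, A is 1x2.
stage1 = merge_lora(base, [[1.0], [0.0]], [[0.0, 2.0]], alpha=1, rank=1)

# Stage 2 adapter is folded into the already-merged Stage 1 weights.
merged = merge_lora(stage1, [[0.0], [1.0]], [[4.0, 0.0]], alpha=1, rank=1)
print(merged)
```

After both merges the result is a single dense weight matrix, which is why the repository can ship one set of safetensors shards without separate adapter files.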
Recommended Hardware
For full-precision inference, the model needs more than the raw 70 GB weight size because serving also requires runtime memory and KV cache.
| Hardware | Recommendation |
|---|---|
| NVIDIA H200 141GB | Recommended single-GPU deployment |
| NVIDIA B200 / B300 | Best high-end option with more headroom |
| RTX PRO 6000 Blackwell 96GB | Workstation/single-user option |
| H100 80GB | Tight; use small context/batch or quantization |
| Consumer 24GB/32GB GPUs | Use quantized variants only |
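As a rough way to reason about the headroom in the table above, serving memory can be approximated as weights plus KV cache. The sketch below uses the usual per-token KV-cache estimate for a plain attention stack (2 tensors × layers × KV heads × head dimension × bytes per element). The layer count, KV head count, and head dimension are placeholder values, not this model's actual configuration; read the real ones from the model's `config.json`. Activations, framework overhead, and any non-standard attention layers are not modeled.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Approximate KV-cache size: keys and values for every layer and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GIB = 2**30
weights_gib = 70.24e9 / GIB  # weight size from the Model Summary table

# Placeholder architecture values (assumptions, NOT taken from the model card).
cache_gib = kv_cache_bytes(
    num_layers=40, num_kv_heads=4, head_dim=128,
    seq_len=4096, batch_size=8,
) / GIB

print(f"weights ~{weights_gib:.2f} GiB, KV cache ~{cache_gib:.2f} GiB")
```

Because the cache term scales linearly with both context length and batch size, doubling either one doubles that part of the budget, which is why the 80 GB row recommends small context or batch.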
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jmtss/Nyx-35B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a short Python function that checks if a URL uses HTTPS."}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
vLLM
For production serving, use vLLM if your environment supports this Qwen3.6 MoE architecture:
```bash
vllm serve jmtss/Nyx-35B \
  --trust-remote-code \
  --dtype bfloat16 \
  --max-model-len 4096
```
Increase --max-model-len only if your GPU has enough free memory for KV cache.
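Once the server is running, `vllm serve` exposes an OpenAI-compatible HTTP API. A minimal client sketch using only the standard library follows. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port) and that no API key was configured; `build_chat_request` and `ask` are illustrative helper names.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": "jmtss/Nyx-35B",  # must match the name passed to `vllm serve`
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    # Requires a reachable server; raises URLError otherwise.
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `ask("Explain what TLS certificate pinning is.")` returns the model's reply as a string. Keep prompt length plus `max_tokens` within the `--max-model-len` value used at launch.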
Intended Use
Nyx-35B is intended for:
- Python and software engineering assistance
- Defensive cybersecurity education
- MITRE ATT&CK-style concept explanation
- Security documentation and analysis support
- General technical instruction following
Safety and Limitations
- This model has not been formally benchmarked beyond training loss and basic sanity prompts.
- Cybersecurity outputs should be reviewed by a qualified human before operational use.
- The model may produce incorrect, outdated, or incomplete security guidance.
- The cybersecurity tuning is intended for defensive, educational, and authorized research contexts.
- Do not use this model for unauthorized access, credential theft, malware deployment, evasion, or other harmful activity.
- The model may include thinking-style prefaces in responses because of the base and training data style.
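If the thinking-style prefaces mentioned above are unwanted in downstream output, they can be stripped after generation. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in other Qwen3-family chat templates. The model card does not confirm the exact delimiters for Nyx-35B, so adjust the pattern to what the model actually emits. `strip_thinking` is a hypothetical helper name.

```python
import re

# Assumed delimiter format; verify against real model output.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text):
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


sample = "<think>\nThe user wants an HTTPS check.\n</think>\n\nUse urllib.parse.urlparse(url).scheme == 'https'."
print(strip_thinking(sample))
```

Text without any matching block passes through unchanged apart from trimmed whitespace, so the helper is safe to apply to every response.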
Training Artifacts
The uploaded repository contains the merged full model only. Intermediate LoRA adapters and training checkpoints were not included in this repository.
License
This model is a derivative of the listed base model and datasets. Use is subject to the terms of the base model, datasets, and any applicable licenses. Verify compatibility for your use case before commercial or production deployment.
Model provider
jmtss
Modalities
- Input: Video, Text, Image
- Output: Text