🛠️ How to Use & Run
This model can be run both locally and in cloud notebooks such as Google Colab.
Install the required libraries:
```bash
pip install transformers torch accelerate
```
Basic Inference Code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SKT-NRS/NRS_QWEN_MYTHOS_1M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place layers on the available GPU(s)/CPU automatically
)

messages = [
    {"role": "system", "content": "You are NRS, an advanced reasoning assistant."},
    {"role": "user", "content": "Explain quantum entanglement simply."},
]

# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```
2. ☁️ Google Colab Ready
Run this on a free T4 GPU or paid A100/V100 instances.
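```python
!pip install transformers accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "SKT-NRS/NRS_QWEN_MYTHOS_1M"

# Load the weights in 4-bit to cut GPU memory use. T4 GPUs have no native
# bfloat16 support, so compute in float16 instead.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
print("Model Loaded Successfully! Ready for Reasoning.")
```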
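As a rough check on why 4-bit loading matters on a free T4 (16 GB of VRAM), the weight footprint is approximately parameters × bits per weight ÷ 8 bytes. The sketch below assumes a nominal 9 billion parameters (from the "Qwen 3.5 9B" base, not an exact count) and counts weights only, ignoring the KV cache, activations and quantization metadata, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes).

    Counts weights only; KV cache, activations and quantization metadata
    (e.g. per-block scales) are not included.
    """
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 9e9  # assumption: nominal "9B" size, not an exact parameter count

for label, bits in [("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the bfloat16 weights alone (about 18 GB) exceed a T4's 16 GB, while the 4-bit weights (about 4.5 GB) leave headroom for the KV cache.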
3. 🖥️ Local Running (Ollama / LM Studio)
For the best local experience, use GGUF quantizations.
Using Ollama:
```bash
ollama create nrs-mythos -f Modelfile
ollama run nrs-mythos
```
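`ollama create ... -f Modelfile` needs a `Modelfile`. The Python sketch below writes one using Ollama's `FROM` and `PARAMETER` directives, carrying over this card's sampling values (`temperature`, `top_p`, `top_k`, `repeat_penalty`) plus an 8192-token `num_ctx` matching the LM Studio advice. The GGUF filename in the usage line is a hypothetical placeholder; substitute the file you actually downloaded. No `TEMPLATE` directive is set here, so check that Ollama applies the correct chat template for this model.

```python
from pathlib import Path
from typing import Dict, Optional, Union

Number = Union[int, float]

# Sampling values from this card, under Ollama's PARAMETER names.
DEFAULT_PARAMS: Dict[str, Number] = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "num_ctx": 8192,  # context window in tokens; raise it if memory allows
}


def build_modelfile(gguf_path: str, params: Optional[Dict[str, Number]] = None) -> str:
    """Return Modelfile text: one FROM line plus one PARAMETER line per setting."""
    settings = DEFAULT_PARAMS if params is None else params
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER {name} {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


def write_modelfile(gguf_path: str, out_dir: str = ".") -> Path:
    """Write ./Modelfile so `ollama create` can be run from out_dir."""
    path = Path(out_dir) / "Modelfile"
    path.write_text(build_modelfile(gguf_path))
    return path
```

For example, `write_modelfile("./nrs-mythos-q4_k_m.gguf")` (hypothetical filename) followed by the two `ollama` commands above.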
Using LM Studio:
- Download the `.gguf` file from the "Files" section.
- Drag and drop it into LM Studio.
- Set Context Length to 8192 or higher.
Using vLLM (for production-grade serving speed and 1M-token context support):
```bash
pip install vllm
vllm serve SKT-NRS/NRS_QWEN_MYTHOS_1M \
  --max-model-len 1000000 \
  --gpu-memory-utilization 0.9 \
  --dtype bfloat16
```
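`vllm serve` exposes an OpenAI-compatible HTTP API. A minimal standard-library client sketch follows; it assumes the server is on vLLM's default `localhost:8000` and sends `top_k` and `repetition_penalty`, which vLLM accepts as extra sampling parameters beyond the OpenAI schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: default `vllm serve` host and port
MODEL_ID = "SKT-NRS/NRS_QWEN_MYTHOS_1M"


def build_chat_request(prompt: str, max_tokens: int = 4096) -> dict:
    """Chat Completions payload using this card's sampling parameters."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
        # vLLM-specific extras, not part of the OpenAI schema:
        "top_k": 20,
        "repetition_penalty": 1.05,
    }


def chat(prompt: str, timeout: float = 600.0) -> str:
    """POST to /v1/chat/completions and return the assistant's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Explain quantum entanglement simply."))` sends one request and prints the reply.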
🧠 Technical Details & Training
Base Model
- Architecture: Qwen 3.5 9B
- Context Window: 1,048,576 tokens (via YaRN RoPE Scaling)
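YaRN stretches a RoPE model's context by a scaling factor equal to the target length divided by the length the base model was trained on. The sketch below computes that factor and builds a `rope_scaling` dictionary using the key names Qwen's published configs use. The native length of 262,144 tokens is a hypothetical value chosen for illustration and is not stated in this card; check the repository's `config.json` for the real numbers.

```python
def yarn_rope_scaling(target_len: int, native_len: int) -> dict:
    """Build a YaRN rope_scaling entry that stretches native_len to target_len."""
    if target_len <= native_len:
        raise ValueError("YaRN scaling is only needed when target_len > native_len")
    return {
        "rope_type": "yarn",
        "factor": target_len / native_len,
        "original_max_position_embeddings": native_len,
    }


TARGET_LEN = 1_048_576  # context window stated in this card (2**20 tokens)
NATIVE_LEN = 262_144    # assumption: hypothetical native length, see config.json

print(yarn_rope_scaling(TARGET_LEN, NATIVE_LEN))
```

With these inputs the factor is 4.0; a different native length changes it proportionally.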
NRS Enhancement Process
The model underwent a rigorous Neural Reasoning System (NRS) enhancement pipeline:
- Reasoning Boosting Tool: Proprietary NRS tools generated high-quality Chain-of-Thought (CoT) data.
- Supervised Fine-Tuning (SFT): Tuned on ~500k high-quality reasoning samples (coding, math, logic).
- Tool Calling Optimization: Enhanced native function calling for Python & Web Search.
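With Hugging Face tokenizers, tool definitions are usually passed through the `tools` argument of `tokenizer.apply_chat_template`, and the model's reply must then be parsed for calls. The sketch below defines a hypothetical `web_search` tool as a JSON-schema dictionary and parses replies in the `<tool_call>{...}</tool_call>` format used by Qwen2.5 and Qwen3 chat templates. Whether this model emits exactly that format is an assumption; confirm it against the chat template shipped in the repository.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical tool definition, usable as
# tokenizer.apply_chat_template(messages, tools=[WEB_SEARCH_TOOL], ...).
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        },
    },
}

# Assumption: each call is a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(reply: str) -> List[Dict[str, Any]]:
    """Extract every {"name": ..., "arguments": ...} object from a model reply."""
    calls = []
    for raw in TOOL_CALL_RE.findall(reply):
        call = json.loads(raw)
        calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls
```

Each parsed call can then be dispatched to your own implementation and the result appended to `messages` as a tool turn.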
Sampling Parameters
- Temperature: 0.6
- Top_P: 0.95
- Top_K: 20
- Repetition Penalty: 1.05
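To make these four parameters concrete, here is a simplified pure-Python sketch of how they reshape a next-token distribution, applied in the order Hugging Face `generate` uses (repetition penalty, then temperature, then top-k, then top-p). It works on a toy list of logits and is illustrative only, not the code any inference engine actually runs.

```python
import math
import random
from typing import Dict, List, Sequence


def apply_repetition_penalty(
    logits: List[float], previous: Sequence[int], penalty: float
) -> List[float]:
    """CTRL-style penalty: make tokens that already appeared less likely."""
    out = list(logits)
    for tok in set(previous):
        # Divide positive logits, multiply negative ones: both lower the score.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def sampling_distribution(
    logits: List[float],
    previous: Sequence[int] = (),
    temperature: float = 0.6,
    top_k: int = 20,
    top_p: float = 0.95,
    repetition_penalty: float = 1.05,
) -> Dict[int, float]:
    """Return {token_id: probability} after all four adjustments."""
    scores = apply_repetition_penalty(logits, previous, repetition_penalty)
    # Temperature below 1.0 sharpens the distribution toward high-scoring tokens.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, w in zip(ranked, weights):
        kept.append((tok, w / total))
        cumulative += w / total
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample_token(dist: Dict[int, float], rng: random.Random) -> int:
    """Draw one token id from the filtered distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, `sampling_distribution([2.0, 1.0, 0.5, -1.0])` keeps three of the four toy tokens: after temperature scaling the top two cover about 93% of the mass, below the 0.95 threshold, while the top three cover over 99%, so top-p drops only the last one.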
⚠️ Limitations & Disclaimer
- Reasoning Mode: The model emits its reasoning inside `<think>` ... `</think>` blocks. Strip or parse them if you only need the final answer.
- Uncensored Nature: Designed for open research. Use responsibly.
- Hallucinations: Always verify critical facts with external sources.
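A minimal sketch for separating the reasoning from the final answer, assuming the reasoning is delimited by `<think>` and `</think>` and that these tags survive decoding with `skip_special_tokens=True` (Qwen3-style tokenizers do not mark them as special; verify this for this model). Some chat templates put the opening `<think>` in the prompt, so the decoded text may contain only the closing tag; the function handles both cases.

```python
from typing import Tuple


def split_thinking(text: str) -> Tuple[str, str]:
    """Split a decoded reply into (reasoning, final_answer).

    Accepts "<think>...</think>answer" as well as "...</think>answer",
    where the opening tag was part of the prompt.
    """
    close = "</think>"
    idx = text.rfind(close)
    if idx == -1:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = text[:idx].replace("<think>", "", 1)  # drop the opening tag if present
    answer = text[idx + len(close):]
    return reasoning.strip(), answer.strip()
```

Applied to `response` from the inference example, `reasoning, answer = split_thinking(response)` lets you display or log the two parts separately.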
LICENSE AND TERMS