SKT-NRS

NRS_QWEN_MYTHOS_1M

🛠️ How to Use & Run

This model runs both locally and in cloud notebooks such as Google Colab.

1. 🐍 Python (Transformers)

Install the required libraries:

```bash
pip install transformers torch accelerate
```

Basic Inference Code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SKT-NRS/NRS_QWEN_MYTHOS_1M"

# Load the tokenizer and the model (bfloat16 weights, placed automatically on available devices)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Build the prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are NRS, an advanced reasoning assistant."},
    {"role": "user", "content": "Explain quantum entanglement simply."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate with sampling
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```

2. ☁️ Google Colab Ready

The 4-bit configuration below runs on Colab's free T4 GPU as well as on paid A100/V100 instances.

```python
!pip install transformers accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "SKT-NRS/NRS_QWEN_MYTHOS_1M"

# Load in 4-bit for memory efficiency.
# float16 is used for compute because the T4 has no native bfloat16 support.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("Model Loaded Successfully! Ready for Reasoning.")
```
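As a rough check on why 4-bit loading matters on a T4 (16 GB of GPU memory), the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch: it assumes roughly 9 billion parameters (taken from the "9B" in the base model name) and ignores quantization overhead, activations, and the KV cache, all of which add to the real total.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: "9B" in the base model name means about 9e9 parameters.
N_PARAMS = 9e9

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # full-precision bfloat16/float16 weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone exceed a T4's memory, while the 4-bit weights leave headroom for activations and a modest context.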

3. 🖥️ Local Running (Ollama / LM Studio)

For the best local experience, use GGUF quantizations.

Using Ollama:

```bash
ollama create nrs-mythos -f Modelfile
ollama run nrs-mythos
```
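The `ollama create` command above expects a `Modelfile`, which this card does not include. A minimal sketch of generating one in Python follows, assuming a GGUF file has already been downloaded. The file name `./nrs-mythos.Q4_K_M.gguf` is hypothetical, the sampling values are the ones listed under Sampling Parameters, and `num_ctx` mirrors the 8192 context length suggested for LM Studio.

```python
from pathlib import Path


def render_modelfile(gguf_path: str, num_ctx: int = 8192) -> str:
    """Build the text of an Ollama Modelfile pointing at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        # Sampling values taken from this card's Sampling Parameters section.
        "PARAMETER temperature 0.6",
        "PARAMETER top_p 0.95",
        "PARAMETER top_k 20",
        "PARAMETER repeat_penalty 1.05",
        # Context length; raise it if your hardware allows.
        f"PARAMETER num_ctx {num_ctx}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file name: replace with the .gguf file you actually downloaded.
modelfile_text = render_modelfile("./nrs-mythos.Q4_K_M.gguf")
Path("Modelfile").write_text(modelfile_text, encoding="utf-8")
print(modelfile_text)
```

Run the script in the directory that holds the GGUF file, then run the two `ollama` commands from the same directory.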

Using LM Studio:

Download the .gguf file from the "Files" section.
Drag and drop it into LM Studio.
Set Context Length to 8192 or higher.

4. ⚡ High-Performance Serving (vLLM)

For production-grade speed and 1M context support:

```bash
pip install vllm

vllm serve SKT-NRS/NRS_QWEN_MYTHOS_1M \
    --max-model-len 1000000 \
    --gpu-memory-utilization 0.9 \
    --dtype bfloat16
```
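Once the server is running, it exposes an OpenAI-compatible HTTP API. The sketch below shows one way to query it using only the Python standard library. It assumes vLLM's default address `http://localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of any library. `top_k` and `repetition_penalty` are vLLM extensions to the OpenAI request schema.

```python
import json
import urllib.request

# Assumption: the server runs locally on vLLM's default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a chat-completions payload using this card's sampling parameters."""
    return {
        "model": "SKT-NRS/NRS_QWEN_MYTHOS_1M",
        "messages": [
            {"role": "system", "content": "You are NRS, an advanced reasoning assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,                 # vLLM-specific extra parameter
        "repetition_penalty": 1.05,  # vLLM-specific extra parameter
    }


def chat(prompt: str) -> str:
    """Send the request to the running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


# Inspect the payload without contacting the server.
print(json.dumps(build_chat_request("Explain quantum entanglement simply."), indent=2))
# With the server running: print(chat("Explain quantum entanglement simply."))
```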

🧠 Technical Details & Training

Base Model

Architecture: Qwen 3.5 9B
Context Window: 1,048,576 tokens (via YaRN RoPE Scaling)
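YaRN extends the context window by a scaling factor equal to the target length divided by the length the base model was trained on. The sketch below works through that arithmetic. The native length of 262,144 tokens is a hypothetical value for illustration only, and the dictionary layout follows the `rope_scaling` convention used in Hugging Face configs, which is an assumption about how this checkpoint is configured. Check the model's `config.json` for the real values.

```python
TARGET_CONTEXT = 1_048_576  # context window stated in this card


def yarn_rope_scaling(native_context: int, target_context: int = TARGET_CONTEXT) -> dict:
    """Return a rope_scaling-style dict for extending native_context to target_context."""
    if target_context < native_context:
        raise ValueError("target context must be at least the native context")
    return {
        "rope_type": "yarn",
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


# Hypothetical native length: read the real one from the base model's config.json.
NATIVE_CONTEXT = 262_144

scaling = yarn_rope_scaling(NATIVE_CONTEXT)
print(scaling)
```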

NRS Enhancement Process

The model underwent a rigorous Neural Reasoning System (NRS) enhancement pipeline:

Reasoning Boosting Tool: Proprietary NRS tools generated high-quality Chain-of-Thought (CoT) data.
Supervised Fine-Tuning (SFT): Tuned on ~500k high-quality reasoning samples (coding, math, logic).
Tool Calling Optimization: Enhanced native function calling for Python & Web Search.

Sampling Parameters

Temperature: 0.6
Top_P: 0.95
Top_K: 20
Repetition Penalty: 1.05

⚠️ Limitations & Disclaimer

Reasoning Mode: The model outputs <think> blocks. Parse them if needed.
Uncensored Nature: Designed for open research. Use responsibly.
Hallucinations: Always verify critical facts with external sources.
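One way to handle the `<think>` blocks is to separate the reasoning from the final answer after decoding. The sketch below is an assumption about the output shape, not a documented format: it expects reasoning wrapped in `<think>...</think>`, and it also handles the case where only the closing tag appears because the chat template already emitted the opening one. The function name `split_reasoning` is illustrative.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a decoded model response."""
    # Case: the opening tag was part of the prompt, so only </think> is present.
    if "<think>" not in text and "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        return reasoning.strip(), answer.strip()
    # General case: collect every <think> block and remove them from the answer.
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


sample = "<think>Two particles share one state.</think>Entangled particles stay correlated."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

A response with no tags at all is returned unchanged as the answer, with an empty reasoning string.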

LICENSE AND TERMS

Model provider: SKT-NRS
Model tree: Qwen/Qwen3.5-9B (base), this model (fine-tuned)
Input modalities: Video, Text, Image
Output modalities: Text
