neuroturk

HYZ-01-0.6B


1. Introduction

HYZ-01-Instruct is the instruction-tuned version of the HYZ-01 series developed by NeuroTürk. Building on the base model's strong Turkish language understanding, supervised fine-tuning (SFT) on high-quality instruction-response pairs has improved instruction-following performance across tasks such as conversation, question answering, summarization, and code generation.

The model starts from a multilingual foundation covering 119 languages, which then underwent Turkish-focused continual pre-training (CPT) and fine-tuning on 372,697 instruction-response pairs. The tokenizer has been extended for Turkish morphological structure and for structural use cases such as chain-of-thought and section markers. HYZ-01-0.6B is the lightweight, open-source version of HYZ-01, developed by NeuroTürk for Turkish.

Note: This is the instruction fine-tuned version. For the base model, see: HYZ-01-0.6B-Base

2. Model Summary

Continual Pre-Training and Fine-Tuning

Base model: 4-stage Turkish continual pre-training (CPT) applied on top of a multilingual foundation.
Fine-tuning (SFT): 372,697 carefully curated Turkish instruction-response pairs.
Optimization: LoRA (r=64) + DoRA, bfloat16, flash-attention-2, AdamW.
Final training loss: 0.6707
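To make "LoRA + DoRA" more concrete, here is a toy plain-Python sketch of the DoRA (weight-decomposed low-rank adaptation) merge rule: the frozen weight plus the scaled LoRA update B @ A is normalized per output unit, then rescaled by a trainable magnitude vector m. This is not NeuroTürk's training code; the matrices, the rank of 1, and the `scale` factor (LoRA's alpha/r) are hypothetical, and real implementations such as Hugging Face PEFT (`use_dora=True` in `LoraConfig`) work on tensors.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_norms(w: Matrix) -> list[float]:
    """L2 norm of each row; rows are output units, as in an nn.Linear weight."""
    return [math.sqrt(sum(x * x for x in row)) for row in w]


def dora_weight(w0: Matrix, b: Matrix, a: Matrix, m: list[float], scale: float) -> Matrix:
    """Merged DoRA weight: magnitude m times the unit-norm direction of W0 + scale * (B @ A)."""
    delta = matmul(b, a)  # low-rank LoRA update, same shape as w0
    v = [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))] for i in range(len(w0))]
    norms = row_norms(v)
    return [[m[i] * v[i][j] / norms[i] for j in range(len(v[0]))] for i in range(len(v))]


# Hypothetical 2x2 layer with rank r = 1. B starts at zero and m starts at the
# row norms of W0, so before training DoRA reproduces the frozen weight exactly.
w0 = [[3.0, 4.0], [0.0, 2.0]]
a = [[0.5, -0.5]]        # r x in
b_init = [[0.0], [0.0]]  # out x r
m_init = row_norms(w0)   # [5.0, 2.0]
print(dora_weight(w0, b_init, a, m_init, scale=2.0))  # [[3.0, 4.0], [0.0, 2.0]]
```

The design point is that direction (learned through the low-rank update) and magnitude (learned through m) are trained separately; whatever B and A become, each output row of the merged weight keeps the norm stored in m.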

Tokenizer Extension

New special tokens were added to the tokenizer for two purposes:

Language-structure tokens: To represent Turkish morphological features more efficiently.
Task and structure tokens: To support structural use cases such as chain-of-thought, code blocks, section markers, and language labels.

The following 20 tokens have been added to the vocabulary but were not used during training; they are defined as infrastructure for future advanced capabilities:

| Group | Tokens | Future Use |
|---|---|---|
| Brand | `<\|neuroturk\|>` `<\|hyz01\|>` `<\|tr\|>` `<\|en\|>` | Model identity and multilingual control |
| Chain-of-Thought | `<\|think\|>` `<\|/think\|>` `<\|step\|>` `<\|answer\|>` | Step-by-step reasoning (CoT) |
| Dialogue | | |

Note: Unlike the reserved tokens above, the `<|system|>`, `<|user|>`, and `<|assistant|>` tokens are actively used in the chat template.
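Hugging Face tokenizers match added and special tokens before the subword model runs, which is what lets a string like `<|think|>` map to one vocabulary entry instead of several fragments. The plain-Python sketch below imitates only that pre-splitting step; it is a simplification, not the library's implementation, and `split_on_special_tokens` is a hypothetical helper.

```python
import re

# The brand and chain-of-thought tokens listed in the table above.
SPECIAL_TOKENS = [
    "<|neuroturk|>", "<|hyz01|>", "<|tr|>", "<|en|>",
    "<|think|>", "<|/think|>", "<|step|>", "<|answer|>",
]


def split_on_special_tokens(text: str, specials: list[str]) -> list[str]:
    """Split text so every special token becomes one atomic piece.

    Longer tokens are tried first so a token is never matched by a shorter
    prefix. Ordinary text between special tokens would then go to the normal
    subword (BPE) model.
    """
    pattern = "|".join(re.escape(tok) for tok in sorted(specials, key=len, reverse=True))
    pieces = re.split(f"({pattern})", text)  # capturing group keeps the tokens
    return [p for p in pieces if p]          # drop empty strings between adjacent tokens


print(split_on_special_tokens("<|think|>Önce topla.<|/think|><|answer|>42", SPECIAL_TOKENS))
```

With the real tokenizer, `tokenizer.tokenize("<|think|>")` would be expected to return a single piece if the token is registered in the vocabulary.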

3. Model Details

| Feature | Value |
|---|---|
| Total parameters | 595,798,016 (~0.6B) |
| Non-embedding parameters | 440,467,456 (~0.44B) |
| Hidden dimension | 1,024 |
| Number of layers | 28 |
| Attention heads (Q) | 16 |
| Attention heads (KV) | 8 (GQA) |
| Head dimension | 128 |
| Activation | SiLU |
| Normalization | RMSNorm (ε = 1 × 10⁻⁶) |
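The figures in this table are enough for a back-of-the-envelope memory estimate. The sketch below assumes bf16 storage (2 bytes per value) for both weights and an uncompressed KV cache, and ignores activations and framework overhead; the constant names are illustrative.

```python
# Figures copied from the Model Details table above.
TOTAL_PARAMS = 595_798_016
NON_EMBEDDING_PARAMS = 440_467_456
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_BF16 = 2  # assumption: weights and cache kept in bfloat16


def kv_cache_bytes(seq_len: int, batch_size: int = 1, bytes_per_value: int = BYTES_BF16) -> int:
    """Uncompressed KV cache: one key and one value vector per layer, KV head and token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * seq_len * batch_size


embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS  # 155,330,560
weight_bytes = TOTAL_PARAMS * BYTES_BF16                # 1,191,596,032 bytes
queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS       # GQA group size: 2

print(f"embedding parameters: {embedding_params:,}")
print(f"bf16 weights: {weight_bytes / 2**30:.2f} GiB")
print(f"KV cache for one 4,096-token sequence: {kv_cache_bytes(4096) / 2**20:.0f} MiB")
```

Under these assumptions a full 4,096-token context needs 448 MiB of KV cache on top of roughly 1.11 GiB of bf16 weights. Because each of the 8 KV heads is shared by 2 query heads, the cache is half the size that full multi-head attention with 16 KV heads would need.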

4. Training Details

| Setting | Value |
|---|---|
| Base model training | Multi-stage Turkish CPT |
| Fine-tuning type | Supervised Fine-Tuning (SFT) |
| Fine-tuning data size | 372,697 instruction-response pairs |
| Optimization | LoRA (r=64) + DoRA, AdamW |
| Precision | BFloat16 |
| Final loss | 0.6707 |
| LR schedule | Cosine with warmup |
| Context length | 4,096 tokens |
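The table names a cosine schedule with warmup but not its peak learning rate or warmup length. The sketch below shows the usual shape of such a schedule (linear warmup to a peak, then a half-cosine decay to zero), which matches transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5` within the training run. The peak LR and step counts are hypothetical placeholders, not NeuroTürk's settings.

```python
import math

# Hypothetical values for illustration only.
PEAK_LR = 2e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 1_100


def cosine_with_warmup(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at 0 after the last scheduled step
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 600, 1_100):
    print(step, f"{cosine_with_warmup(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS):.2e}")
```

Halfway through warmup the rate is half the peak, and halfway through the decay phase it is again half the peak, since cos(π/2) = 0.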

5. Usage

Installation

bash
pip install transformers torch accelerate

Quick Start (Chat Format)

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuroturk/HYZ-01-0.6B"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    fix_mistral_regex=True 
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Senin adın HYZ-01, NeuroTürk tarafından geliştirilmiş bir Türkçe asistansın."},
    {"role": "user", "content": "Yapay zeka nedir?"},
]

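inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # end the prompt with "<|assistant|>" so the model replies
    return_dict=True,            # return input_ids and attention_mask so **inputs works
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))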

Low VRAM (4-bit Quantization)

python
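from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit loading requires the bitsandbytes package: pip install bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store linear-layer weights in 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matrix multiplications in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
)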

tokenizer = AutoTokenizer.from_pretrained(
    "neuroturk/HYZ-01-0.6B",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "neuroturk/HYZ-01-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)
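A rough estimate of what 4-bit loading saves, under loudly stated assumptions: every non-embedding parameter is stored as NF4 (in practice a few small tensors such as the RMSNorm weights stay in higher precision), double quantization adds about 0.127 bits per parameter of scaling constants (the figure the QLoRA paper gives for block size 64), and the token embedding, which bitsandbytes does not quantize, stays in 16-bit. Only weight storage is counted; the helper name is hypothetical.

```python
# Figures from the Model Details table.
TOTAL_PARAMS = 595_798_016
NON_EMBEDDING_PARAMS = 440_467_456
EMBEDDING_PARAMS = TOTAL_PARAMS - NON_EMBEDDING_PARAMS

# Assumption: NF4 stores 4 bits per weight, plus ~0.127 bits per parameter of
# double-quantized scaling constants.
NF4_BITS_PER_PARAM = 4 + 0.127
BYTES_16BIT = 2


def estimate_4bit_weight_bytes(quantized_params: int, unquantized_params: int) -> float:
    """Weight storage only: NF4 for quantized tensors, 16-bit for everything else."""
    return quantized_params * NF4_BITS_PER_PARAM / 8 + unquantized_params * BYTES_16BIT


bf16_bytes = TOTAL_PARAMS * BYTES_16BIT
nf4_bytes = estimate_4bit_weight_bytes(NON_EMBEDDING_PARAMS, EMBEDDING_PARAMS)
print(f"bf16 weights: {bf16_bytes / 1e9:.2f} GB")
print(f"4-bit weights (estimate): {nf4_bytes / 1e9:.2f} GB")
```

This comes to roughly 0.54 GB versus 1.19 GB, about a 2.2x reduction rather than 4x: at this model size the 16-bit embedding matrix accounts for more than half of the 4-bit footprint.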

Additional Fine-Tuning with Unsloth

python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="neuroturk/HYZ-01-0.6B",
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)
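To gauge how many parameters the adapter above trains, recall that LoRA adds two matrices per targeted linear layer, A (r × in) and B (out × r), so each layer contributes r·(in + out) parameters. The sketch below derives the attention projection shapes from the Model Details table (16 query heads and 8 KV heads of dimension 128, hidden size 1,024), assuming Llama/Qwen-style projection layouts. The MLP width is not listed in this card, so it is left as an argument; the helper names are hypothetical.

```python
# Shapes derived from the Model Details table, assuming Llama/Qwen-style layouts.
HIDDEN = 1024
HEAD_DIM = 128
Q_DIM = 16 * HEAD_DIM   # 2,048
KV_DIM = 8 * HEAD_DIM   # 1,024
NUM_LAYERS = 28


def lora_params(r: int, in_features: int, out_features: int) -> int:
    """LoRA adds A (r x in) and B (out x r) to one linear layer."""
    return r * (in_features + out_features)


def attention_lora_params(r: int) -> int:
    """LoRA parameters for q_proj, k_proj, v_proj and o_proj across all layers."""
    per_layer = (
        lora_params(r, HIDDEN, Q_DIM)     # q_proj
        + lora_params(r, HIDDEN, KV_DIM)  # k_proj
        + lora_params(r, HIDDEN, KV_DIM)  # v_proj
        + lora_params(r, Q_DIM, HIDDEN)   # o_proj
    )
    return per_layer * NUM_LAYERS


def mlp_lora_params(r: int, intermediate_size: int) -> int:
    """gate_proj and up_proj map hidden -> intermediate; down_proj maps back."""
    return 3 * lora_params(r, HIDDEN, intermediate_size) * NUM_LAYERS


print(f"attention LoRA parameters at r=32: {attention_lora_params(32):,}")
```

At r=32 the attention projections alone contribute 9,175,040 trainable parameters; add `mlp_lora_params(32, intermediate_size)` using the MLP width from the model's config.json (typically named `intermediate_size`). After `get_peft_model`, PEFT's `model.print_trainable_parameters()` reports the actual total and is a useful cross-check.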

GGUF Quantizations

For faster inference and lower resource usage, GGUF quantized versions of HYZ-01-0.6B are available. These were kindly provided by mradermacher.

You can find them here: HYZ-01-0.6B-GGUF

Using with llama.cpp

Download the GGUF file (e.g., hyz-01-0.6b-q4_k_m.gguf) from the repository above.

Run with llama.cpp

bash
./llama-cli -m hyz-01-0.6b-q4_k_m.gguf -p "Your prompt here" -n 512

Recent llama.cpp builds name the command-line binary llama-cli; older builds called it main. For a detailed explanation of quantization types (e.g., Q4_K_M, Q5_K_M), see the llama.cpp documentation.

Note: These GGUF files are not officially maintained by NeuroTürk, but they are community-tested and widely used. Thanks again to mradermacher for the contribution.

6. Chat Template

jinja2
{% for message in messages %}
{% if message['role'] == 'system' %}
<|system|>
{{ message['content'] }}<|endoftext|>
{% elif message['role'] == 'user' %}
<|user|>
{{ message['content'] }}<|endoftext|>
{% elif message['role'] == 'assistant' %}
<|assistant|>
{{ message['content'] }}<|endoftext|>
{% endif %}
{% endfor %}
{% if add_generation_prompt %}<|assistant|>
{% endif %}
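To see the exact prompt string this template produces, here is a plain-Python sketch that mirrors it, assuming the template is rendered with Jinja's `trim_blocks` and `lstrip_blocks` options enabled (as Hugging Face transformers does), so the block tags themselves emit no newlines. `build_prompt` is a hypothetical helper, not part of the model's code.

```python
# Role tags from the chat template; roles outside this map are skipped,
# mirroring the template's if/elif chain, which has no else branch.
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}
END_OF_TURN = "<|endoftext|>"


def build_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render messages the way the Jinja template above is expected to render them."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is None:
            continue
        parts.append(f"{tag}\n{message['content']}{END_OF_TURN}\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # the model continues from here
    return "".join(parts)


print(build_prompt(
    [
        {"role": "system", "content": "Senin adın HYZ-01, NeuroTürk tarafından geliştirilmiş bir Türkçe asistansın."},
        {"role": "user", "content": "Yapay zeka nedir?"},
    ],
    add_generation_prompt=True,
))
```

To compare against the real template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the rendered string without tokenizing it.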

7. Evaluation Results

All evaluations were conducted using lm-evaluation-harness.

| Task | Category | Setting | Score |
|---|---|---|---|
| TurBLiMP (ditransitive) | Grammar | 0-shot | 89.10% |
| TurBLiMP (transitive) | Grammar | 0-shot | 86.40% |
| XCOPA TR | Causality | 0-shot | 56.80% |
| XNLI TR | Natural language inference | 0-shot | |

Note: XQuAD TR was evaluated in generative question-answering format. The Exact Match (EM) score appears low due to strict string matching requirements; the F1 score better reflects the model's actual performance.
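To illustrate why EM can look low while F1 is more forgiving, here is a simplified SQuAD-style scorer: lowercase, strip ASCII punctuation, split on whitespace, then compare token bags. It is an illustration only; the official SQuAD script also removes English articles, lm-evaluation-harness's XQuAD task may normalize differently, and Python's `str.lower()` is not Turkish-aware (it maps "I" to "i", not "ı").

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, split on whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 only if the normalized token sequences are identical."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


prediction = "Türkiye'nin başkenti Ankara."
gold = "Ankara"
print(exact_match(prediction, gold), token_f1(prediction, gold))
```

A generative answer that wraps the correct span in a full sentence scores 0 on EM but 0.5 on F1 here (precision 1/3, recall 1), which is the gap the note above describes.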

Note: TokSuite TR and MGSM TR evaluations are ongoing; results will be added upon completion.

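These benchmarks focus on grammaticality judgments, causal reasoning, and natural language inference. On tasks they do not directly measure, such as everyday conversation, text summarization, code generation, and open-ended question answering, the model may perform somewhat better than these scores suggest.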

8. Limitations

Although the model generally follows instructions well, it may occasionally produce incorrect or inconsistent outputs.
With only 0.6B parameters, its capacity for complex multi-step reasoning is limited.
Biases present in the training data may be reflected in outputs.
Performance drops significantly in languages other than Turkish.
Human verification of outputs is recommended for critical applications.

9. Citation

bibtex
@misc{neuroturk2026hyz01,
  author       = {NeuroTürk},
  title        = {HYZ-01-0.6B: A Lightweight Turkish Instruction Model},
  year         = 2026,
}

Model provider: neuroturk
Base model: neuroturk/HYZ-01-0.6B-Base (this model is fine-tuned from it)
Modalities: text input, text output
