neuroturk

HYZ-01-0.6B-Base

1. Introduction

HYZ-01-0.6B-Base is the base (pre-trained only) version of the HYZ-01 series developed by NeuroTürk. It is a raw language model that has undergone multi-stage Turkish continual pre-training (CPT) on top of a multilingual foundation, without any instruction tuning or alignment. It is intended for researchers and developers who wish to fine-tune the model for their own tasks.

The underlying multilingual foundation covers 119 languages; continual pre-training then shifts the focus to Turkish. The tokenizer has been extended for Turkish morphological structure and for advanced use cases. HYZ-01-0.6B-Base is the lightweight, open-source base model of the HYZ-01 series.

Note: This is the base pre-trained version. For the instruction-tuned version, see: HYZ-01-0.6B

2. Model Summary

Continual Pre-Training

Base model: 4-stage Turkish continual pre-training (CPT) applied on top of a multilingual foundation.
Training stages include general Turkish web corpus, curated domain data, Wikipedia, and high-quality filtered text.
Optimization: bfloat16, flash-attention-2, AdamW.

Tokenizer Extension

New special tokens were added to the tokenizer for two purposes:

Language-structure tokens: To represent Turkish morphological features more efficiently.
Task and structure tokens: To support structural use cases such as chain-of-thought, code blocks, section markers, and language labels.

The following 20 tokens have been added to the vocabulary and are reserved as infrastructure for future advanced capabilities:

Group	Tokens	Future Use
Brand	`<|neuroturk|>` `<|hyz01|>` `<|tr|>` `<|en|>`	Model identity and multilingual control
Chain-of-Thought	`<|think|>` `<|/think|>` `<|step|>` `<|answer|>`	Step-by-step reasoning (CoT)
Dialogue

3. Model Details

Feature	Value
Total parameters	595,798,016 (~0.6B)
Non-embedding parameters	440,467,456 (~0.44B)
Hidden dimension	1,024
Number of layers	28
Attention heads (Q)	16
Attention heads (KV)	8 (GQA)
Head dimension	128
Activation	SiLU
Normalization	RMSNorm (ε = 1 × 10⁻⁶)
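
The parameter counts in the table can be cross-checked against the listed dimensions. The sketch below is a back-of-the-envelope check, not an official accounting. It assumes a gated MLP with intermediate size 3,072, bias-free projections, a per-head RMSNorm on queries and keys, and a single embedding matrix shared between input and output. None of these is stated in this card; they are assumptions that happen to be consistent with the published numbers.

```python
# Values taken from the Model Details table.
HIDDEN = 1024
LAYERS = 28
Q_HEADS = 16
KV_HEADS = 8
HEAD_DIM = 128
TOTAL_PARAMS = 595_798_016
NON_EMBEDDING_PARAMS = 440_467_456

# Assumptions (NOT stated in the model card).
INTERMEDIATE = 3072     # assumed gated-MLP width
QK_NORM = True          # assumed per-head RMSNorm on Q and K
TIED_EMBEDDINGS = True  # assumed shared input/output embedding matrix


def layer_params() -> int:
    """Parameters in one decoder layer, with bias-free linear projections."""
    q = HIDDEN * Q_HEADS * HEAD_DIM
    k = HIDDEN * KV_HEADS * HEAD_DIM   # GQA: fewer K/V heads than Q heads
    v = HIDDEN * KV_HEADS * HEAD_DIM
    o = Q_HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE    # gate, up and down projections
    norms = 2 * HIDDEN                 # pre-attention and pre-MLP RMSNorm
    qk_norm = 2 * HEAD_DIM if QK_NORM else 0
    return q + k + v + o + mlp + norms + qk_norm


def non_embedding_params() -> int:
    """All decoder layers plus the final RMSNorm."""
    return LAYERS * layer_params() + HIDDEN


def implied_vocab_size() -> int:
    """Embedding rows implied by the gap between the two published totals."""
    embedding = TOTAL_PARAMS - NON_EMBEDDING_PARAMS
    copies = 1 if TIED_EMBEDDINGS else 2
    return embedding // (copies * HIDDEN)


print(non_embedding_params() == NON_EMBEDDING_PARAMS)
print(implied_vocab_size())
```

Under these assumptions the layer arithmetic matches the published non-embedding count, and the remaining parameters divide evenly by the hidden dimension, which suggests (but does not prove) a shared embedding matrix.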

4. Training Details

Setting	Value
Training type	Continual Pre-Training (CPT)
Number of stages	4
Optimization	AdamW
Precision	BFloat16
LR schedule	Cosine with warmup
Context length	4,096 tokens
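
For readers unfamiliar with the "Cosine with warmup" schedule, here is a minimal sketch of its usual shape: a linear ramp to a peak learning rate, followed by a half-cosine decay. The card does not publish the peak learning rate, warmup length, or step count, so every numeric value below is a hypothetical placeholder, and the exact variant used in training may differ.

```python
import math


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical settings, for illustration only.
TOTAL_STEPS, WARMUP_STEPS, PEAK_LR = 1000, 100, 1e-4

for step in (0, 50, 100, 550, 1000):
    lr = cosine_with_warmup(step, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
    print(step, f"{lr:.2e}")
```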

5. Usage

Warning: This is a base model. It is not instruction-tuned and will not follow instructions reliably. For conversational or task-oriented use, use the instruction-tuned version: HYZ-01-0.6B

Installation

```bash
pip install transformers torch accelerate
```

Text Generation (Completion)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuroturk/HYZ-01-0.6B-Base"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    fix_mistral_regex=True,
)
# Load the weights in bfloat16 and let accelerate place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A base model continues text, so the prompt is an unfinished Turkish sentence
# (roughly: "Artificial intelligence, of computer systems ...").
prompt = "Yapay zeka, bilgisayar sistemlerinin"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
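
The `temperature` and `top_p` arguments in the example above shape how each next token is drawn. As a rough illustration, the sketch below applies temperature scaling and then nucleus (top-p) filtering to a toy logit vector in pure Python. It is a simplified picture of the idea, not the `transformers` implementation, and the four toy logits are made up.

```python
import math
import random


def top_p_distribution(logits, temperature=0.8, top_p=0.95):
    """Return {token_index: probability} after temperature and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -3.0]  # made-up scores for a 4-token vocabulary
dist = top_p_distribution(toy_logits)
print(sorted(dist))  # indices that survive the filter

# Draw one token index from the filtered distribution.
next_token = random.choices(list(dist), weights=list(dist.values()))[0]
print(next_token)
```

With these toy values the least likely token falls outside the 0.95 nucleus and can never be sampled, which is the effect `top_p` is meant to have on the long tail of the vocabulary.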

Low VRAM (4-bit Quantization)

```python
# Requires the bitsandbytes package in addition to the packages above:
#   pip install bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 4-bit weights with double quantization; matrix multiplies run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(
    "neuroturk/HYZ-01-0.6B-Base",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "neuroturk/HYZ-01-0.6B-Base",
    quantization_config=bnb_config,
    device_map="auto",
)
```
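
To judge whether quantization is needed on a given GPU, a rough memory estimate can help. The sketch below uses only figures from the Model Details and Training Details tables. It treats the 4-bit figure as a lower bound, because quantization constants, any layers left unquantized, activations, and framework overhead all add to real usage. The helper names are illustrative, not part of any library.

```python
# Figures from the Model Details and Training Details tables.
TOTAL_PARAMS = 595_798_016
LAYERS = 28
KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 4096

GIB = 1024 ** 3


def weight_bytes(bits_per_param: int) -> float:
    """Idealized weight storage if every parameter used the same bit width."""
    return TOTAL_PARAMS * bits_per_param / 8


def kv_cache_bytes(seq_len: int, bytes_per_value: int = 2) -> int:
    """KV cache for one sequence: keys and values for every layer and KV head."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * seq_len * bytes_per_value


print(f"bf16 weights : {weight_bytes(16) / GIB:.2f} GiB")
print(f"4-bit weights: {weight_bytes(4) / GIB:.2f} GiB (lower bound)")
print(f"KV cache at {CONTEXT} tokens (bf16): {kv_cache_bytes(CONTEXT) / GIB:.2f} GiB")
```

Because the model uses 8 KV heads rather than 16, the KV cache is half the size it would be with full multi-head attention at the same head dimension.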

Fine-Tuning with Unsloth

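```python
# Requires the unsloth package: pip install unsloth
from unsloth import FastLanguageModel

# Load the base model in 4-bit with the same 4,096-token context used in training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="neuroturk/HYZ-01-0.6B-Base",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to every attention and MLP projection.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)
```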
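
As a rough guide to what this LoRA configuration trains, the sketch below counts the adapter parameters for `r=32` across the seven target modules. The attention shapes follow from the Model Details table. The MLP intermediate size of 3,072 is an assumption that this card does not state, so treat the total as an estimate.

```python
# Shapes derived from the Model Details table.
HIDDEN = 1024
LAYERS = 28
Q_OUT = 16 * 128   # query heads x head dimension
KV_OUT = 8 * 128   # key/value heads x head dimension

# Assumption (NOT stated in the model card).
INTERMEDIATE = 3072

# LoRA settings from the Unsloth example.
R = 32
ALPHA = 64

# (input features, output features) for each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r: int = R) -> int:
    """Each adapted layer adds A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in SHAPES.values())
    return LAYERS * per_layer


print(lora_params())  # trainable adapter parameters under the assumption above
print(ALPHA / R)      # scaling factor applied to the low-rank update
```

Under the stated assumption, the adapters amount to roughly 3.4% of the model's 595,798,016 parameters, and `lora_alpha / r` gives a scaling factor of 2.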

GGUF Quantizations

For faster inference and lower resource usage, GGUF quantized versions of HYZ-01-0.6B-Base are available. These were kindly provided by mradermacher.

You can find them here: HYZ-01-0.6B-Base-GGUF

Using with llama.cpp

Download the GGUF file (e.g., hyz-01-0.6b-base-q4_k_m.gguf) from the repository above.

Run with llama.cpp (recent builds name the binary llama-cli; older builds call it main):

```bash
./llama-cli -m hyz-01-0.6b-base-q4_k_m.gguf -p "Your prompt here" -n 512
```

For a detailed explanation of quantization types (e.g., Q4_K_M, Q5_K_M), see the llama.cpp documentation.

Note: These GGUF files are not officially maintained by NeuroTürk, but they are community-tested and widely used. Thanks again to mradermacher for the contribution.

6. Limitations

This is a base model without instruction tuning, so it will not follow instructions reliably.
Complex multi-step reasoning may be limited with 0.6B parameters.
Biases present in the training data may be reflected in outputs.
Performance drops significantly in languages other than Turkish.
Human verification of outputs is recommended for critical applications.

7. Citation

```bibtex
@misc{neuroturk2026hyz01,
  author       = {NeuroTürk},
  title        = {HYZ-01-0.6B: A Lightweight Turkish Base Model},
  year         = 2026,
}
```
