modrill

qwen3-4b-nothink-baseline-full-sft

README

License: apache-2.0

Training summary

Table with columns: Field, Value
Field	Value
Method	Full SFT (ZeRO-2)
Dataset	nothink_mix
Chat template	qwen3_nothink
Epochs	2
Seed	42
Cutoff length	8192
Packing	false
Per-device batch size	4
Gradient accumulation	4
Effective batch size	64
Learning rate	2e-5
Train loss	0.1898
Train steps	2988
Finished	2026-06-08 07:10 CST

Usage

Load with Hugging Face Transformers:

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "modrill/qwen3-4b-nothink-baseline-full-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

Training hyperparameters

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU (4 devices)
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: AdamW (fused)
lr_scheduler: cosine
lr_scheduler_warmup_steps: 0.03
num_epochs: 2.0

Framework versions

Transformers 5.6.0
PyTorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

modrill

Model Tree

Base

Qwen/Qwen3-4B-Base

Fine-tuned

this model

Input Modalities

Text

Output Modalities