qwen3-4b-nothink-s1-full-sft API & Inference Endpoint

Model description

Method: full SFT (all weights trainable), DeepSpeed ZeRO-2, 4 GPUs
Dataset: oci_nothink_50k (50,000 examples)
Template: qwen3_nothink
Not LoRA / not QLoRA: entire 4B model was updated

Note: The final training save was interrupted by disk full; published weights were restored from checkpoint-782 (same step count as training completion).

Training details

Table with columns: Field, Value
Field	Value
Epochs	1
Seed	42
cutoff_len	4096
packing	false
per_device_train_batch_size	4
gradient_accumulation_steps	4
effective_batch_size	64 (4 x 4 x 4 GPUs)
learning_rate	3e-5
train_loss	0.1572
train_steps	782
finished_at	2026-06-10 06:18 CST
runtime	~53 min

Think SFT (same project): modrill/qwen3-4b-think-s1-ep23-full-sft - ocr_think_50k, qwen3 template
Base: Qwen/Qwen3-4B-Base

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "modrill/qwen3-4b-nothink-s1-full-sft"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

License

Released under Apache 2.0 (see upstream Qwen model card if not bundled here).

Model description

Method: full SFT (all weights trainable), DeepSpeed ZeRO-2, 4 GPUs
Dataset: oci_nothink_50k (50,000 examples)
Template: qwen3_nothink
Not LoRA / not QLoRA: entire 4B model was updated

Note: The final training save was interrupted by disk full; published weights were restored from checkpoint-782 (same step count as training completion).

Training details

Table with columns: Field, Value
Field	Value
Epochs	1
Seed	42
cutoff_len	4096
packing	false
per_device_train_batch_size	4
gradient_accumulation_steps	4
effective_batch_size	64 (4 x 4 x 4 GPUs)
learning_rate	3e-5
train_loss	0.1572
train_steps	782
finished_at	2026-06-10 06:18 CST
runtime	~53 min

Think SFT (same project): modrill/qwen3-4b-think-s1-ep23-full-sft - ocr_think_50k, qwen3 template
Base: Qwen/Qwen3-4B-Base

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "modrill/qwen3-4b-nothink-s1-full-sft"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

License

Released under Apache 2.0 (see upstream Qwen model card if not bundled here).

qwen3-4b-nothink-s1-full-sft

README

Model description

Training details

Usage

License

Explore FriendliAI today

README

Model description

Training details

Usage

License

qwen3-4b-nothink-s1-full-sft

README

Model description

Training details

Related models

Usage

License

Explore FriendliAI today

README

Model description

Training details

Related models

Usage

License