qwen3-0.6b-zh-lora API & Inference Endpoint

Usage

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in BF16, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", dtype=torch.bfloat16, device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = PeftModel.from_pretrained(base, "woohello/qwen3-0.6b-zh-lora")

# Prompt: "Please briefly introduce, in Chinese, the core idea of LoRA fine-tuning."
messages = [{"role": "user", "content": "请用中文简要介绍 LoRA 微调的核心思想。"}]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Send inputs to the device chosen by device_map="auto" rather than hard-coding "cuda".
ids = tok(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True))
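For serving behind an inference endpoint, it can be convenient to fold the adapter into the base weights so that the runtime no longer needs PEFT. The following is a minimal sketch, not the author's deployment procedure: `merge_adapter` and the `out_dir` default are hypothetical names chosen for illustration, and the sketch assumes the base model is loaded unquantized in BF16, as in the usage example.

```python
def merge_adapter(
    base_id="Qwen/Qwen3-0.6B",
    adapter_id="woohello/qwen3-0.6b-zh-lora",
    out_dir="qwen3-0.6b-zh-merged",  # hypothetical output directory
):
    """Merge the LoRA adapter into the base model and save a standalone checkpoint."""
    # Imports are kept inside the function so that defining it has no side effects.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the unquantized BF16 base model and attach the adapter.
    base = AutoModelForCausalLM.from_pretrained(base_id, dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the low-rank updates into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Save model and tokenizer side by side so the directory loads on its own.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

Calling `merge_adapter()` downloads both repositories and writes the merged checkpoint, which can then be loaded with `AutoModelForCausalLM.from_pretrained(out_dir)` alone.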

Training parameters

Dataset: woohello/llm101-ai-history-sft-messages (3,683 QA pairs in messages format)
Base model: Qwen/Qwen3-0.6B (596M parameters, BF16)
LoRA: r=16, alpha=32, dropout=0, target modules = q/k/v/o + gate/up/down
Quantization: 4-bit NF4 (bitsandbytes)
Training: batch size 4, gradient accumulation 2 (effective batch size 8), lr=2e-4, cosine schedule, 1 epoch (461 steps)
Training time: 2.8 min on an RTX 3090
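The figures above can be cross-checked with a little arithmetic. The sketch below derives the effective batch size, the step count for one epoch, and the LoRA scaling factor from the listed values, then estimates the number of trainable adapter parameters. The projection shapes and layer count are an assumption taken from the published Qwen3-0.6B configuration, not from this model card, so verify them against the model's `config.json` before relying on the parameter estimate.

```python
import math

# Values copied from the training parameters listed above.
num_examples = 3683
per_device_batch = 4
grad_accum = 2
lora_r, lora_alpha = 16, 32

effective_batch = per_device_batch * grad_accum              # examples per optimizer step
steps_per_epoch = math.ceil(num_examples / effective_batch)  # last partial batch still counts
lora_scale = lora_alpha / lora_r                             # multiplier applied to the B @ A update

# ASSUMPTION: (in_features, out_features) per target module and the layer count,
# as published for Qwen3-0.6B; these are not stated in the model card.
NUM_LAYERS = 28
PROJ_SHAPES = {
    "q_proj": (1024, 2048),
    "k_proj": (1024, 1024),
    "v_proj": (1024, 1024),
    "o_proj": (2048, 1024),
    "gate_proj": (1024, 3072),
    "up_proj": (1024, 3072),
    "down_proj": (3072, 1024),
}

# Each adapted module adds A (r x in) and B (out x r), i.e. r * (in + out) parameters.
lora_params_per_layer = sum(lora_r * (i + o) for i, o in PROJ_SHAPES.values())
lora_params = NUM_LAYERS * lora_params_per_layer

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"LoRA scale alpha/r:   {lora_scale}")
print(f"trainable parameters: {lora_params:,}")
```

Under these assumptions the step count comes out to 461, matching the value reported for one epoch, and the adapter holds roughly 10M trainable parameters against the 596M-parameter base model.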

Limitations

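The base model has only 0.6B parameters, so its capability is limited; responses may be inaccurate or contain hallucinations.
The LoRA adapter was trained for 1 epoch on the ch5b1 data, and no multi-round RLHF was performed.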
