hotdogs/qwen3.6-27b-mythos5k-lora
License: apache-2.0
Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-27B |
| Quantization | 4-bit NF4 |
| Precision | BF16 |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimizer | paged_adamw_8bit |
| Learning Rate | 2e-4 (cosine schedule) |
| Batch Size | 1 (grad_accum=2 → effective 2) |
| Max Length | 1024 |
| Epochs | 3 |
| Training Steps | 7,500 |
| Training Time | 13h 01m |
| Final Loss | 0.040 |
| Final Accuracy | 98.5% |
| Hardware | RTX 4090 (via vast.ai) |
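The table values can be cross-checked and turned into configuration objects. The sketch below is an illustration under stated assumptions, not the author's training script: the dataset size of 5,000 examples is inferred from the "5k" in the dataset name, `bias="none"` and `task_type="CAUSAL_LM"` are assumed defaults that the table does not state, and the names `HPARAMS`, `derived`, and `build_configs` are hypothetical.

```python
import math

# Values copied from the Training Details table.
HPARAMS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "per_device_batch_size": 1,
    "grad_accum": 2,
    "epochs": 3,
}

# ASSUMPTION: 5,000 training examples, inferred from "5k" in the dataset name.
ASSUMED_DATASET_SIZE = 5_000


def derived(hp, dataset_size):
    """Compute quantities implied by the table: batch size, steps, LoRA scale."""
    effective_batch = hp["per_device_batch_size"] * hp["grad_accum"]
    steps_per_epoch = math.ceil(dataset_size / effective_batch)
    return {
        "effective_batch": effective_batch,
        "total_steps": steps_per_epoch * hp["epochs"],
        # Standard LoRA scales the low-rank update by alpha / r.
        "lora_scaling": hp["lora_alpha"] / hp["r"],
    }


def build_configs(hp):
    """Build 4-bit NF4 and LoRA configs (requires torch, transformers, peft)."""
    # Imported lazily so the arithmetic above runs without GPU libraries.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora = LoraConfig(
        r=hp["r"],
        lora_alpha=hp["lora_alpha"],
        lora_dropout=hp["lora_dropout"],
        target_modules=hp["target_modules"],
        bias="none",            # ASSUMPTION: not stated in the table
        task_type="CAUSAL_LM",  # ASSUMPTION: causal language modeling
    )
    return quant, lora


if __name__ == "__main__":
    print(derived(HPARAMS, ASSUMED_DATASET_SIZE))
```

Under the assumed dataset size, `derived` gives an effective batch of 2 and 7,500 total steps, which agrees with the Training Steps row.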
Dataset
Source: WithinUsAI/cluade_mythos_preview_5k_v2
Each example uses the messages format with system, user, and assistant roles. Conversations are rendered into training text via tokenizer.apply_chat_template().
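To make the messages format concrete, here is a simplified sketch of what a chat template does to a conversation. It assumes ChatML-style `<|im_start|>` / `<|im_end|>` markers as used across the Qwen family; the real template ships with the tokenizer and may add further elements, so `render_chatml` is a hypothetical stand-in for illustration, not a replacement for `tokenizer.apply_chat_template()`.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages into a ChatML-style string (simplified)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Hypothetical example record in the messages format.
example = [
    {"role": "system", "content": "You are a master storyteller."},
    {"role": "user", "content": "Tell me a myth."},
    {"role": "assistant", "content": "Long ago..."},
]

if __name__ == "__main__":
    print(render_chatml(example))
```

For training, the full conversation including the assistant turn is rendered. For inference, the assistant message is left out and `add_generation_prompt=True` opens the turn the model should complete.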
GGUF (Weight-Diff)
A pre-converted LoRA GGUF is available for quick merging with llama.cpp:
```bash
# Merge with base model (CPU merge, requires ~503GB RAM for F16)
llama-export-lora --no-mmap \
  --model Qwen3.6-27B-Q8_0.gguf \
  --lora-scaled GGUF/qwen36-mythos5k-lora.gguf:1.0 \
  --output mythos5k-f16.gguf

# Quantize to smaller format
llama-quantize mythos5k-f16.gguf mythos5k-q4_k_m.gguf Q4_K_M
```
Inference with Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-27B",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "hotdogs/qwen3.6-27b-mythos5k-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a master storyteller of myths and legends."},
    {"role": "user", "content": "Tell me about the creation of the world in a forgotten pantheon."},
]

# add_generation_prompt=True opens the assistant turn so the model answers
# instead of continuing the user message.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0]))
```
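For decoder-only models, `generate` returns the prompt tokens followed by the continuation, so decoding `output[0]` prints the prompt as well. The sketch below shows the slicing idea on plain lists; `new_tokens_only` is a hypothetical helper and the token ids are made up for illustration.

```python
def new_tokens_only(output_ids, prompt_len):
    """Drop the first prompt_len ids, keeping only the generated continuation."""
    return output_ids[prompt_len:]


# Made-up ids standing in for a tokenized prompt and a generated reply.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9]

if __name__ == "__main__":
    print(new_tokens_only(output_ids, len(prompt_ids)))  # [7, 8, 9]
```

With the real tensors, the same slice would be `output[0][inputs["input_ids"].shape[1]:]`, and passing `skip_special_tokens=True` to `tokenizer.decode` hides the chat markers.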
License
Apache 2.0
Created by UKA — 18yo coder & cybersecurity expert. June 18, 2026.
💖 Support
If you find this model useful, please consider supporting my work! 🙏
₿ Bitcoin — BTC:
bc1qf27cyk3vmugcdyv9xdtuv5jwz37863crpj5c9v
Thank you very much for your support! 🙏✨
Model provider: hotdogs
Model tree: base Qwen/Qwen3.6-27B, adapter: this model
Modalities: input video, text, image; output text
