License: apache-2.0
Which artifact do you want?
| you want | use |
|---|---|
| Just run the model | gr33r/ux-writing-1 (merged, vanilla transformers) |
| Run it on a laptop | gr33r/ux-writing-1-GGUF (Q4_K_M, 16.6 GB) |
| This repo: attach to base / continue training | the 159 MB LoRA (r=16, α=32, LM projections) |
Usage (PEFT)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer

base = "Qwen/Qwen3.6-27B"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForImageTextToText.from_pretrained(base, dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, "gr33r/ux-writing-1-lora")
# Prompt contract + enable_thinking=False guidance: see the merged model card.
```
Training: QLoRA (4-bit NF4, double quantization, bf16 compute) with LoRA on the q, k, v, o, gate, up, and down
projections; 2 epochs on ≈1,400 owner-authored/derived rewrite pairs on one A100-80GB.
To fine-tune further on your style guide (≈$2–6 on HF Jobs), see
FINETUNE_GUIDE.md.
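As a rough illustration of the training setup described above, the stated hyperparameters can be written out as plain keyword dictionaries. This is a sketch under assumptions, not the author's training script: the module names (`q_proj`, `k_proj`, and so on) assume the usual Qwen-style naming for the projections listed, the dictionaries are intended to be passed to `peft.LoraConfig` and `transformers.BitsAndBytesConfig`, and `lora_scaling` is a hypothetical helper added only for illustration. Restricting the adapter to the language-model part of the network is not shown.

```python
# Hyperparameters as stated in the card, written as plain dicts so this
# sketch runs without torch, peft, or transformers installed.

# Intended for peft.LoraConfig(**LORA_KWARGS).
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    # Assumption: Qwen-style module names for the q, k, v, o, gate, up,
    # and down projections named in the card.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Intended for transformers.BitsAndBytesConfig(**QUANT_KWARGS).
QUANT_KWARGS = {
    "load_in_4bit": True,                  # 4-bit base weights
    "bnb_4bit_quant_type": "nf4",          # NF4 quantization
    "bnb_4bit_use_double_quant": True,     # double quantization
    "bnb_4bit_compute_dtype": "bfloat16",  # bf16 compute
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Hypothetical helper: scale on the low-rank update in standard LoRA (alpha / r)."""
    return lora_alpha / r


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))
```

With r=16 and α=32, the low-rank update is scaled by a factor of 2 under the standard LoRA convention. If you continue training with a different rank, keeping α/r constant is one common way to keep the update magnitude comparable.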
License: Apache-2.0. Attribution appreciated: ux-writing-1 by gr33r.
Model provider: gr33r
Model tree: base Qwen/Qwen3.6-27B; adapter: this model
Modalities: input Video, Text, Image; output Text