
License: mit

🔮 Roadmap / Preview

This is among the final QLoRA fine-tunes built on Qwen3.6-27B; at most one or two more will follow. What comes next depends on the base-model timeline:

  • Primary plan — wait for Qwen3.7-27B. Once Qwen3.7-27B is available, new releases will move to that base. This kicks off a v3 line: more rigorous tuning aimed at near-flawless, stable output (the realistic-crime adapter yuxinlu1/qwen3-6-27b-chinese-crime-fiction-lora-v2 is the reference for that next step), together with upgrading several existing adapters to Qwen3.7.
  • Interim fallback. If Qwen3.7-27B does not arrive within the next week or so, a previously finished fine-tune from the backlog will be released in the meantime — note that this one targets an uncensored base model.

Alongside the model work, the companion writing pipeline (github.com/DuckTraDo/Novel) is getting a major overhaul focused on ease of use. It is expected to ship as a packaged application, opening up features such as automatic continuation / auto-drafting.


A LoRA adapter that guides Qwen3.6-27B toward Chinese xianxia / cultivation prose (玄幻修仙) — a long-form fantasy-cultivation tradition built around immortal-seeking, clan and sect politics, and the slow climb through cultivation realms.

This adapter is designed for a specific writing problem: given a short instruction or a neutral scene description, the model should directly produce Chinese cultivation-fiction prose grounded in the xianxia register, rather than analysis, outline, summary, or generic fantasy pastiche.

The core direction is intentionally narrow:

  • third-person narration that follows characters' inner judgment as they scheme, cultivate, and size each other up
  • cultivation-world settings: secluded caves and closed-door cultivation (洞府/闭关), sects and great clans (宗门/世家), mountain gates, spirit veins, market towns (坊市)
  • xianxia imagery grounded in a cultivation cosmology (灵气, 境界, 法器, 丹药, 神识, 御风/驾风, 玉简, 剑意) rather than Western fantasy tropes (no wizards, elves, or Western dragons)
  • cultivation jargon woven into the narration itself, not just the dialogue — realm names, technique names, and artifact lore read as native to the voice
  • direct fiction prose, not analysis, outlines, or revision advice
  • cultivation and power progression treated as a vehicle for depicting clan politics, ambition, and human nature, not as a stat sheet

The goal is for outputs to read like usable Chinese cultivation-fiction scene drafts rather than like a generic writing assistant explaining what xianxia is.

Typical use cases:

  • drafting cultivation/xianxia scenes inside a longer novel project
  • rewriting modern-toned paragraphs into cultivation-register prose
  • sect / clan / closed-door-breakthrough fiction in a long-form writing pipeline
  • local and private creative-writing workflows
  • integration with a long-form novel pipeline that manages outline, memory, timeline, and continuity

This is an adapter only. It does not include base model weights, training data, or copyrighted source material.

🧭 About the v1 / v2 Series

This repository is part of a small series of Chinese fiction LoRAs (现代悬疑 realistic crime, 民俗志怪 folk-horror, and now 玄幻修仙 xianxia).

  • v1 — Style Retraining. The v1 series focuses mainly on prose style. It uses SFT to move the base model away from generic AI prose toward a specific Chinese literary voice.
  • v2 — Expanded Training & Full Release. The v2 line uses a larger, cleaner training set, longer SFT, and ships a complete release (PEFT safetensors + GGUF). Behavioral DPO refinement is applied where available.

Note on this release: this xianxia adapter is the SFT stage (QLoRA, 2 epochs on a chapter-level cultivation corpus). It is labeled v2 for its expanded training and full safetensors release. DPO behavioral refinement has not yet been applied and is planned as a follow-up iteration.

Available formats: HF PEFT safetensors (and a GGUF LoRA once converted). MLX users may also be able to use the PEFT safetensors through mlx-lm depending on their local setup.

🌱 Status

| Field | Value |
|---|---|
| Version | v2 (SFT stage) |
| Focus | Chinese xianxia / cultivation prose behavior |
| Format | HF PEFT safetensors (GGUF LoRA planned) |
| Base model | Qwen3.6-27B |
| Language | Chinese |
| Use case | cultivation scene drafting, sect/clan fiction, realm-progression prose |
| Training style | SFT (QLoRA, r=16 / α=32, 2 epochs, max_seq=3300); DPO planned |
| Recommended workflow | local long-form writing pipeline |

🔗 Companion Novel Pipeline

This LoRA is designed to work together with a local-first, long-form Chinese novel writing pipeline: github.com/DuckTraDo/Novel

The pipeline handles the structural side of long-form fiction:

  • outline and chapter planning
  • scene-level context assembly
  • story memory and character tracking
  • timeline and continuity checks

This LoRA handles the prose-behavior side:

  • third-person cultivation narration with character interiority
  • xianxia scene drafting (closed-door breakthrough, sect confrontation, artifact lore)
  • cultivation-register diction woven into narration
  • avoidance of modern tone and Western fantasy tropes

They are intentionally split: the pipeline owns what happens, the LoRA owns how it reads on the page. You can use either independently, but they are designed as a pair.

📦 Files

Files included: adapter_config.json, adapter_model.safetensors, tokenizer.json, tokenizer_config.json, chat_template.jinja, README.md.

A GGUF LoRA (*-f16.gguf) for llama.cpp can be generated from the adapter and will be added to this repo; see the llama.cpp section below.

⚠️ Critical Usage Note: enable_thinking=False

Qwen3.6 ships with a thinking mode enabled by default. When you call apply_chat_template(messages, add_generation_prompt=True) with default settings, the chat template leaves an unclosed <think>\n block before the assistant's turn. With this LoRA loaded, that causes outputs to begin with an English thinking-process preamble (e.g. "Here's a thinking process: 1. Analyze the user input...") instead of Chinese cultivation prose.

The training samples used a closed empty thinking block: <think>\n\n</think>\n\n followed by the actual prose.

Always pass enable_thinking=False:

python

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)

For llama.cpp / GGUF inference, pass the included chat_template.jinja via --chat-template-file, or manually prepend the assistant prefix <think>\n\n</think>\n\n in the prompt.

This is the single most common usage error. If outputs look like English reasoning instead of Chinese prose, this is almost certainly the cause.
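
For runtimes where the chat template cannot be controlled, the manual prefix described above can be sketched as a small string helper. This is a minimal sketch and not part of the repository: `close_thinking_block` is a hypothetical name, and the ChatML-style assistant header in the usage lines is an assumption about what the rendered prompt looks like.

```python
EMPTY_THINK = "<think>\n\n</think>\n\n"  # closed empty block used in training
OPEN_THINK = "<think>\n"                 # unclosed block left by the default template


def close_thinking_block(prompt: str) -> str:
    """Return the prompt so that it ends with a closed, empty thinking block."""
    if prompt.endswith(EMPTY_THINK):
        # Already in the trained format; nothing to do.
        return prompt
    if prompt.endswith(OPEN_THINK):
        # Swap the dangling opener for the closed empty block.
        return prompt[: -len(OPEN_THINK)] + EMPTY_THINK
    # No thinking block at all: append the closed empty one.
    return prompt + EMPTY_THINK


# Usage (the assistant header below is an assumed ChatML-style rendering).
rendered = "<|im_start|>assistant\n<think>\n"
fixed = close_thinking_block(rendered)
assert fixed.endswith(EMPTY_THINK)
```

The helper is idempotent, so it is safe to apply to a prompt that was already rendered with enable_thinking=False.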

🚀 Example PEFT / Transformers Usage

python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3.6-27B"
adapter = "yuxinlu1/qwen3-6-27b-chinese-xianxia-lora-v2"

# Load the tokenizer from the adapter repo, which ships tokenizer and chat-template files.
tokenizer = AutoTokenizer.from_pretrained(adapter, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)
# Attach the LoRA weights to the base model and switch to inference mode.
model = PeftModel.from_pretrained(base, adapter).eval()

messages = [{"role": "user", "content": "写一段玄幻修仙小说。"}]
# enable_thinking=False keeps the prompt consistent with the closed empty thinking block used in training.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=640,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.15,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

🚀 Example llama.cpp Usage

Once the GGUF LoRA has been added to this repo, pair it with a compatible Qwen3.6-27B GGUF base. Unsloth's Qwen3.6-27B-GGUF (Q4_K_M or Q8_0) is recommended.

bash

llama-cli -m qwen3.6-27b-base.Q4_K_M.gguf \
  --lora qwen3-6-27b-chinese-xianxia-lora-v2-f16.gguf \
  --chat-template-file chat_template.jinja \
  -p "写一段玄幻修仙小说。" \
  -n 640 --temp 0.8 --top-p 0.9 --repeat-penalty 1.15

🧪 Example Prompts

Prompts can be short:

markdown

写一段玄幻修仙小说。
讲一个修仙者的故事。
深山一处石洞里,洞壁上的灵光忽明忽暗,映着满地碎石。
小说:少年盘膝而坐,体内那缕灵气又凝实了几分。
把下面这段改成玄幻修仙小说正文:
他打开电脑准备加班,忽然觉得胸口一热,等回过神来,发现自己躺在一间陌生的土屋里。
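
The prompts above can be wrapped into chat messages with a pair of small helpers. This is only a sketch: `build_messages` and `build_rewrite_messages` are hypothetical names, and joining the rewrite instruction and the source paragraph with a single newline is an assumption based on how the last two prompt lines are laid out.

```python
# Rewrite instruction copied from the example prompts above.
REWRITE_INSTRUCTION = "把下面这段改成玄幻修仙小说正文:"


def build_messages(prompt: str) -> list[dict[str, str]]:
    """Wrap a single user prompt in the chat-message format."""
    return [{"role": "user", "content": prompt.strip()}]


def build_rewrite_messages(paragraph: str) -> list[dict[str, str]]:
    """Build a rewrite request: instruction line, then the paragraph to convert."""
    return build_messages(f"{REWRITE_INSTRUCTION}\n{paragraph.strip()}")


# Usage: a short generic prompt and a rewrite prompt.
generic = build_messages("写一段玄幻修仙小说。")
rewrite = build_rewrite_messages("他打开电脑准备加班,忽然觉得胸口一热。")
```

Either list can be passed as `messages` to apply_chat_template with enable_thinking=False.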

Suggested decoding range (tuned for this adapter):

| Setting | Range |
|---|---|
| temperature | 0.7 – 0.85 |
| top_p | 0.85 – 0.92 |
| repetition_penalty | 1.1 – 1.18 |
| max_new_tokens | 400 – 700 |

repetition_penalty matters here: too low (≈1.05) lets long generations fall into verbatim loops; too high (≈1.2+) can push Chinese character names toward homophone drift. ~1.15 is a good default. The adapter is strongest in shorter bursts — very long single generations (>~700 tokens) may drift; generate scene-by-scene for best results.
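
One way to keep a pipeline inside the suggested ranges is to clamp user-supplied settings before they reach the generate call. The sketch below is illustrative only: `RECOMMENDED` and `clamp_decoding` are hypothetical names, and the bounds are copied from the table above.

```python
# Suggested decoding ranges from the table above: (low, high) per setting.
RECOMMENDED = {
    "temperature": (0.7, 0.85),
    "top_p": (0.85, 0.92),
    "repetition_penalty": (1.1, 1.18),
    "max_new_tokens": (400, 700),
}


def clamp_decoding(settings: dict) -> dict:
    """Return a copy of settings with known keys clamped into the suggested ranges."""
    clamped = dict(settings)
    for name, (low, high) in RECOMMENDED.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped


# Usage: out-of-range values are pulled back; unrelated keys pass through untouched.
safe = clamp_decoding({"temperature": 1.2, "repetition_penalty": 1.05, "do_sample": True})
```

Clamping silently changes what the caller asked for, so a stricter pipeline might prefer to raise an error instead; that choice is left open here.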

🧪 Internal Evaluation Snapshot

A small internal evaluation was run across four prompt tiers (original-text continuation, neutral scene description, character-grounded task, and minimal generic instruction) to verify style transfer.

Qualitative observations:

  • Original-text continuation: natural continuation in the trained cultivation voice; realm/technique/artifact vocabulary and the xianxia register carry through.
  • Neutral scene description: cultivation atmosphere and plot emerge from neutral prompts without explicit xianxia cues in the input.
  • Character-grounded task: characters behave consistently with cultivation-fiction conventions (sect etiquette, closed-door breakthrough, clan politics).
  • Minimal generic instruction (e.g. "write a piece of xianxia fiction"): the model produces cultivation prose directly without first explaining what xianxia is, which suggests the style was internalized rather than merely instruction-followed.

These observations come from a small local development eval and should be treated as an internal signal, not a public benchmark.

🧪 Intended Use

Intended for:

  • local Chinese xianxia / cultivation drafting
  • rewriting modern-toned paragraphs into cultivation-register prose
  • testing third-person cultivation narration
  • sect / clan / breakthrough fiction in a human-in-the-loop workflow
  • offline and privacy-respecting novel drafting
  • integration with a long-form novel pipeline

Not intended for: author impersonation, defamation, harassment, factual claims, high-stakes advice, spam, or deception.

⚠️ Limitations

  • It does not guarantee full-novel plot coherence by itself. Character continuity, foreshadowing, and timeline logic should be handled by an external writing pipeline or by the author.
  • Long single generations may degrade. Past roughly 600–700 tokens, very long outputs can drift, loop, or destabilize character names. Generate scene-by-scene and keep repetition_penalty near 1.15.
  • Character-name stability depends on decoding settings; aggressive repetition penalties can turn a name into a homophone variant.
  • Optimized for cultivation-world settings; outputs for modern, urban, or non-cultivation scenes may revert toward base-model tone.
  • It may produce shorter-than-expected outputs if the prompt is very minimal or decoding settings are conservative.
  • If enable_thinking=False is not set, outputs will begin with an English thinking-process preamble. This is the single most common usage error.
  • Output quality depends on the base model, quantization, sampling settings, prompt design, and context quality.
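
The scene-by-scene advice in the list above can be sketched as a short drafting loop with a crude check for verbatim loops. Everything here is an assumption for illustration: `has_verbatim_loop` and `draft_scenes` are hypothetical helpers, `generate` stands in for whatever function calls the model, and the 12-character window and retry count are arbitrary starting points rather than tuned values.

```python
from typing import Callable


def has_verbatim_loop(text: str, window: int = 12, min_repeats: int = 3) -> bool:
    """True if any window-character substring occurs at least min_repeats times
    (overlapping occurrences are counted)."""
    counts: dict[str, int] = {}
    for start in range(len(text) - window + 1):
        chunk = text[start:start + window]
        counts[chunk] = counts.get(chunk, 0) + 1
        if counts[chunk] >= min_repeats:
            return True
    return False


def draft_scenes(
    scene_prompts: list[str],
    generate: Callable[[str], str],
    max_retries: int = 2,
) -> list[str]:
    """Generate one draft per scene prompt, regenerating a draft that loops."""
    drafts = []
    for prompt in scene_prompts:
        draft = generate(prompt)
        for _ in range(max_retries):
            if not has_verbatim_loop(draft):
                break
            draft = generate(prompt)  # sampling is stochastic, so a retry may differ
        drafts.append(draft)
    return drafts
```

A draft that still loops after the retries is returned as-is, so a human-in-the-loop workflow would still want to review the output.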

🛡️ Safety and Legal Notes

  • This repository contains adapter weights and supporting tokenizer/template files, not base model weights.
  • No copyrighted novels, private manuscripts, or proprietary datasets are distributed in this repository.
  • This LoRA is not designed to imitate any specific living author; the goal is to capture a broader Chinese xianxia / cultivation prose tradition.
  • Outputs are machine-generated fiction; do not use this model for harassment, defamation, fraud, or deceptive impersonation.
  • Users are responsible for the base model license, adapter license, and applicable law.

📜 License

LoRA adapter: MIT. Base model: governed by its own license.

Model provider: yuxinlu1

Model tree: base Qwen/Qwen3.6-27B; adapter: this model

Modalities: input Video, Text, Image; output Text
